maiden eagle Jun 7, 2019, 1:26 AM

#

Could be a bunch of stuff

#

Usually when I get that error it's because my array has dtype=object because I wasn't careful about building my array

#

The fact that X.shape[1] returns a value error makes me think you have an array of lists or something
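
A small sketch of the situation described above, assuming the rows have unequal lengths (the sample values are made up). NumPy can only hold such data as a 1-D array of Python lists with dtype=object, so there is no second axis to index.

```python
import numpy as np

# Made-up ragged rows: the second row is one element short.
rows = [[1.0, 2.0, 3.0], [4.0, 5.0]]

# NumPy cannot build a 2-D float array from this, only a 1-D array of lists.
ragged = np.array(rows, dtype=object)
print(ragged.dtype, ragged.shape)

# With equal-length rows the result is a proper 2-D numeric array.
clean = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(clean.dtype, clean.shape)
```

Checking X.dtype and X.shape before fitting is usually enough to spot this.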

desert oar Jun 7, 2019, 1:50 AM

#

@median siren you'll need to show your code or a sample of the data or something

median siren Jun 7, 2019, 1:51 AM

#

Alright

#

📎 unknown.png

lapis sequoia Jun 7, 2019, 1:54 AM

#

check the values first

#

before trying to fit

desert oar Jun 7, 2019, 1:57 AM

#

@median siren you have an array of arrays

#

that might be causing problems

median siren Jun 7, 2019, 2:01 AM

#

That makes sense, right, considering each of my data points represents a vector? So my X_train variable is a list of vectors?

desert oar Jun 7, 2019, 2:03 AM

#

well, no, it's a dataframe

#

usually you don't get an array of arrays when you use .values

#

something is funny in your data

median siren Jun 7, 2019, 2:07 AM

#

🤔

median siren Jun 7, 2019, 2:37 AM

#

Yes, I figured out what's wrong.

hard veldt Jun 7, 2019, 4:11 AM

#

hey does anyone have a recommendation for a free online course to learn ML

lean ledge Jun 7, 2019, 4:44 AM

#

Check pinned

karmic geyser Jun 7, 2019, 4:55 AM

#

@desert oar Hey I tried your code. I think I need to edit the cython module or change the data type I'm passing. "ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'float'"

lapis sequoia Jun 7, 2019, 7:08 AM

#

And why do we use fit_transform() on training set and only transform() on test set?

charred onyx Jun 7, 2019, 7:14 AM

#

We use fit_transform() on the train data so that we learn the parameters of scaling on the train data and at the same time we scale the train data.

lapis sequoia Jun 7, 2019, 7:19 AM

#

parameters of scaling on train data means?
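
For standard scaling, the "parameters" are typically the mean and standard deviation of each feature, computed from the training data only. The sketch below is a hypothetical one-feature stand-in written with the standard library, not scikit-learn's implementation; the class name and the sample values are made up.

```python
from statistics import mean, pstdev


class TinyStandardScaler:
    """Hypothetical minimal scaler for a single feature."""

    def fit(self, values):
        # Learn the scaling parameters from the data passed in.
        self.mean_ = mean(values)
        self.scale_ = pstdev(values)
        return self

    def transform(self, values):
        # Apply the parameters learned earlier, without recomputing them.
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = [10.0, 20.0, 30.0]
test = [40.0]

scaler = TinyStandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from train, then scales train
test_scaled = scaler.transform(test)        # reuses the train mean/std on test
print(scaler.mean_, scaler.scale_, test_scaled)
```

Calling fit_transform() on the test set would recompute the mean and standard deviation from test data, so the two sets would end up on different scales and information from the test set would leak into preprocessing.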

sand reef Jun 7, 2019, 7:32 AM

#

Does anyone know how to fix the issue with TensorBoardColab? The one where I make the tensorboard and pass it in, and it says:

#

AttributeError: TensorBoardColab does not have a parameter, on_batch_training_begin()

#

or something close to that, let me get the error

#

AttributeError                            Traceback (most recent call last)
<ipython-input-6-98410153379f> in <module>()
      1 with tf.Session() as sess:
      2   sess.run(tf.global_variables_initializer())
----> 3   model.fit(X, Y, batch_size = 32, epochs = 10,validation_split = 0.1, callbacks = [TensorBoardColabCallback(tbc)])

2 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, max_queue_size, workers, use_multiprocessing, **kwargs)
    878           initial_epoch=initial_epoch,
    879           steps_per_epoch=steps_per_epoch,
--> 880           validation_steps=validation_steps)
    881 
    882   def evaluate(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, mode, validation_in_fit, **kwargs)
    323         # Callbacks batch_begin.
    324         batch_logs = {'batch': batch_index, 'size': len(batch_ids)}
--> 325         callbacks._call_batch_hook(mode, 'begin', batch_index, batch_logs)
    326         progbar.on_batch_begin(batch_index, batch_logs)
    327 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
    194     t_before_callbacks = time.time()
    195     for callback in self.callbacks:
--> 196       batch_hook = getattr(callback, hook_name)
    197       batch_hook(batch, logs)
    198     self._delta_ts[hook_name].append(time.time() - t_before_callbacks)

AttributeError: 'TensorBoardColabCallback' object has no attribute 'on_train_batch_begin'

#

and here is the model made

#

import tensorflow as tf
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Flatten, Dense, Activation, Conv2D, MaxPooling2D
from google.colab import drive
drive.mount('/content/drive')
import pickle
X = pickle.load(open('/content/drive/My Drive/data/X.pickle', 'rb'))
Y = pickle.load(open('/content/drive/My Drive/data/Y.pickle', 'rb'))
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size = (2,2)))

model.add(Conv2D(64,(3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size = (2,2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])
!pip install -U tensorboardcolab
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc = TensorBoardColab()
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  model.fit(X, Y, batch_size = 32, epochs = 10,validation_split = 0.1, callbacks = [TensorBoardColabCallback(tbc)])

#

pls halp

lapis sequoia Jun 7, 2019, 7:49 AM

#

tensorboard is to visualize the flow of your program?

#

can you look it up and see where it fails?

#

oh

#

misunderstood your problem

#

try this

#

tbCallBack = TensorBoard()

#

and use that in model.fit

#

i'm not sure if you need to pass arguments into TensorBoard.. but I think you probably might need to

#

@sand reef

sand reef Jun 7, 2019, 8:05 AM

#

But the TensorBoardColab is what is imported, the regular one is only TensorBoardv2.0 which is supported by Colab

#

The TensorBoardv2.0 is having another set of issues of not reading any of my tensorboard event files

#

despite me running ngrok on it

olive willow Jun 7, 2019, 8:45 AM

#

btw guys what should I know before starting to learn calc?

sand reef Jun 7, 2019, 9:30 AM

#

functions

olive willow Jun 7, 2019, 9:31 AM

#

so f(x)

sand reef Jun 7, 2019, 9:31 AM

#

and a bit of set theory

olive willow Jun 7, 2019, 9:31 AM

#

f(x) = x^2

#

what's that? set theory

sand reef Jun 7, 2019, 9:31 AM

#

like x is a real number or x belongs to an interval between 2 and 5

#

that notation

#

f:x->x

olive willow Jun 7, 2019, 9:31 AM

#

what's an interval idk the english terms that good

sand reef Jun 7, 2019, 9:32 AM

#

yeah, that sort of stuff, check it out

#

cuz they will use a lot of that weird notation

olive willow Jun 7, 2019, 9:32 AM

#

sure I'm doing linear algebra rn after that it's calc and stuff

#

and after that the holy motherland MACHINE LEARNING

sand reef Jun 7, 2019, 9:33 AM

#

okay

spice cargo Jun 7, 2019, 9:53 AM

#

thanks @lean ledge

desert oar Jun 7, 2019, 11:05 AM

#

@karmic geyser I did warn you it was untested 😃 but the error means what it says

#

What is a bit weird is that DTYPE_t should be np.float32

#

Maybe the issue is native python float vs numpy float
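
One way to check the guess above is to look at the array's dtype before passing it in. In Cython buffer errors the "got" name is usually the C type, so 'float' would point to a 32-bit array, while a 64-bit array would be reported as 'double'. Which width DTYPE_t actually expects depends on the module, so the float64 target below is an assumption, and the sample values are made up.

```python
import numpy as np

# Made-up input that happens to be 32-bit.
values = np.array([1.5, 2.5, 3.5], dtype=np.float32)
print(values.dtype)

# Assumption: the compiled function wants 64-bit floats in contiguous memory.
as_double = np.ascontiguousarray(values, dtype=np.float64)
print(as_double.dtype)
```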

lost sinew Jun 7, 2019, 1:30 PM

#

import pandas as pd

df_btcusdt = pd.read_csv("BTCUSDT.csv", parse_dates=True, index_col=0)
df_ethusdt = pd.read_csv("ETHUSDT.csv", parse_dates=True, index_col=0)
df_ltcusdt = pd.read_csv("LTCUSDT.csv", parse_dates=True, index_col=0)

df_btcusdt = df_btcusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])
df_ethusdt = df_ethusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])
df_ltcusdt = df_ltcusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])

df_btcusdt.rename(columns={'Close':'BTCUSDT Close'}, inplace=True)
df_ethusdt.rename(columns={'Close':'ETHUSDT Close'}, inplace=True)
df_ltcusdt.rename(columns={'Close':'LTCUSDT Close'}, inplace=True)


main_df = pd.concat([df_btcusdt, df_ethusdt, df_ltcusdt], axis=1, sort=False)

print(main_df.corr())

#

how do i make this code neater?

desert oar Jun 7, 2019, 1:33 PM

#

@lost sinew inplace= i think is discouraged nowadays. but other than that, seems neat enough to me

#

alternatively you can use usecols= in the read_csv call instead of dropping columns afterward
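
A minimal sketch of the usecols= suggestion, reading from an in-memory string so it runs without the CSV files. The Close values come from the sample posted in this channel; the Open, High, Low and Volume values are made up.

```python
import io

import pandas as pd

# Stand-in for BTCUSDT.csv (only the Close values are taken from the chat).
csv_text = """Date,Open,High,Low,Close,Volume
2018-01-23,10500.0,11000.0,10400.0,10799.18,100.0
2018-01-24,10799.18,11400.0,10700.0,11349.99,120.0
"""

# Read only the columns that are needed instead of dropping the rest afterwards.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "Close"],
    index_col="Date",
    parse_dates=True,
)

# rename() returns a new frame, so inplace= is not needed.
df = df.rename(columns={"Close": "BTCUSDT Close"})
print(df)
```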

lost sinew Jun 7, 2019, 1:40 PM

#

alright thanks.. new to programming so im scared if its messy lol

#

Date,BTCUSDT Close,ETHUSDT Close,LTCUSDT Close
2018-01-23,10799.18,980.0,176.98
2018-01-24,11349.99,1061.0,180.89
2018-01-25,11175.27,1056.52,179.59
2018-01-26,11089.0,1051.03,177.09
2018-01-27,11491.0,1118.99,182.1
2018-01-28,11879.95,1251.96,196.74
2018-01-29,11251.0,1177.01,181.5
2018-01-30,10237.51,1085.5,168.21
2018-01-31,10285.1,1124.81,165.19
2018-02-01,9224.52,1041.94,143.69

#

how do i find the time lag between each of the * Close

#

this is a csv file

desert oar Jun 7, 2019, 1:42 PM

#

what do you mean time lag

lost sinew Jun 7, 2019, 1:44 PM

#

like whether the price increases or decreases.. it follows each of the other prices because they are highly correlated.. is there a way to find the average lag/lead time

desert oar Jun 7, 2019, 1:44 PM

#

not sure i understand. you want to find lags or leads such that the series are all maximally correlated?

lost sinew Jun 7, 2019, 1:44 PM

#

for example, ETHUSDT Close has a +ve increase 10 minutes after BTCUSDT Close has a +ve increase

desert oar Jun 7, 2019, 1:45 PM

#

also your data is daily so you can't figure out +10 minutes from that. but i think i see

lost sinew Jun 7, 2019, 1:45 PM

#

ohh its just an example

desert oar Jun 7, 2019, 1:45 PM

#

i'm not sure of any principled way to do that other than making leads and lags of different lengths and computing the correlations

#

or making plots and eyeballing it

lost sinew Jun 7, 2019, 1:46 PM

#

so theres no quantitative way to do it?

desert oar Jun 7, 2019, 1:46 PM

#

im sure there is, but i dont know it

#

sometimes the "dumb way" is good enough

lost sinew Jun 7, 2019, 1:46 PM

#

alright thanks for ur help

#

been searching google for days and i still cant find the answer 😦

desert oar Jun 7, 2019, 1:47 PM

#

https://quant.stackexchange.com/a/14868

it seems like you really just have to compute a bunch of lagged correlations, or use Granger causality

Quantitative Finance Stack Exchange

detecting and measuring lead lag effect

Given two time series data. I remember there is one statistics that tells you one is the leading factor while the other is the lagging factor. However, i do not remember the exact details. correlat...
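
The "compute a bunch of lagged correlations" idea can be sketched with Series.shift() and Series.corr(). The numbers below are made up, and series b is built as an exact copy of a delayed by two steps, so the right answer is known in advance. This is only the brute-force approach, not a Granger causality test.

```python
import pandas as pd

# Made-up series; b repeats a two steps later.
a = pd.Series([0, 1, 3, 2, 5, 9, 4, 6, 8, 7, 12, 10, 15, 11, 14], dtype=float)
b = a.shift(2)

# Correlate b with a shifted by each candidate lag (positive k: a leads b by k steps).
lag_corr = {k: a.shift(k).corr(b) for k in range(-4, 5)}
best_lag = max(lag_corr, key=lag_corr.get)

for k, r in lag_corr.items():
    print(k, round(r, 3))
print("best lag:", best_lag)
```

With real prices it is usually safer to run this on returns (for example df.pct_change()) than on raw prices, since two trending series can look correlated at almost any lag.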

turbid bay Jun 7, 2019, 1:54 PM

#

hey. im making a neural network for detecting handwritten digits from scratch using numpy. rn when i train with 10-20 images and test with those too i get a very high accuracy. but when i go above using 50 training images it just guesses 3 every time. idk y. heres the code
https://pastebin.com/32Z5ZMgr

Pastebin

[Python] import numpy as np import pygame import cv2 def sigm...

desert oar Jun 7, 2019, 1:57 PM

#

guessing the same number every time suggests something degenerate in your training

#

if you print the gradient at each training step maybe you can see something going wrong

turbid bay Jun 7, 2019, 2:03 PM

#

which bits the gradient tho? 😂. sorry i dont 100% know whats going on

sand reef Jun 7, 2019, 2:07 PM

#

Well. You implemented the entirety of the neural network from scratch.

#

It's gonna be a bit hard to point out where you are going wrong.

#

Here's what you can do.

#

Shuffle the dataset

#

And then take samples.

#

If it started predicting 3 a lot, only 2 things are possible. Either something is going wrong in your network, which is unlikely since you said that it was working with 10 examples, or your training data had a lot of 3s, so your network learnt to predict only 3 for high accuracy.

desert oar Jun 7, 2019, 2:10 PM

#

is your data really unbalanced
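
A quick way to check the class balance of a training subset, and to see why shuffling matters. The sketch assumes the labels are stored in sorted order (all 0s, then all 1s, and so on), which is a guess about how the images were loaded; the label list is made up to match "20 of each number".

```python
import random
from collections import Counter

# Hypothetical labels: 20 examples of each digit, in sorted order.
labels = [digit for digit in range(10) for _ in range(20)]

# Without shuffling, the first 50 samples only cover digits 0, 1 and 2.
print(Counter(labels[:50]))

# After shuffling, a 50-sample subset is far more likely to cover most digits.
rng = random.Random(0)
shuffled = labels[:]
rng.shuffle(shuffled)
print(Counter(shuffled[:50]))
```

In real code the images and labels have to be shuffled together, for example by shuffling a list of indices and applying it to both.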

sand reef Jun 7, 2019, 2:15 PM

#

Altho, how did it get a very high accuracy with just 10 examples? Was that also the mnist dataset?

#

Is that even possible with a balanced out dataset?

turbid bay Jun 7, 2019, 2:30 PM

#

no my data consists of 20 of each number

#

i mean it is from the mnist dataset. but i just got the images online and saved them as png’s. i only got 200 of them

sand reef Jun 7, 2019, 2:39 PM

#

Well. Do one thing.

#

Use tensorflow.keras and get the mnist dataset

#

And train your model on that.

#

If the same issue persists, then your code for the neural network is having errors.

turbid bay Jun 7, 2019, 2:56 PM

#

i wanted to do that before. but i never knew how to get it

#

or how to use it if i did get it

lost sinew Jun 7, 2019, 3:01 PM

#

import requests
import csv
import pandas as pd

market = 'XRPUSDT'#'LTCUSDT'#'BTCUSDT'#'ETHUSDT'
interval = '1d'

url = 'https://api.binance.com/api/v1/klines?symbol=' + market + '&interval=' + interval
data = requests.get(url).json()

with open(market + '.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(data)

df = pd.read_csv(market + '.csv', names=['Date', 'Open', 'High', 'Low', 'Close',
                                         'Volume', 'Close time', 'Quote asset volume',
                                         'Number of trades', 'Taker buy base asset volume',
                                         'Taker buy quote asset volume', 'Ignore'])

df['Date'] = pd.to_datetime(df['Date'], unit='ms')

df = df.drop(columns=['Close time', 'Quote asset volume',
                      'Number of trades', 'Taker buy base asset volume',
                      'Taker buy quote asset volume', 'Ignore'])
# save file
df.to_csv(market + '.csv', index=False)

#

how do i make a loop for all of the 'market' commented

#

i wanna just type all of the different markets in a list and loop around it automatically instead of manually chanign the market

#

changing*

desert cradle Jun 7, 2019, 3:02 PM

#

markets = ['XRPUSDT', 'LTCUSDT', ...]
for market in markets:
    ...  # all the rest of your code goes here, indented one level

#

the rest of your code could probably be made more efficient but that's how you'd make the loop

lost sinew Jun 7, 2019, 3:03 PM

#

how can i make it more efficient

desert cradle Jun 7, 2019, 3:05 PM

#

i'd probably create the dataframe directly from the json

lost sinew Jun 7, 2019, 3:05 PM

#

how would i do that tho.. im really new into programming

desert cradle Jun 7, 2019, 3:06 PM

#

something like this
data = requests.get(url).json()
df = pd.DataFrame(data,
                  columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume',
                           'x', 'x', 'x', 'x', 'x', 'x'])
df['Date'] = pd.to_datetime(df['Date'], unit='ms')
df = df.drop(columns=['x'])
df.to_csv(market + '.csv', index=False)

#

(I just used 'x' instead of names for columns you're deleting anyway)
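
Putting the loop and the "DataFrame straight from the JSON" idea together could look roughly like this. To keep the sketch runnable offline, it uses two made-up rows in the 12-field layout named earlier instead of a live request; the helper name klines_to_frame is hypothetical.

```python
import pandas as pd

NAMES = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume',
         'Close time', 'Quote asset volume', 'Number of trades',
         'Taker buy base asset volume', 'Taker buy quote asset volume', 'Ignore']
KEEP = NAMES[:6]

# Made-up rows in the 12-field layout above; timestamps are in milliseconds.
SAMPLE_ROWS = [
    [1516665600000, "10500.0", "11000.0", "10400.0", "10799.18", "100.0",
     1516751999999, "0", 0, "0", "0", "0"],
    [1516752000000, "10799.18", "11400.0", "10700.0", "11349.99", "120.0",
     1516838399999, "0", 0, "0", "0", "0"],
]


def klines_to_frame(rows):
    """Build the trimmed frame directly from parsed JSON rows (hypothetical helper)."""
    df = pd.DataFrame(rows, columns=NAMES)
    return df[KEEP].assign(Date=pd.to_datetime(df['Date'], unit='ms'))


frames = {}
for market in ['XRPUSDT', 'LTCUSDT']:
    rows = SAMPLE_ROWS  # real code would use requests.get(url).json() for this market
    frames[market] = klines_to_frame(rows)
    # frames[market].to_csv(market + '.csv', index=False)

print(frames['XRPUSDT'])
```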

lost sinew Jun 7, 2019, 3:07 PM

#

alright thanks

#

import pandas as pd


markets = ['BNB', 'LTC', 'EOS', 'ONE', 'TRX', 'BCHABC', 'MATIC', 'XRP', 'LTC', 'BTC', 'ETH', 'BTT', 'FET', 'ZIL', 'ADA',
           'ATOM', 'LINK', 'NEO', 'ETC', 'CELR', 'XLM']

df_btcusdt = pd.read_csv("BTCUSDT.csv", parse_dates=True, index_col=0)
df_ethusdt = pd.read_csv("ETHUSDT.csv", parse_dates=True, index_col=0)
df_ltcusdt = pd.read_csv("LTCUSDT.csv", parse_dates=True, index_col=0)

df_btcusdt = df_btcusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])
df_ethusdt = df_ethusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])
df_ltcusdt = df_ltcusdt.drop(columns=['Open', 'High', 'Low', 'Volume'])

df_btcusdt.rename(columns={'Close':'BTCUSDT Close'}, inplace=True)
df_ethusdt.rename(columns={'Close':'ETHUSDT Close'}, inplace=True)
df_ltcusdt.rename(columns={'Close':'LTCUSDT Close'}, inplace=True)


main_df = pd.concat([df_btcusdt, df_ethusdt, df_ltcusdt], axis=1, sort=False)

#main_df.to_csv('combined.csv', index=False)

print(main_df.corr())

#

how would i do this for all of the markets now

#

i tried it but it wouldnt work.. what should i change the final 'main_df' line to

#

@desert cradle

desert cradle Jun 7, 2019, 3:17 PM

#

ok uh

#

that's very different from your other code that was making separate files

lost sinew Jun 7, 2019, 3:18 PM

#

yea..

#

is there a way to loop it?

desert cradle Jun 7, 2019, 3:18 PM

#

i think what you want is pd.merge

#

but i'm not 100% sure on the details of how to use it

lost sinew Jun 7, 2019, 3:19 PM

#

is there like a way to concat it to the main_df everytime it loops?

desert cradle Jun 7, 2019, 3:20 PM

#

actually...

#

since you're using an index column

#

wait can you tell me why the concat didn't work?

lost sinew Jun 7, 2019, 3:20 PM

#

ill show u what i tried.. wait

desert cradle Jun 7, 2019, 3:21 PM

#

but anyway, you just want the close columns, right?

#

also you have this long list of markets but just three csv files, what are you trying to do with that exactly

lost sinew Jun 7, 2019, 3:23 PM

#

import pandas as pd

markets = ['BNB', 'LTC', 'EOS', 'ONE', 'TRX', 'BCHABC', 'MATIC', 'XRP', 'LTC', 'BTC', 'ETH', 'BTT', 'FET', 'ZIL', 'ADA',
           'ATOM', 'LINK', 'NEO', 'ETC', 'CELR', 'XLM']
pair = 'USDT'
for market in markets:

    df = pd.read_csv(market + pair+ ".csv", parse_dates=True, index_col=0)

    df = df.drop(columns=['Open', 'High', 'Low', 'Volume'])

    df.rename(columns={'Close': market + pair +' Close'}, inplace=True)

    main_df = pd.concat([df], axis=1, sort=False)

print(main_df.corr())

desert cradle Jun 7, 2019, 3:23 PM

#

ok, that's your problem

#

concat works fine, you're just doing it in the wrong place

lost sinew Jun 7, 2019, 3:23 PM

#

ohh

#

where should it be

desert cradle Jun 7, 2019, 3:23 PM

#

import pandas as pd

markets = ['BNB', 'LTC', 'EOS', 'ONE', 'TRX', 'BCHABC', 'MATIC', 'XRP', 'LTC', 'BTC', 'ETH', 'BTT', 'FET', 'ZIL', 'ADA',
           'ATOM', 'LINK', 'NEO', 'ETC', 'CELR', 'XLM']
pair = 'USDT'
dfs = []
for market in markets:
    df = pd.read_csv(market + pair+ ".csv", parse_dates=True, index_col=0)
    df = df.drop(columns=['Open', 'High', 'Low', 'Volume'])
    df.rename(columns={'Close': market + pair +' Close'}, inplace=True)
    dfs.append(df)

main_df = pd.concat(dfs, axis=1, sort=False)

print(main_df.corr())

lapis sequoia Jun 7, 2019, 3:24 PM

#

@desert cradle is there any way of doing it without the append?

desert cradle Jun 7, 2019, 3:24 PM

#

why

lost sinew Jun 7, 2019, 3:25 PM

#

it came out with this error ValueError: No objects to concatenate

lapis sequoia Jun 7, 2019, 3:25 PM

#

list comprehension would be worse in this case?

desert cradle Jun 7, 2019, 3:25 PM

#

that doesn't make any sense

lost sinew Jun 7, 2019, 3:26 PM

#

lol nvm @desert cradle i forgot to put dfs.append(df)

#

THANKS

lapis sequoia Jun 7, 2019, 3:28 PM

#

import pandas as pd


markets = ['BNB', 'LTC', 'EOS', 'ONE', 'TRX', 'BCHABC', 'MATIC', 'XRP', 'LTC', 'BTC', 'ETH', 'BTT', 'FET', 'ZIL', 'ADA',
           'ATOM', 'LINK', 'NEO', 'ETC', 'CELR', 'XLM']

def op(market):
    pair = 'USDT'

    df = pd.read_csv(market + pair+ ".csv", parse_dates=True, index_col=0)
    df = df.drop(columns=['Open', 'High', 'Low', 'Volume'])
    df.rename(columns={'Close': market + pair +' Close'}, inplace=True)
    return df

dfs = [op(market) for market in markets]


main_df = pd.concat(dfs, axis=1, sort=False)


print(main_df.corr())

#

maybe something like this?

desert cradle Jun 7, 2019, 3:29 PM

#

eh

#

making a function just so you can have a list comprehension doesn't really improve readability that much

#

and it's not much of a difference for performance either

lost sinew Jun 7, 2019, 3:30 PM

#

how do i make it into a heatmap after having a correlation table?

desert cradle Jun 7, 2019, 3:30 PM

#

no idea

lost sinew Jun 7, 2019, 3:30 PM

#

okay thanks ill figure it out

desert cradle Jun 7, 2019, 3:31 PM

#

sounds like a lot of math, we're past my ability to help

#

i just know the basics of how pandas itself works

lost sinew Jun 7, 2019, 3:31 PM

#

ohh okay

lapis sequoia Jun 7, 2019, 3:31 PM

#

@desert cradle Okey ty!, i thought the performance would be better with the list comprehension

desert cradle Jun 7, 2019, 3:32 PM

#

it probably doesn't make much difference - a list comprehension might be very slightly faster than a loop with append, but adding an extra layer of function call might slow it down too, and it's not worth worrying about anyway

lost sinew Jun 7, 2019, 4:34 PM

#

how does python calculate correlation for the x.corr() code

desert oar Jun 7, 2019, 4:48 PM

#

Standard Pearson correlation

lost sinew Jun 7, 2019, 4:49 PM

#

thanks

desert oar Jun 7, 2019, 4:49 PM

#

Can choose Spearman or Kendall if you want

#

It's in the docs

lost sinew Jun 7, 2019, 4:50 PM

#

which one is the best

#

nvm

#

do you think the standard pearson correlation is suitable for finding the correlation between two stock prices?

#

or is the standard pearson correlation only suitable for linear relationships

desert oar Jun 7, 2019, 10:26 PM

#

by definition it's only suitable for linear relationships, but you might be underestimating the value of measuring a linear relationship. if "priceA" generally goes up whenever "priceB" goes up, then you can see that with a linear relationship
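
To make "only suitable for linear relationships" concrete, here is the Pearson formula written out by hand: the covariance of the two series divided by the product of their standard deviations. This is a plain-Python sketch for illustration with made-up inputs, not how pandas implements corr() internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# y = 2x: a perfectly linear relationship.
linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])

# y = x**2 on a symmetric range: fully determined by x, but not linear.
curved = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(linear, curved)
```

The second pair shows the limitation: y depends entirely on x, yet the coefficient is zero because the dependence has no linear trend. In pandas, main_df.corr(method='spearman') or method='kendall' switches to a rank-based measure.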

gaunt gorge Jun 7, 2019, 11:05 PM

#

What project will be good to do to learn data science and to put on the resume?

desert oar Jun 7, 2019, 11:06 PM

#

anything tbh

#

your learning project likely won't be a good resume project

#

kaggle is never a bad place to start for machine learning

#

it's kind of hard to learn "data science" on your own tbh

#

you end up mostly learning technical stuff, which is maybe 80% of the equation

sand reef Jun 8, 2019, 8:57 AM

#

say, anyone here well versed in the concept of hopfield neural networks?

#

i m slowly starting to lose it on this neural network

#

please ping me if anyone can help

#

https://pastebin.com/z1HTx81w

Pastebin

[Python] Hopfield Neural Network - Pastebin.com

#

For some reason, this network is always converging only to the latest learnt pattern

sand reef Jun 8, 2019, 10:36 AM

#

to see what the error is, use this code in conjunction: "Processor.py" is the name of the pasted file in the above link

#

from Processor import *

def print_matrix(matrix):
    for i in range(len(matrix)):
        string = ''
        for j in range(len(matrix)):
            string += str(1 if matrix[i][j] == 1 else 0) + ' '
        print(string)


nn = Network()
a = [
[ 1, 1, 1, 1, 1, 1, 1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1]]

b = [
[ 1,-1,-1,-1,-1,-1, 1],
[-1, 1,-1,-1,-1, 1,-1],
[-1,-1, 1,-1, 1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1, 1,-1, 1,-1,-1],
[-1, 1,-1,-1,-1, 1,-1],
[ 1,-1,-1,-1,-1,-1, 1]]

c = [
[ 1, 1, 1,-1, 1, 1, 1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1, 1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1],
[-1,-1, 1,-1,-1,-1,-1],
[-1,-1,-1, 1,-1,-1,-1]]

nn.read_matrix(a)
nn.set_weights()
nn.read_matrix(b)
nn.set_weights()
nn.run_async(1000)
print_matrix(nn.get_matrix())```

#

okay....i m seeing an issue......

#

i think, the issue does not lie in the network, i think my gui is messing up

sand reef Jun 8, 2019, 11:17 AM

#

now who do i ask to help me see the error that i think my logic is causing?

#

since its a gui logic error that i cant seem to fit into the button events

naive shore Jun 8, 2019, 11:44 AM

#

sorry im of no help but i have a question ducky

#

if i have an array of say 50000 values and i iterate over them with my algorithm and extensively using a counter (like counter++ at each iteration and than use it in my calculations)

so.... it works fine for an array, but when i apply this algorithm to a data flow obviously i get overflow very fast

#

so i thought (to not reinvent the wheel) maybe there is a well known concept to deal with this kind of things, like phase iterators or something

#

googling "phase iterator" gives nothing useful, so maybe it has different name

#

but my image of it is like when your counter is more than some value it zero's out but we still know its not real 0 but 0 + whatever counter we zeroed

#

duh such a mess of a thought )

zenith nova Jun 8, 2019, 11:53 AM

#

As far as I am aware, instead of overflowing, Python's ints become longs, and longs simply don't overflow?

#

>>> 10**10**3
100000000000....(snip too many zeroes)

naive shore Jun 8, 2019, 11:54 AM

#

oh...
so i must inspect why it gives me overflow more carefully

desert oar Jun 8, 2019, 12:49 PM

#

Numpy doesnt do that though, if its float32 its float32

#

But yeah use native python ints, they can get huge
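
A short sketch of the difference described above: a native Python int keeps growing, while a fixed-width NumPy integer array wraps around once it passes its maximum. The choice of int32 is only an example.

```python
import numpy as np

big = 2**31 - 1  # largest value a signed 32-bit integer can hold

# Native Python int: arbitrary precision, so adding 1 just works.
py_counter = big + 1
print(py_counter)

# NumPy int32 array: the same addition wraps around to the most negative value.
np_counter = np.array([big], dtype=np.int32)
np_counter += 1
print(np_counter[0])
```

So if a counter overflows, it is worth checking whether it lives in a NumPy array (or another fixed-width type) rather than in a plain Python int.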

knotty nexus Jun 8, 2019, 1:21 PM

#

does anyone know how I can calculate permutations, but with multiple lists of combinations? ie. [3,4] [5,6] [7,8,9] would be (3,5,7) (3,6,7) etc

earnest prawn Jun 8, 2019, 1:38 PM

#

I've never done this but my intuition says take a look at the itertools module from stdlib @knotty nexus

hollow quartz Jun 8, 2019, 1:49 PM

#

Hi I am a beginner in Data Science. I have a machine learning problem. So I want to know which statistics are useful when beginning a machine learning problem?

desert oar Jun 8, 2019, 2:03 PM

#

Depends on the problem

#

Usually you want to learn something about the data

#

Summary statistics, or plot the data if you can

#

You should have a goal in mind so you can stay focused on that goal

silk forge Jun 8, 2019, 2:12 PM

#

made my first ever decision tree classifier

knotty nexus Jun 8, 2019, 3:36 PM

#

thanks @earnest prawn . I looked at itertools, but as far as I can see it can only handle combinations of a single list; [3,4] would be (3,3), (3,4), (4,3) etc. For now I'm gonna try for loops, but it's gonna be really slow

desert oar Jun 8, 2019, 3:45 PM

#

You want itertools.product() maybe @knotty nexus
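
For the three example lists from the question, itertools.product() takes one element from each list in turn, which is the "one pick per list" behaviour being asked for:

```python
from itertools import product

groups = [[3, 4], [5, 6], [7, 8, 9]]

# One element from each list, in every possible arrangement: 2 * 2 * 3 tuples.
combos = list(product(*groups))

print(len(combos))
print(combos[:3])
```

product() is lazy, so for large inputs it can be iterated directly instead of being turned into a list.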

hollow quartz Jun 8, 2019, 3:46 PM

#

@desert oar I use pandas for example data.describe() show mean, std, min, max, 1st , 2nd and 3rd quartile

#

Is it the only statistic that i can use?

desert oar Jun 8, 2019, 3:47 PM

#

You can use anything you want

#

Its better to start with a specific objective

#

What are you trying to achieve? What question do you want to answer?

hollow quartz Jun 8, 2019, 3:48 PM

#

ok thanks

wary fox Jun 8, 2019, 4:31 PM

#

This is kind of a simple question and I am not 100% sure if it belongs here under data-science but I figured it fits, so what exactly about numpy makes it "faster"

earnest prawn Jun 8, 2019, 4:55 PM

#

that it's written in C and uses C arrays instead of python lists

#

also https://github.com/tensorflow/agents if anyone is interested

GitHub

tensorflow/agents

TF-Agents is a library for Reinforcement Learning in TensorFlow - tensorflow/agents

lost sinew Jun 8, 2019, 5:12 PM

#

how would i find the average lead/lag time of a time series

desert oar Jun 8, 2019, 8:00 PM

#

What do you mean "average" lead/lag?

lapis sequoia Jun 9, 2019, 7:11 AM

#

Can anyone pls explain me p-value in layman terms

#

I seriously can't understand it one bit

#

i do understand what the null and alternate hypotheses are

lean ledge Jun 9, 2019, 7:16 AM

#

When p<0.05, there is less than 5% chance that the results were a fluke of random chance

#

< 0.1 -> less than 10%

#

Etc

lapis sequoia Jun 9, 2019, 7:17 AM

#

so what does accepting the null hypothesis means when p>0.05

#

i was watching a video. it gave an example where the null hypothesis is that people spend a population average of 20 min on my website before the change, and the alternate hypothesis is that people spend more than 20 min on my website

#

then it set significance value = 0.05

#

then it took the sample mean of 100 people and found it to be 25 min

#

after that i didnt understand a thing

#

so can u tell me based on this example what exactly is p value..

#

like this much i understood that lower the p value lower are the chances that my observation was just a random chance
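
The website example can be worked through as a one-sided z-test. The p-value is the probability, assuming the null hypothesis (mean of 20 min) is true, of seeing a sample mean of 25 min or more from 100 visitors. The video's population standard deviation is not given here, so sigma = 30 min below is an assumption made purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 20.0          # mean time on site under the null hypothesis (minutes)
sample_mean = 25.0  # observed mean of the sample
n = 100             # sample size
sigma = 30.0        # ASSUMPTION: population standard deviation, not given in the chat
alpha = 0.05        # significance level

# How many standard errors the sample mean lies above the null mean.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# One-sided p-value: chance of a sample mean at least this large if the null is true.
p_value = 1.0 - NormalDist().cdf(z)
reject_null = p_value < alpha

print(round(z, 3), round(p_value, 4), reject_null)
```

With this assumed sigma the p-value comes out just under 0.05, so the null hypothesis would be rejected at the 5% level; a larger sigma or a smaller sample would push the p-value above 0.05.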

lean ledge Jun 9, 2019, 8:19 AM

#

@lapis sequoia p>0.05 means there's more than 5% chance that your results were due to chance

#

We consider that too likely

#

Hence we consider it to mean that "the experiment did not show the relationship we expected"

#

Hence we are unable to reject the null hypothesis

#

Say your hypothesis is A is correlated with B

#

Null hypothesis: there is no relationship

#

You do the experiment and find that A is correlated with B

#

But

#

There's a greater than 5% chance that it is due to random chance that you got that result

#

Hence you are unable to reject the idea that they are unrelated

#

And can't accept the hypothesis

lapis sequoia Jun 9, 2019, 8:23 AM

#

Ok ok..

lapis sequoia Jun 9, 2019, 1:18 PM

#

https://www.sciencedirect.com/science/article/pii/S1063458412007789

desert oar Jun 9, 2019, 2:19 PM

#

@lean ledge be careful, it means that if the null hypothesis is true there's more than 5% chance that your results were due to chance

sonic girder Jun 9, 2019, 3:55 PM

#

https://raw.githubusercontent.com/AndrewCathcart/got-sentiment-analysis/master/cleaned_got.csv if anyone wants a dataset of around 600k game of thrones tweets from roughly the time S8E3 aired

lapis sequoia Jun 9, 2019, 6:48 PM

#

I am just curious do we use the 3,4,5 method ever or do we use backward elimination most of the time

📎 Capture_2.PNG

#

coz at least thats what i am learning.. just backward elimination

lean ledge Jun 9, 2019, 9:34 PM

#

@desert oar You're meant to "not reject" it rather than accept it https://researchskills.epigeum.com/courses/researchskills/473/course_files/html/wht_1_10.html

desert oar Jun 9, 2019, 9:34 PM

#

I know what a hypothesis test is

lean ledge Jun 9, 2019, 9:35 PM

#

What was I being corrected on?

desert oar Jun 10, 2019, 2:50 AM

#

p>0.05 means there's more than 5% chance that your results were due to chance

#

that's only true under the null hypothesis

#

which of course is the whole point

#

if you get a value that's "rare" under the null (in this case 1-in-20 or rarer), then we say we don't believe the null

#

ah wait i think i misunderstood your comment 😄

wide gyro Jun 10, 2019, 3:40 AM

#

Is anyone good with Pandas and could help me with a problem regarding the csv reader?

lapis sequoia Jun 10, 2019, 4:54 AM

#

!ask

arctic wedge BOT Jun 10, 2019, 4:54 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

wide gyro Jun 10, 2019, 6:30 AM

#

I am using csv reader to read data from one file and get rid of all the rows that are missing a value, and when i use dropna(inplace=True), it works fine. However, I want to exclude some columns from that so I tried to implement the dropna(subset=[]). Unfortunately, when appending to a new csv, that file is actually larger in size than my previous one.

supple ferry Jun 10, 2019, 6:41 AM

#

@wide gyro , hi. First, avoid inplace=True at all costs. It is shorter, but it will bring you more headaches in the long run. Also, the pandas developers have discussed deprecating it.
Can you manually check both files and report their outputs?
Size can be affected by data types too

#

shape of dataframes and which types you have
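
A small sketch of how dropna() and dropna(subset=...) differ, assuming the data is in a pandas DataFrame (the column names and values are made up). Because subset= only checks the listed columns, it is less strict and keeps more rows, so a larger output file is not necessarily a bug.

```python
import pandas as pd

# Made-up frame: every row is missing something, but "note" is optional.
df = pd.DataFrame({
    "time": ["09:00", "09:05", None, "09:15"],
    "value": [1.0, None, 3.0, 4.0],
    "note": [None, "ok", "ok", None],
})

# Drops a row if ANY column is missing.
all_cols = df.dropna()

# Drops a row only if one of the listed columns is missing.
required = df.dropna(subset=["time", "value"])

print(df.shape, all_cols.shape, required.shape)
```

Comparing the shapes like this before writing is a quick sanity check. Another possible cause is writing with mode='a', which adds rows to whatever the file already contains.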

silent swan Jun 10, 2019, 7:23 AM

#

huh, why is inplace getting dropped? It sounds like something that has reasonable use cases

wide gyro Jun 10, 2019, 7:33 AM

#

@supple ferry I will update you in a bit with the output file, I didn’t look too much into it as I still have a decent sample size but would like to refine it as well as I can

#

Also, how would I pass the csv reader to other functions? I’d like to manipulate the data however I’d want once it’s read through but I’m not sure how to pass it through. New to Python but I understand the basics due to knowing a couple other languages

#

Would I need to put the reader into a list or dictionary? I figured I wouldn’t need to as I can call the column and row number in the method I initialize the reader, and it returns whatever I need. However, when trying to use it in certain functions, it says something like “Missing argument”

supple ferry Jun 10, 2019, 7:50 AM

#

what do you want to achieve by putting it into the function?? you want to read on demand ??

wide gyro Jun 10, 2019, 8:09 AM

#

@supple ferry I want to read the data and be able to use it for whatever I’d like, with one column being time that I’m converting or simple arithmetic use

turbid bay Jun 10, 2019, 8:13 AM

#

how does one get the mnist dataset and how does one use it?

#

im using my own made neural network using pygame

#

numpy*

#

oops 😂😂

lapis sequoia Jun 10, 2019, 9:24 AM

#

my text file has uneven number of lines

#

want to read to dataframe.. how should I approach this

#

I want everything in one column

desert oar Jun 10, 2019, 11:27 AM

#

@silent swan I think in some cases it was actually less efficient, but it was misleading people into thinking it was somehow a performance optimization; also it leads to two discordant and incompatible programming styles, rather than just one

#

@wide gyro what CSV reader? The native python one, or the pandas function?

#

If you're getting an error in your code nobody can help you unless you post a sample of code that demonstrates the error, and also the full error message

#

Usually when a data frame is bigger than you expect, it's because of a join/merge that went wrong
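
A minimal sketch of how a merge can inflate a frame, using made-up `towers` and `readings` tables (hypothetical names and data, not from this thread). Duplicate keys on one side multiply the matching rows, and `validate` asks pandas to raise instead of silently growing:

```python
import pandas as pd

# Hypothetical data for illustration only.
towers = pd.DataFrame({"cell": [1, 2, 3], "radio": ["GSM", "LTE", "UMTS"]})
readings = pd.DataFrame({"cell": [1, 1, 2, 2, 2], "range": [100, 120, 80, 85, 90]})

merged = towers.merge(readings, on="cell", how="left")
print(len(towers), len(merged))  # the merged frame has more rows than `towers`

# validate= makes pandas check the key relationship and raise on a mismatch.
try:
    towers.merge(readings, on="cell", how="left", validate="one_to_one")
    merge_ok = True
except pd.errors.MergeError:
    merge_ok = False
print(merge_ok)
```

Comparing `len()` before and after each merge is usually enough to spot where the extra rows come from.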

gilded dagger Jun 10, 2019, 11:36 AM

#

I have a few questions about Machine Learning, is this the right thread for it?

#

In particular, I'd like to do some lip reading using TensorFlow, but I'm unsure as to what's already been done. Anybody knows which projects are still maintained?

supple ferry Jun 10, 2019, 11:47 AM

#

!ask

wide gyro Jun 10, 2019, 12:13 PM

#

@desert oar Pandas

void anvil Jun 10, 2019, 1:51 PM

#

Where is the default save directory for WSL? I spent a couple days working on a project. Went to open it up today and it's gone. The .csv's I created in the folder are there, but the code is gone.

random jasper Jun 10, 2019, 1:54 PM

#

Sorry if this is the wrong section. I'm trying to use OpenCV to analyze images of particles. I have gotten to the point where I have binarized the image and the particles are decently defined, but what functions should I be looking at to analyze, say, the area or the diameter bounded by a contour?

void anvil Jun 10, 2019, 1:54 PM

#

And is there a way to just search the WSL drive

sand reef Jun 10, 2019, 5:28 PM

#

@random jasper I am not very well versed in openCV and there might be something already existing which does what you are asking for, but well, you can convert it into an array, and use a condition and mark those areas. If you can mathematically represent a contour that is.

void anvil Jun 10, 2019, 5:44 PM

#

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
   1504         try:
-> 1505             result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
   1506         except TypeError:

~/.local/lib/python3.5/site-packages/pandas/core/computation/expressions.py in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
    207     if use_numexpr:
--> 208         return _evaluate(op, op_str, a, b, **eval_kwargs)
    209     return _evaluate_standard(op, op_str, a, b)

~/.local/lib/python3.5/site-packages/pandas/core/computation/expressions.py in _evaluate_standard(op, op_str, a, b, **eval_kwargs)
     67     with np.errstate(all='ignore'):
---> 68         return op(a, b)
     69 

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')


#line of my code
-> 1155    l = (a + 2 * i + 2 * j + k)/6


~/.local/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(left, right)
   1581             rvalues = rvalues.values
   1582 
-> 1583         result = safe_na_op(lvalues, rvalues)
   1584         return construct_result(left, result,
   1585                                 index=left.index, name=res_name, dtype=None)

#

   1527         try:
   1528             with np.errstate(all='ignore'):
-> 1529                 return na_op(lvalues, rvalues)
   1530         except Exception:
   1531             if is_object_dtype(lvalues):

~/.local/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
   1505             result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
   1506         except TypeError:
-> 1507             result = masked_arith_op(x, y, op)
   1508 
   1509         result = missing.fill_zeros(result, x, y, op_name, fill_zeros)

~/.local/lib/python3.5/site-packages/pandas/core/ops.py in masked_arith_op(x, y, op)
   1024         if mask.any():
   1025             with np.errstate(all='ignore'):
-> 1026                 result[mask] = op(xrav[mask], y)
   1027 
   1028     result, changed = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

sand reef Jun 10, 2019, 5:44 PM

#

?

#

pandas issue?

void anvil Jun 10, 2019, 5:45 PM

#

Not sure

#

maybe

#

I'm running on WSL for the first time

#

porting over some code

#

I've never seen anything like it running on windows

sand reef Jun 10, 2019, 5:45 PM

#

i see..

void anvil Jun 10, 2019, 5:45 PM

#

trying to figure out what the error is actually saying

sand reef Jun 10, 2019, 5:45 PM

#

seems to be stemming from pandas....so i guess something went wrong there..

#

didcha google this issue?

#

the type error, just copy and paste and see what it gives?

#

https://stackoverflow.com/questions/35013726/typeerror-ufunc-add-did-not-contain-a-loop-with-signature-matching-types

Stack Overflow

TypeError: ufunc 'add' did not contain a loop with signature match...

I am creating bag of words representation of the sentence. Then taking the words that exist in the sentence to compare to the file "vectors.txt", in order to get their embedding vectors. After gett...

#

see if this helps

#

if this doesn't help, well, i am sorry, this, for now, is out of my league...
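
The `dtype('<U32')` in that message means NumPy was asked to add arrays of strings, so at least one of `a`, `i`, `j`, `k` probably holds text rather than numbers. A minimal sketch, assuming the values came from columns that were read in as strings (the frame below is made up), is to convert with `pd.to_numeric` before doing the arithmetic:

```python
import pandas as pd

# Hypothetical frame whose numeric columns were loaded as text.
df = pd.DataFrame({"a": ["1.0", "2.0"], "i": ["3.0", "4.0"],
                   "j": ["5.0", "6.0"], "k": ["7.0", "8.0"]})
print(df.dtypes)  # the columns are text, not float64

# Convert every column; errors="coerce" turns unparseable entries into NaN.
num = df.apply(pd.to_numeric, errors="coerce")
l = (num["a"] + 2 * num["i"] + 2 * num["j"] + num["k"]) / 6
print(l.tolist())
```

Printing the dtypes right before line 1155 of the failing script would show whether this assumption holds.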

wide gyro Jun 10, 2019, 5:52 PM

#

I am passing my pandas dataframe to a class's method and using df.at[row, column] to check if a certain value equals 0 or 1, and returns a String based on the outcome. However, nothing is being returned yet the program is being terminated without giving an error.

#

If I try to print that same df.at[row,column] outside of the method, though, I receive the value I need

desert oar Jun 10, 2019, 7:02 PM

#

@wide gyro did you forget to write return in the function? you'll need to show your code

wide gyro Jun 10, 2019, 7:02 PM

#

class CellTower():

    def __init__(self,data):
        self.changeable = data.changeable

    def checkChange(data,int1):
        if data.at[int1,'changeable'] == 1:
            return 'Firm'
        else:
            return 'Processing'

data = pd.read_csv('towers.csv')
CellTower.checkChange(data,3)

#

@desert oar

#

One suggested that I go through cmd line and that would give me the true error, but I'm struggling to get that going

desert oar Jun 10, 2019, 7:03 PM

#

and that code reproduces your problem?

wide gyro Jun 10, 2019, 7:04 PM

#

Well I found out that trying to access any method inside that class doesn't work

desert oar Jun 10, 2019, 7:04 PM

#

    def checkChange(data,int1)

this needs a :

#

it's likely that your class is failing to be instantiated and your program isn't even running

#

cause that's a syntax error

wide gyro Jun 10, 2019, 7:04 PM

#

Oh, it has that

#

I must've accidentally deleted it when putting it on here

desert oar Jun 10, 2019, 7:05 PM

#

is checkChange supposed to be a static method

wide gyro Jun 10, 2019, 7:05 PM

#

Yes

desert oar Jun 10, 2019, 7:05 PM

#

ok

#

did you forget to write @staticmethod then?

#

well actually

#

@classmethod in this case

#

well... oh

#

that's weird

#

that should work but it's not recommended

#

anyway, i can't reproduce your problem, works fine

wide gyro Jun 10, 2019, 7:07 PM

#

It appears that my program ends once it finishes the csv reader

#

Calling any method in the class I'm trying to get from returns nothing

#

I've been stuck on this for a few hours now, not sure what I can do

#

Been trying to figure out the cmd line too, looks like that's my only hope

desert oar Jun 10, 2019, 7:08 PM

#

are you expecting it to print the output?

#

is that the full script you have?

#

what do you mean "returns nothing" -- clearly that method returns something

#

it's also not even really a method, the way it's defined, more of a namespaced function. that will fail if you actually instantiate the CellTower class
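
A sketch of one usual way to write this, with a made-up frame standing in for `towers.csv`. The `@staticmethod` decorator lets the function be called on the class or on an instance, and the result has to be printed (or assigned) to be visible when the file is run as a script:

```python
import pandas as pd

class CellTower:
    def __init__(self, data):
        self.changeable = data.changeable

    @staticmethod
    def check_change(data, row):
        # No `self`: works as CellTower.check_change(...) or instance.check_change(...)
        return 'Firm' if data.at[row, 'changeable'] == 1 else 'Processing'

# Stand-in for pd.read_csv('towers.csv'); the values are invented.
data = pd.DataFrame({'changeable': [0, 1, 0, 1]})

result = CellTower.check_change(data, 3)
print(result)                        # a bare call would compute the value and discard it

tower = CellTower(data)
print(tower.check_change(data, 0))   # also fine on an instance once it is a staticmethod
```

Without the decorator, `tower.check_change(data, 0)` would pass `tower` as the first argument and fail, which is the instantiation problem mentioned above.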

wide gyro Jun 10, 2019, 7:09 PM

#

well right now I'm just using a simple IDE and going to transfer it to Linux machine once I sort this all out

#

Then that's probably where I go wrong

#

What's the difference between method and namespaced function?

desert oar Jun 10, 2019, 7:10 PM

#

can you share more of the code?

#

a namespaced function is something i made up...

#

it shouldn't matter

#

point being, i unfortunately can't help because i can't reproduce the error and it's not clear what's going wrong

wide gyro Jun 10, 2019, 7:12 PM

#

Hm, have you ever used the windows cmd line?

#

@desert oar

desert oar Jun 10, 2019, 7:12 PM

#

yes

#

what IDE are you using though

#

it shouldn't matter

wide gyro Jun 10, 2019, 7:12 PM

#

Spyder

#

Anaconda

desert oar Jun 10, 2019, 7:13 PM

#

ok

#

how are you running this

wide gyro Jun 10, 2019, 7:13 PM

#

the .py file?

desert oar Jun 10, 2019, 7:14 PM

#

how are you running the code

#

and what output are you expecting. some kind of print-out?

#

and can you share a more representative piece of code that demonstrates the issue?

#

unfortunately i have to head out now but maybe someone else can see this and help

modest scarab Jun 10, 2019, 9:01 PM

#

I might be overreaching but i could be using wrong keywords

#

Is there such a thing where i can predict what a person will say based on time of the day

#

Using previous messages in a group chat?

#

how can i go about finding resources for this?

desert oar Jun 10, 2019, 10:08 PM

#

yep, that's something you might call a "language model". usually they just predict the next word in a sentence but one of them can probably be adapted for whole text messages

#

it's not really my area of expertise but that might at least get you started. you can also look into the blog-literature on chatbots, which is abundant

lapis sequoia Jun 10, 2019, 10:17 PM

#

hi

#

I need some help..

#

How do I load my text file into a dataframe? I want the dataframe to have only one column.. the text file is delimited between sentences by a newline, and the length of the lines is not fixed.. so I'm running into issues

desert oar Jun 10, 2019, 10:19 PM

#

just read the file as a list

#

then create a dataframe with that list as the only column

#

with open('myfile.txt') as f:
    text_lines = [line.strip() for line in f]

data = pd.DataFrame({'text': text_lines})

lapis sequoia Jun 10, 2019, 10:20 PM

#

@modest scarab you're talking about smart reply.. like in gmail.. there are pretrained embeddings that can help you do this.. just base next message prediction on the previous one, or on the fly

#

thanks a bunch!

desert oar Jun 10, 2019, 10:20 PM

#

ahhh "smart reply" thats what people call it

#

you might need to train your own model to incorporate "external" features like time of day

lapis sequoia Jun 10, 2019, 10:21 PM

#

or just switch between models depending on time of day.. easier

desert oar Jun 10, 2019, 10:26 PM

#

i'd only do that as a last resort if i couldn't use fine-tuning or reuse the embeddings in another model. but again i don't know this area specifically

modest scarab Jun 11, 2019, 2:02 AM

#

this is probably too advanced and probably hasn't been done

#

but it isn't "smart reply" that i am looking for

#

it's just basically what sort of words or phrases my friend would predictably say

#

at a certain time of the day

lapis sequoia Jun 11, 2019, 2:09 AM

#

it's not advanced..

#

the next level down is if else blocks :v

desert oar Jun 11, 2019, 3:19 AM

#

@lapis sequoia i wouldnt say that at all man. those language models are really complicated

lapis sequoia Jun 11, 2019, 3:26 AM

#

well it's all relative.. for me programming concepts are hard at the moment.. but language comes easy..

#

https://github.com/tensorflow/tfjs-models/tree/master/universal-sentence-encoder

GitHub

tensorflow/tfjs-models

Pretrained models for TensorFlow.js. Contribute to tensorflow/tfjs-models development by creating an account on GitHub.

#

https://arxiv.org/abs/1705.00652

arXiv.org

Efficient Natural Language Response Suggestion for Smart Reply

This paper presents a computationally efficient machine-learned method for
natural language response suggestion. Feed-forward neural networks using n-gram
embedding features encode messages into...

desert oar Jun 11, 2019, 3:29 AM

#

language? maybe. but actually understanding the math and design decisions that go into a SOTA language model? i dont think anyone would ever call that easy

#

unless you're already very experienced in machine learning

lean ledge Jun 11, 2019, 3:30 AM

#

tbf, dont have to understand something to reuse pretrained SOTA models

lapis sequoia Jun 11, 2019, 3:32 AM

#

math yes.. design decisions.. I'm not really sure these were built to be efficient.. just as poc..

wide gyro Jun 11, 2019, 1:42 PM

#

What is the correct format to set your constructor with your dataframe?

#

I figured you would just set the init and match the columns like

#

class CT:
    def init(self,data)
        self.radio = data.radio
        self.cell = data.cell
        self.range = data.range
data = pd.read_csv("ct.csv")

#

Would I then do

#

ct1 = CT(data.iloc[i])

#

or

#

ct1 = CT(data[i])

desert oar Jun 11, 2019, 1:56 PM

#

@wide gyro what are you trying to do exactly

#

you want radio, cell, and range to be columns in the data frame?

wide gyro Jun 11, 2019, 1:57 PM

#

I might be approaching this entirely wrong

#

But I wanna make instances of class CT that could hold a row of my dataframe

desert oar Jun 11, 2019, 1:57 PM

#

oh i see

#

what you wrote should actually work

#

oh youre asking about iloc

#

iloc is positional indexing

#

loc uses the index of the dataframe

#

[] changes meaning depending on what you pass into it

#

i always use .loc or .iloc for extracting rows, for clarity
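
A small sketch of the three forms on a made-up frame (the index labels 10, 20, 30 are chosen only so that position and label differ):

```python
import pandas as pd

# Invented data with a non-default index so the difference is visible.
df = pd.DataFrame({"radio": ["GSM", "LTE", "UMTS"], "range": [100, 200, 300]},
                  index=[10, 20, 30])

by_position = df.iloc[1]   # second row, regardless of index labels
by_label = df.loc[20]      # row whose index label is 20
column = df["radio"]       # [] with a column name selects a column

try:
    df[1]                  # [] with a scalar looks for a column named 1
    key_error = False
except KeyError:
    key_error = True
print(by_position["radio"], by_label["radio"], key_error)
```

That last case is why `data[i]` with an integer `i` tends to end in a KeyError on a frame whose columns have string names.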

wide gyro Jun 11, 2019, 1:58 PM

#

I'm getting back "init() missing 1 required positional argument: data" with iloc, but when I just use say "data[i]" I get a KeyError for whatever number is i

desert oar Jun 11, 2019, 1:58 PM

#

oh

#

__init__ not init

wide gyro Jun 11, 2019, 1:59 PM

#

Yeah, sorry, is there a major difference? Can I just change it to init?

#

if init works and init doesn't?

desert oar Jun 11, 2019, 1:59 PM

#

no

#

python specifically looks for __init__

#

not init
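
A minimal sketch of the difference, using two throwaway classes (the names are made up for illustration):

```python
class WithoutDunder:
    def init(self, data):        # an ordinary method; never called automatically
        self.data = data

class WithDunder:
    def __init__(self, data):    # called by Python when the instance is created
        self.data = data

try:
    WithoutDunder("row")         # no __init__ defined, so no arguments are accepted
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

obj = WithDunder("row")
print(obj.data)
```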

wide gyro Jun 11, 2019, 2:00 PM

#

I tried putting init into the code but it made me update it

desert oar Jun 11, 2019, 2:00 PM

#

class CT:
    def __init__(self,data):
        self.radio = data.radio
        self.cell = data.cell
        self.range = data.range

data = pd.read_csv("ct.csv")

ct1 = CT(data.iloc[1])

#

that should work

#

err try now. syntax

#

you were missing a :

wide gyro Jun 11, 2019, 2:02 PM

#

oh oops, I always mess up translating the code over, I have it on desktop but use my laptop for this

#

apologies

#

but now that i initialized ct1, how could i extract say "self.radio"?

#

I couldn't right

#

or like

#

ct1.radio

#

that wouldn't work right

desert oar Jun 11, 2019, 2:05 PM

#

why not?

#

did you try it?

wide gyro Jun 11, 2019, 2:06 PM

#

I did right before you sent that, works well noice

#

I'm finally getting somewhere hehe

#

@desert oar are you familiar with dropna as well?

desert oar Jun 11, 2019, 2:06 PM

#

yes

wide gyro Jun 11, 2019, 2:07 PM

#

I tried using dropna(subset=['']) but I don't believe it gets rid of any data, and instead just adds another index which then puts on storage

#

but when I use inplace=True, it lowers the size of the file

#

Well I guess that might be because every column I'm taking out of subset could be the only ones containing missing data?

#

Also, how would I stop it from adding another index if I use the subset drop

desert oar Jun 11, 2019, 2:08 PM

#

er

#

you have a column called ''?

wide gyro Jun 11, 2019, 2:09 PM

#

No, I was just putting it in

desert oar Jun 11, 2019, 2:09 PM

#

??

#

the effect youre describing doesnt match up with what you are saying you did

#

can you share actual code

wide gyro Jun 11, 2019, 2:10 PM

#

Now that I'm thinking about it I think the reason it isn't dumping any rows is because everything I'm subsetting in dropna contains data in each row and every column I'm excluding are the ones that are missing the data

#

which is why inplace=True works better but I can fix that

desert oar Jun 11, 2019, 2:10 PM

#

why would inplace=True have any bearing on this at all

wide gyro Jun 11, 2019, 2:10 PM

#

However do you know how to stop the subset from adding another index in front of the updated csv file?

desert oar Jun 11, 2019, 2:11 PM

#

also there's a proposal to deprecate inplace=True in a future pandas release

#

it shouldn't add an index

#

that's why im confused

#

and want to see your code

wide gyro Jun 11, 2019, 2:11 PM

#

Yeah I was told not to use it, inplace doesn't add an index but subset does

#

alright one sec

#

def clean(df):
    #df.dropna(inplace=True)
    df.dropna(subset=['radio', 'range'])
    df.to_csv('updated.csv')

#

Is that correct format?

desert oar Jun 11, 2019, 2:14 PM

#

no subset should not add an index

#

but that's correct syntax yes

#

oh i know whats happening

wide gyro Jun 11, 2019, 2:14 PM

#

I genuinely think it's just that every column I'm adding to that subset isn't missing a single piece of data and the columns I'm excluding are the ones that are

desert oar Jun 11, 2019, 2:14 PM

#

that shouldn't have anything to do with any index

wide gyro Jun 11, 2019, 2:14 PM

#

yeah

desert oar Jun 11, 2019, 2:14 PM

#

so the extra index is coming from somewhere else

#

do you have a column in the csv that you already intend to use as an index?

wide gyro Jun 11, 2019, 2:15 PM

#

Well if I took a raw csv file that didn't have an index in first column and ran it through that clean method, it would add an index to first column and shift everything else 1 over

desert oar Jun 11, 2019, 2:16 PM

#

pandas always adds an index

wide gyro Jun 11, 2019, 2:16 PM

#

and if I were to clean that clean file, it would add another

desert oar Jun 11, 2019, 2:16 PM

#

every pandas dataframe has an index

wide gyro Jun 11, 2019, 2:16 PM

#

So just keep it?

desert oar Jun 11, 2019, 2:16 PM

#

if you want to omit the index when saving, use to_csv(..., index=False)

wide gyro Jun 11, 2019, 2:16 PM

#

I'm never gonna clean a clean csv so I don't have to worry about 2 index's, but will removing the initial index cause any issues down the line?

#

I guess it doesn't matter that much if I have it, but with millions of rows I feel like it adds some storage

desert oar Jun 11, 2019, 2:17 PM

#

no, unless you are planning to use the index for something

#

it does add

#

if you aren't using it, omit it when saving
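
A quick sketch of what the saved index does on a round trip, using a made-up frame and in-memory text instead of real files:

```python
import io
import pandas as pd

df = pd.DataFrame({"radio": ["GSM", "LTE"], "range": [100, 200]})  # invented data

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns are written

# Reading the first version back yields an extra "Unnamed: 0" column.
cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()
print(cols_with)
print(cols_without)
```

Saving and re-reading repeatedly without `index=False` adds one such column each time, which matches the "another index" effect described above.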

wide gyro Jun 11, 2019, 2:17 PM

#

so it would be something like

#

df.to_csv('updated.csv',index=False)

#

I was also initially using data as the df's variable name, but I switched it to df because I feel like data could be a keyword in some cases

desert oar Jun 11, 2019, 2:19 PM

#

its not a keyword

#

go ahead

wide gyro Jun 11, 2019, 2:19 PM

#

oo

#

noice

wide gyro Jun 11, 2019, 2:40 PM

#

I'm trying to use chunksize to only get a portion of the csv file I'm reading off, but that turns the dataframe into a TextFileReader which I don't want

#

if I'm using pandas csv reader do I have to use something other than chunksize or do I have to read everything from the file?

#

Or do I somehow have to convert it back from a textfilereader to dataframe

earnest spear Jun 11, 2019, 2:51 PM

#

I'm not sure how to quote on discord, but your message from 10:11 with the code snippet: you do the df.dropna() without inplace=True, but you don't ever set anything to its return value. I think you need df = df.dropna(...) since you aren't doing it inplace anymore

wide gyro Jun 11, 2019, 2:53 PM

#

@earnest spear what do you mean by that?

#

Are you saying if I append to another csv file after dropna, it would keep the values I used before dropna?

#

But if I do df = df.dropna, it would take the new values into other csv file?

earnest spear Jun 11, 2019, 2:54 PM

#

df.dropna() returns a dataframe with the na values dropped. It doesn't modify the dataframe you call it on (unless you use inplace=True)

wide gyro Jun 11, 2019, 2:55 PM

#

so if I wanted to use the subset dropna I would have to set df equal to it

earnest spear Jun 11, 2019, 2:56 PM

#

So in the code snippet you posted, the df.dropna() is functionally doing nothing because you don't assign its return value to anything

#

Right

wide gyro Jun 11, 2019, 2:56 PM

#

Gotcha, can't believe I never thought of that

#

I'm assuming I'd do the same thing for fillna

earnest spear Jun 11, 2019, 2:57 PM

#

Pandas as a whole likes the idea of immutable dataframes. Most operations don't change the dataframe but rather return a new one. The inplace=True is saying, instead of returning a new dataframe, I want to change the dataframe I'm calling this on.

wide gyro Jun 11, 2019, 2:57 PM

#

So could I use the subset followed by inplace = true

#

or would that not work

#

instead of setting it equal

#

like do they accomplish same thing

earnest spear Jun 11, 2019, 2:59 PM

#

df = df.dropna() and df.dropna(inplace=True) will yield the same result (df being a dataframe with the NA values dropped). It's functionally different in that the first creates a new dataframe, whereas the second just changes the existing one, but they do accomplish the same thing.

#

The use of subset or other arguments shouldn't affect the usage of inplace
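
A small sketch of the point about return values, on a made-up frame with the `radio` and `range` columns from the snippet (the `note` column is invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"radio": ["GSM", None, "LTE"],
                   "range": [100.0, 200.0, np.nan],
                   "note": [None, None, "ok"]})    # invented data

df.dropna(subset=["radio", "range"])            # result discarded; df is unchanged
print(len(df))

cleaned = df.dropna(subset=["radio", "range"])  # keep the returned frame
print(len(cleaned))
```

Only rows missing `radio` or `range` are dropped here; a missing `note` is ignored because it is not in `subset`.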

lapis sequoia Jun 11, 2019, 3:00 PM

#

I have one question too regarding pandas.. do u think i should buy a book regarding pandas library.. coz i know its very important library?

#

like do i need to know it on fingertips?

earnest spear Jun 11, 2019, 3:12 PM

#

My opinion is that the best way to learn pandas is just to use it for something. Its documentation is pretty good and usually any question you have has a solution on stack overflow since it's so widely used. But if you learn well from books then it's never bad

lapis sequoia Jun 11, 2019, 3:14 PM

#

So i can understand everything from documentation?

#

Like usually documentation are complicated..u get everything but its a little hard for me to understand

earnest spear Jun 11, 2019, 3:36 PM

#

Yeah that is the issue - pandas can go pretty deep, but being able to parse the documentation for what you need is a good skill to get comfortable with

wide gyro Jun 11, 2019, 3:41 PM

#

I'm trying to use iterrows to compare some floats that have been selected and then run it through each line of the dataframe's same float

#

but i'm not sure how to access that df's float

#

I tried using at and iat but it says that i need integer indexers

#

and when i use loc and iloc it returns "KeyError: None of [Index] are in the [Index]"

lapis sequoia Jun 11, 2019, 4:14 PM

#

is Linear Regression more reliable or support vector machines?

#

i noticed that when i use svm.SVR() to recognise a pattern it only works to a certain extent and after that it gets it wrong, whereas LinearRegression() was on point when predicting the pattern

#

so which would be better to use when it comes to Predicting stock prices?

wide gyro Jun 11, 2019, 7:09 PM

#

Is there a way I can drop any values that are NaN but not those with an input of 0?

#

dropna for pandas appears to get rid of 0/NaN but some of the 0 values are good for me
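
A quick check on a made-up column suggests that `dropna` itself leaves zeros alone, since 0 is a value and NaN is the missing marker:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"range": [0.0, np.nan, 5.0]})   # invented data

missing = df["range"].isna().tolist()   # only the NaN entry counts as missing
kept = df.dropna()
print(missing)
print(kept["range"].tolist())           # the 0.0 row is still there
```

If rows with 0 seem to vanish, it may be worth checking whether another column in those rows is NaN, because `dropna()` without `subset` drops a row when any column is missing.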

hollow quartz Jun 11, 2019, 7:27 PM

#

Does pandas take the missing values into account when calculating correlation?

desert oar Jun 11, 2019, 9:27 PM

#

@wide gyro thats not what chunksize does. Read the docs carefully

#

@wide gyro you are asking a lot of XY questions. Maybe start by saying what you are trying to do first, then tell us what specific thing isnt working

wide gyro Jun 11, 2019, 9:29 PM

#

Yeah I switched to nrows and it works fine @desert oar
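
For reference, a sketch of the difference between the two options, with in-memory text standing in for the csv file (the contents are made up):

```python
import io
import pandas as pd

csv_text = "radio,range\nGSM,100\nLTE,200\nUMTS,300\nLTE,400\n"  # stand-in for a file

first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)   # a DataFrame with 2 rows

# chunksize returns an iterator that yields DataFrames of up to 3 rows each.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=3)
chunk_sizes = [len(chunk) for chunk in reader]
print(len(first_two), chunk_sizes)
```

`nrows` fits when only the top of the file is wanted; `chunksize` is meant for walking through the whole file piece by piece.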

sand reef Jun 11, 2019, 11:41 PM

#

@hollow quartz if you mean the Nan values, the values might not be taken, but their existence is very likely taken into account.

hollow quartz Jun 12, 2019, 1:07 AM

#

@sand reef so if a line has a missing value, that line might not be used?

lapis sequoia Jun 12, 2019, 6:40 AM

#

#Building Multiple Linear Regression Model
import numpy as np
import statsmodels.api as sm  # sm.OLS is provided by statsmodels.api
#prepend a column of ones to the original x so the intercept column comes first
x = np.append(arr = np.ones((50,1)).astype(int), values = x, axis=1)
x_opt = x[:, [0, 1, 2, 3, 4, 5]]
regressor_OLS = sm.OLS(endog = y, exog = x_opt).fit()
regressor_OLS.summary()

#removing the index x with p values greater than SL 
x_opt = x[:, [0, 1, 3, 4, 5]]
regressor_OLS = sm.OLS(endog = y, exog = x_opt).fit()
regressor_OLS.summary()

x_opt = x[:, [0, 3, 4, 5]]
regressor_OLS = sm.OLS(endog = y, exog = x_opt).fit()
regressor_OLS.summary()

x_opt = x[:, [0, 3, 5]]
regressor_OLS = sm.OLS(endog = y, exog = x_opt).fit()
regressor_OLS.summary()

x_opt = x[:, [0, 3]]
regressor_OLS = sm.OLS(endog = y, exog = x_opt).fit()
regressor_OLS.summary()

#

Is there a way in which i can run a loop so that p values checks if its less than 0.05 and removes the variable automatically..

#

I know automated backward elimination can be done using r squared values.. but any way of p values

#

Also after i did the backward elimination now what.. do i create a new test set and training set and predict new values?

ripe sundial Jun 12, 2019, 9:08 AM

#

Heya. What exactly is the difference between MSE in keras and the loss? The MSE in my case is as seen in the image. I am not sure I understand why it is growing or if the values are good

📎 unknown.png

sand reef Jun 12, 2019, 9:50 AM

#

Mean squared error is growing means something is wrong. It should go down.

#

MSE as a metric just tells you how close or far your predictions are. The loss is what is actually used to train your model.

#

@hollow quartz https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

#

It has the answer to all such technicalities.
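
As a concrete check of the correlation question, a sketch on made-up data: `Series.corr` leaves out the rows where either value is NaN, so it matches the correlation computed after dropping those rows by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 5.0],
                   "y": [2.0, 1.0, 7.0, np.nan, 6.0]})   # invented data

pairwise = df["x"].corr(df["y"])            # rows where either value is NaN are skipped
complete = df.dropna()                      # the same rows, dropped explicitly
manual = complete["x"].corr(complete["y"])
print(pairwise, manual)
```

With more than two columns, `DataFrame.corr` applies the same exclusion separately for each pair of columns, so different pairs can end up using different rows.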

#

@lapis sequoia try making a list with those indexes. Now remove values from that list with your loop and condition. And then pass that list of indices into the x_opt = x[:, custom_list]

#

To remove items from list:
https://www.quora.com/How-do-I-remove-an-item-from-a-python-list

How to remove an item from a python list - Quora

You can remove an item from a list in three ways: 1. using list object's remove() method. Here you need to specify an item to be removed. If there are multiple occurrences, then the first such item is removed. This can be seen as removal by item's...

lapis sequoia Jun 12, 2019, 10:02 AM

#

We can also use .pop method right?

sand reef Jun 12, 2019, 10:02 AM

#

Well. Pop will only remove the last element of the list

lapis sequoia Jun 12, 2019, 10:02 AM

#

oh ok

sand reef Jun 12, 2019, 10:03 AM

#

Wait. I think you can use that.

#

I got it confused for the stack pop
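
For the record, a short sketch of how the two list methods behave on a list of column indices:

```python
cols = [0, 1, 2, 3, 4, 5]

last = cols.pop()      # no argument: removes and returns the last item
second = cols.pop(1)   # with an index: removes and returns the item at that position
cols.remove(3)         # removes the first item equal to 3 (by value, not position)
print(last, second, cols)
```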

lapis sequoia Jun 12, 2019, 10:03 AM

#

Ok...

sand reef Jun 12, 2019, 10:03 AM

#

Sorry about that

#

An error has occurred?

lapis sequoia Jun 12, 2019, 10:05 AM

#

Ok so what u r saying is I make a list with all my indices and then i compare p values?

#

and gradually remove it and then fit in x_opt

sand reef Jun 12, 2019, 10:06 AM

#

Yes

lapis sequoia Jun 12, 2019, 10:06 AM

#

ok

#

Now how will get p values?

sand reef Jun 12, 2019, 10:06 AM

#

Good question.

lapis sequoia Jun 12, 2019, 10:06 AM

#

Is there any particular command?

#

Ok wait i saw this somewhere..

#

regressor_OLS.pvalues[j].astype(float)

#

Is this any good?

sand reef Jun 12, 2019, 10:08 AM

#

Welp. You could try it.

#

I am not familiar with the api

lapis sequoia Jun 12, 2019, 10:09 AM

#

Ok well thanks for the idea tho.. i will try to implement this

sand reef Jun 12, 2019, 10:09 AM

#

No problem!

lapis sequoia Jun 12, 2019, 10:17 AM

#

def backwardElimination(x, sl):
    numVars = len(x[0])
    for i in range(0, numVars):
        regressor_OLS = sm.OLS(y, x).fit()
        maxVar = max(regressor_OLS.pvalues).astype(float)
        if maxVar > sl:
            for j in range(0, numVars - i):
                if (regressor_OLS.pvalues[j].astype(float) == maxVar):
                    x = np.delete(x, j, 1)
    regressor_OLS.summary()
    return x
 
SL = 0.05
X_opt = X[:, [0, 1, 2, 3, 4, 5]]
X_Modeled = backwardElimination(X_opt, SL)

#

@sand reef is this something it would look like? This was the code given to be done in automatic backward elimination but to be honest i didnt get it somewhat

#

Like the numVars = len(x[0])

sand reef Jun 12, 2019, 10:21 AM

#

Numvars = len(x[0]) means number of columns in x
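
A tiny sketch on a made-up array, also showing what `np.delete(x, j, 1)` does with its last argument (the axis):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns of invented values

num_vars = len(x[0])              # length of the first row = number of columns
print(num_vars, x.shape[1])

# axis=1 removes a column; np.delete returns a new array and leaves x untouched.
reduced = np.delete(x, 2, 1)
print(reduced.shape, x.shape)
```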

lapis sequoia Jun 12, 2019, 10:23 AM

#

Ok..

sand reef Jun 12, 2019, 10:23 AM

#

Well. Kind of yes. It is kind of similar to what I suggested.

#

Instead of making a copy and assigning it to x_opt

#

It directly removes the index from x itself

#

Altho I am kind of afraid of the fact that this might throw a index out of range error

#

Nah it won't.

#

i is never used in indexing, so it won't throw.

#

So, what's the issue?

lapis sequoia Jun 12, 2019, 10:27 AM

#

Ok.. so i are all the entries in a particular line right?

sand reef Jun 12, 2019, 10:29 AM

#

Yus. Index of the columns.

lapis sequoia Jun 12, 2019, 10:30 AM

#

Ok and then j is picking up each entry in that line

#

ok this is a very noob question but why did we do numVars - i?

#

also just to confirm

sand reef Jun 12, 2019, 10:31 AM

#

No. J is still a column index.

lapis sequoia Jun 12, 2019, 10:32 AM

#

i will look something like this a b c d

sand reef Jun 12, 2019, 10:32 AM

#

Because if you see in np.delete(), we are passing it with axis =1

lapis sequoia Jun 12, 2019, 10:32 AM

#

and not this:
a
b
c
d

sand reef Jun 12, 2019, 10:33 AM

#

Well. The one above this comment is a column.

lapis sequoia Jun 12, 2019, 10:33 AM

#

ok .. sorry i am getting confused in loops..

#

So j is column index

sand reef Jun 12, 2019, 10:34 AM

#

Yes

#

So what is being done here is,

#

Max value of p for a column is being taken, and then compared. If a p value of a column is the same as the max p value, it is then removed from x

#

And the numvars - i

#

That is saying that, we are leaving i number of columns. Means we will not check the last i columns in that iteration.

lapis sequoia Jun 12, 2019, 10:37 AM

#

Oh ok.... now i get it

#

so in first for loop it does i=[a,b,c,d] takes out the max p value out of them

#

and then comparing it with sl so the first time i will be 0 so j will take all index .. and then delete the max value

sand reef Jun 12, 2019, 10:41 AM

#

Yes

lapis sequoia Jun 12, 2019, 10:42 AM

#

and then the second loop will continue till the condition is satisfied i.e. all the indexes with p values more than sl will be removed and then it will move onto next line starting from the first loop

#

right?

sand reef Jun 12, 2019, 10:43 AM

#

Yus

lapis sequoia Jun 12, 2019, 10:43 AM

#

Ok... jeez thanks a lot

sand reef Jun 12, 2019, 10:44 AM

#

Yeah. All values will be removed until either nothing remains or only columns with p values less than sl remain.
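
The code under discussion was shared as an image, so the following is only a sketch of the loop structure described above, not the original. It assumes a hypothetical `pvalues_fn` callback that returns one p-value per column (in practice this would come from fitting an OLS model); the function name and the toy data are also made up for illustration.

```python
import numpy as np

def backward_elimination(x, sl, pvalues_fn):
    """Repeatedly drop the column with the largest p-value while it exceeds sl.

    pvalues_fn is a hypothetical stand-in: it takes the current matrix and
    returns one p-value per remaining column.
    """
    num_vars = len(x[0])
    for i in range(num_vars):
        pvalues = np.asarray(pvalues_fn(x), dtype=float)
        max_p = pvalues.max()
        if max_p <= sl:
            break  # every remaining column is below the significance level
        # i columns have been removed so far, so only num_vars - i remain to scan.
        for j in range(num_vars - i):
            if pvalues[j] == max_p:
                x = np.delete(x, j, axis=1)
                break
    return x

# Toy usage: the first row doubles as fake "p-values" for each column.
toy = np.array([[0.01, 0.60, 0.03, 0.20],
                [1.0, 2.0, 3.0, 4.0]])
kept = backward_elimination(toy, sl=0.05, pvalues_fn=lambda m: m[0])
print(kept)
```

In this sketch the inner range shrinks by one per pass because each earlier pass removed exactly one column, which is one way to read the `numVars - i` bound.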

#

No problem! Now I am off.

lapis sequoia Jun 12, 2019, 10:45 AM

#

ok

wide gyro Jun 12, 2019, 1:08 PM

#

Do NaN values and 0 give the same result if you used that part of your data?

#

Like if I asked for a row of a column that contains 0 or NaN, will they both return 0?

ripe sundial Jun 12, 2019, 1:19 PM

#

@sand reef Would you expect it to look like this: Also what does the number 5.X mean? Is it a lot?

📎 unknown.png

#

Also @sand reef I do not use the MSE as loss but rather a metric (in Keras)

heavy tundra Jun 12, 2019, 2:14 PM

#

I'm trying to learn how to use Beautiful Soup by scraping data from Twitch.tv

#

Twitch.tv Top Channels Data Scrape

import csv
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

URL = "https://www.twitch.tv/directory/all"
uClient = uReq(URL)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html , "html.parser")
containers = page_soup.findAll("div",{"data-target" : "directory-game__card_container"})
containers2 = page_soup.findAll("div",{"class" : "tw-mg-b-2"})
print(len(containers))
print(len(containers2))

Twitch

Twitch is the world's leading video platform and community for gamers.

#

I got this far

#

but it says both containers have length 0

#

containers and containers 2 should be how I separate each of the channels on the top channels page but it doesn't pick them up

#

also I'm sorry I just dumped the code in chat, I don't know how to format it in this server

shut widget Jun 12, 2019, 2:17 PM

#

for x in v:
print(x)

#

dangit

#

how do you make it activate

#

oh

#

this isn't a help channel

#

@heavy tundra is this data science related?

heavy tundra Jun 12, 2019, 2:18 PM

#

yeah

#

I can go to a different channel though

shut widget Jun 12, 2019, 2:18 PM

#

seems like a scraping question to me

heavy tundra Jun 12, 2019, 2:18 PM

#

is that not data science

shut widget Jun 12, 2019, 2:19 PM

#

scraping? uh, no...?

heavy tundra Jun 12, 2019, 2:19 PM

#

it would lead to a dataset and statistics

shut widget Jun 12, 2019, 2:20 PM

#

yes but the scraping itself is not data science and that's what your question is about

hardy solstice Jun 12, 2019, 3:12 PM

#

Hello

#

I want to start a career in data science

#

do i have to master np /panda or i can go through them as any lib

#

so i start in Data visual.. and ML

void anvil Jun 12, 2019, 3:26 PM

#

@supple ferry numba is lit. I'm seeing speed ups of 100-1000x with minimal work. You basically convert pandas objects to np objects and wrap loops in @jit functions.

earnest prawn Jun 12, 2019, 3:28 PM

#

@hardy solstice libs like numpy for example "just" provide implementations of common mathematical concepts, if you understand them mastering numpy is a question of reading the docs

hardy solstice Jun 12, 2019, 3:29 PM

#

this is what i mean, it is just normal lib where i can go through docs

#

but someone told me i have to master it as first step

earnest prawn Jun 12, 2019, 3:31 PM

#

if you know the maths behind it Id call bullshit on that

void anvil Jun 12, 2019, 3:31 PM

#

^

#

Just know how to use it to do your data transforms from A to B in a timely manner

hardy solstice Jun 12, 2019, 3:32 PM

#

oooh this explains alot

sand reef Jun 12, 2019, 3:38 PM

#

@ripe sundial yes that's what it's supposed to look like

#

5.x?

ripe sundial Jun 12, 2019, 4:43 PM

#

@sand reef so I found the error (no pun intended :D) thanks to @feral lodge Turns out MSE is not usually used for classification problems, this is why the MSE I linked earlier is wrong

sand reef Jun 12, 2019, 4:44 PM

#

I see. So you were classifying.

void anvil Jun 12, 2019, 4:51 PM

#

Is there an equivalent of np.argmin() that is compatible with numba or should I rewrite it in python and use it there?

sand reef Jun 12, 2019, 5:02 PM

#

https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

#

It's mentioned there.

void anvil Jun 12, 2019, 5:10 PM

#

```
index_min = np.argmin(close.iloc[i-25:i].values)
```

#

I'm on the latest pip3 install of numba and it's not liking it
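
The error message isn't shown, so this is a guess at the cause: Numba's nopython mode works on NumPy arrays and does not understand pandas objects such as `close.iloc[...]`. A sketch under that assumption converts the Series to an array once, outside the loop, and slices the array inside it. The function name `rolling_argmin` and the `window` parameter are hypothetical, and the Numba decorator is left out so the block runs without Numba installed.

```python
import numpy as np

def rolling_argmin(values, window):
    """For each i >= window, position of the minimum inside values[i-window:i].

    Entries before the first full window are filled with -1.
    """
    out = np.full(len(values), -1, dtype=np.int64)
    for i in range(window, len(values)):
        out[i] = np.argmin(values[i - window:i])
    return out

# With pandas this would be called as rolling_argmin(close.values, 25);
# a plain array is used here to keep the example self-contained.
close_values = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
result = rolling_argmin(close_values, 3)
print(result)
```

If this is the cause, adding a Numba JIT decorator to a function shaped like this one is the usual next step, since it only touches NumPy arrays and integers.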

wide gyro Jun 12, 2019, 6:13 PM

#

I'd like to take a column of time in my dataframe that is set to Epoch, and change it to the exact date

#

which I accomplish in one of my functions

#

but how do I loop through an entire dataframe and run that time column through my epochTimeConverter function until there are no more rows

#

I tried this but I'm not sure if it'd work

#

df.created = CT.epochCre()
df.updated = CT.epochUpd()

#

would that put every value in created and updated column through their respective functions?

#

Got TypeError: epochCre() missing 1 required positional argument: 'self' when I tried to run it
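
A sketch of a loop-free alternative, assuming the `created` and `updated` columns hold Unix timestamps in seconds (use `unit="ms"` if they are in milliseconds). The sample values are made up. `pd.to_datetime` converts the whole column at once, so no per-row function call is needed.

```python
import pandas as pd

# Made-up epoch values (seconds since 1970-01-01 UTC).
df = pd.DataFrame({"created": [1560297600, 1560384000],
                   "updated": [1560301200, 1560387600]})

# Vectorized conversion: every row of the column is converted in one call.
df["created"] = pd.to_datetime(df["created"], unit="s")
df["updated"] = pd.to_datetime(df["updated"], unit="s")

print(df)
```

The `missing 1 required positional argument: 'self'` message usually means a method was called on the class itself rather than on an instance; with the vectorized call above, the custom converter class is not needed for this step.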

void anvil Jun 12, 2019, 6:45 PM

#

I use dt.datetime.now()

lapis sequoia Jun 12, 2019, 7:13 PM

#

I just watched a YouTube ML program by some expert at Google. Man it's so intimidating to not know so many concepts in machine learning.
The code seems so unrealistic and difficult to understand. It makes me wanna give up.
I am a beginner, been studying for few months and feels like am no where near.

What to do when your confidence level is down?

junior dragon Jun 12, 2019, 8:00 PM

#

Every beginner is same as you, you aren't alone

#

Listen to some music ig

steep herald Jun 13, 2019, 1:03 AM

#

Anyone awake that can answer a question about SaaS metrics?

#

Posting it in Help-5 if any of you can help

lapis sequoia Jun 13, 2019, 3:05 AM

#

hi

#

how do I drop rows in my df, if they contain datetime..

modest scarab Jun 13, 2019, 3:08 AM

#

@lapis sequoia is it already a csv?

#

most of the time, it's easier to manipulate the csv in SQL and then save it as a new csv file

lapis sequoia Jun 13, 2019, 3:54 AM

#

no it's not csv

#

it's in the dataframe

#

I read a text file to get this

#

im thinking I need to do something like

#

df = df.drop(df[df.text.str.contains].index)

#

but not sure how to implement the contains here..for excluding datetime
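
One way to finish that idea, sketched with made-up rows: `str.contains` takes a regular expression, and negating the resulting boolean mask keeps only the rows without a match. The pattern below is an assumption about what the datetimes look like (`YYYY-MM-DD` or `DD/MM/YYYY`) and would need adjusting to the real text.

```python
import pandas as pd

df = pd.DataFrame({"text": ["hello there",
                            "2019-06-13 03:05:00 log entry",
                            "plain line",
                            "seen at 13/06/2019 03:54"]})

# Hypothetical pattern: ISO-style dates or DD/MM/YYYY dates.
date_pattern = r"\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}"

# True where the text contains something that looks like a date.
has_date = df["text"].str.contains(date_pattern, regex=True, na=False)

# Keep only the rows that do NOT contain a date.
df = df[~has_date]
print(df)
```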

steep herald Jun 13, 2019, 7:51 AM

#

subscription_start subscription_end
35:09.0
09:48.0
00:51.0
53:30.0 10:28.0

I am trying to apply ARPU & Churn rate.

but example 53:30.0 or format XX:XX.X doesn't look like any date format I can find online.

#

i think its most likely minutes:seconds

However start is sometimes bigger than end.

For instance

28:33.0 and 16:02.0

if there were a counter that went up every time it exceeds an hour I would understand, but there is nothing

lapis sequoia Jun 13, 2019, 9:12 AM

#

need code to help buddy..

foggy bridge Jun 13, 2019, 10:10 AM

#

Hello guys

sand reef Jun 13, 2019, 10:10 AM

#

Hoi

foggy bridge Jun 13, 2019, 10:11 AM

#

hey whats up

#

i wanted to ask

#

currently im learning pandas

sand reef Jun 13, 2019, 10:11 AM

#

Mmm?

foggy bridge Jun 13, 2019, 10:11 AM

#

and what would you guys recommend me

#

to train myself with

sand reef Jun 13, 2019, 10:12 AM

#

To train yourself with?

foggy bridge Jun 13, 2019, 10:12 AM

#

yeah

#

i mean

#

like a project so i could test my skills and improve

#

or test my knowledge

sand reef Jun 13, 2019, 10:13 AM

#

Well. There are a lot of things you can do.

#

And it depends where you want to go

#

Try googling for some projects

#

Related to the skills you've learnt I guess?

#

And try pulling them off.

foggy bridge Jun 13, 2019, 10:14 AM

#

@errorsans to be honest i like data analyzing that's why i dove into pandas, but the thing is i feel like lost

lyric canopy Jun 13, 2019, 10:15 AM

#

Maybe Kaggle's micro-courses are something for you to get a feel for the field: https://www.kaggle.com/learn/overview

sand reef Jun 13, 2019, 10:15 AM

#

I mean, well, then I guess I am the wrong person to be asking that then.

#

Cuz I literally am trying to do something like that myself.

foggy bridge Jun 13, 2019, 10:16 AM

#

@sand reef its ok , thank you !

#

@lyric canopy i just finished Sentdex tutorials

#

@lyric canopy will do! Thank you very much!

tender lance Jun 13, 2019, 11:27 AM

#

I'm trying to click on a HTML5 canvas.

#

or simulate some clicks.

#

I can't get selenium basic to work, can someone point me in a direction to simulate clicks and drags on an HTML5 canvas?

sleek otter Jun 13, 2019, 11:59 AM

#

Hi

#

im doing a project with python and dataframes, can i ask for a little help with pandas

olive willow Jun 13, 2019, 1:14 PM

#

just post the question dude @sleek otter

outer marsh Jun 13, 2019, 4:22 PM

#

Hey!

#

I'm writing a report where I have to explain MLE

#

But what's the intuition behind multiplying probabilities together if you've got a large dataset?

lapis sequoia Jun 13, 2019, 4:54 PM

#

any dot graphs making application?

shut widget Jun 13, 2019, 4:55 PM

#

@lapis sequoia what you're asking for makes no sense

lapis sequoia Jun 13, 2019, 4:55 PM

#

in the file i downloaded i dont see EXE file

shut widget Jun 13, 2019, 4:55 PM

#

the issue here is that you don't understand how to run programs, not that they all need to be "built"

#

there's a lot of exe files

#

in the zip

lapis sequoia Jun 13, 2019, 4:55 PM

#

none

#

that run the program

#

i think just some command lines that does nothing

shut widget Jun 13, 2019, 4:56 PM

#

they don't do nothing

lapis sequoia Jun 13, 2019, 4:56 PM

#

they dont run it

#

yeah i just want simply open program and write graph

#

📎 unknown.png

#

exactly like that

#

thats all i want

#

any app

#

anybody knows app for it?

desert oar Jun 13, 2019, 5:09 PM

#

https://dreampuf.github.io/GraphvizOnline/

#

http://www.webgraphviz.com/

#

http://sandbox.kidstrythisathome.com/erdos/

#

https://duckduckgo.com/?q=graphviz+generator+online

lapis sequoia Jun 13, 2019, 5:12 PM

#

those are good

#

but i want to insert image to graphs

#

and im pretty sure its not possible on web ones

desert oar Jun 13, 2019, 5:12 PM

#

that isn't generally possible

lapis sequoia Jun 13, 2019, 5:12 PM

#

it is

desert oar Jun 13, 2019, 5:12 PM

#

make a graph out of an image?

lapis sequoia Jun 13, 2019, 5:13 PM

#

no

#

adding images to graphs

desert oar Jun 13, 2019, 5:13 PM

#

oh. no im not aware of one either

#

just use an image editor?

lapis sequoia Jun 13, 2019, 5:13 PM

#

noo

desert oar Jun 13, 2019, 5:15 PM

#

@outer marsh it's just the arithmetic of probabilities. each data point is the realization of a random variable, and we assume those random variables are independent -- this is the "iid" assumption.

that means each observation is an "event". let's say you have a data set with 5 individuals, and you know their favorite ice cream flavor (chocolate or vanilla), and you know their age -- you can propose a linear model

Y_i | AGE_i ~ Bernoulli(p_i), where p_i = Pr(Y_i = "chocolate" | AGE_i)
logit(p_i) = b1*AGE_i + b0

This is pretty typical of what you'd use maximum likelihood for.

Our data set might be like

age | flavor
----|----------
 21 | chocolate
 27 | vanilla
 18 | chocolate
 20 | vanilla
 30 | vanilla

which means we have 5 random variables (one per individual), and one event per individual (one data point per individual).

You can think of the entire dataset as one big event: the intersection of all the independent events that correspond to individual data points. Mathematically you might write it like this:

Dataset = Y_1 ∩ ... ∩ Y_5

so that

Pr(Dataset | AGE_1, ..., AGE_5 ) = Pr(Y_1 ∩ ... ∩ Y_5 | AGE_1, ..., AGE_5)

Then you just apply the usual rule for computing the probability of the intersection of independent events (multiply their probabilities).

which gives us

Pr(Dataset | AGE) = Pr(Y_1 | AGE_1) * ... * Pr(Y_5 | AGE_5)
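
A numeric sketch of that product for the five-person table above. The coefficient values `b0 = 4.0` and `b1 = -0.2` are arbitrary guesses picked for illustration, not fitted estimates; maximum likelihood would search for the pair that makes this number as large as possible.

```python
import numpy as np

age = np.array([21, 27, 18, 20, 30])
is_chocolate = np.array([1, 0, 1, 0, 0])  # 1 = chocolate, 0 = vanilla

def per_point_probability(b0, b1):
    """Probability of each observed flavor under the logistic model."""
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * age)))  # inverse logit gives p_i
    return np.where(is_chocolate == 1, p, 1.0 - p)

b0, b1 = 4.0, -0.2  # arbitrary illustrative values

probs = per_point_probability(b0, b1)
likelihood = np.prod(probs)             # Pr(Dataset | AGE) as a product
log_likelihood = np.sum(np.log(probs))  # same quantity on the log scale

print(likelihood, log_likelihood)
```

In practice the sum of logs is what gets maximized, because a product of many numbers below 1 underflows quickly on larger datasets.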

lapis sequoia Jun 13, 2019, 5:17 PM

#

📎 unknown.png

#

definitely possible

outer marsh Jun 13, 2019, 5:21 PM

#

Hmmm

#

So it's like the probability of the dataset?

#

Like I don't get what the product of all these probabilities represents

#

@desert oar

desert oar Jun 13, 2019, 5:22 PM

#

You're exactly right, when you do maximum likelihood you are looking at the likelihood over the entire data

#

So the product of the probabilities you can interpret literally as the probability of the intersection of all those events describing all the data points

#

Thinking about it that way also makes it obvious why independent and identically distributed are necessary assumptions

#

If they aren't identically distributed, you need a different expression for each data point, which is fine of course but you can't implement that as efficiently in a computer, and computing the gradient, not to mention the Hessian, is more involved

outer marsh Jun 13, 2019, 5:23 PM

#

Yeah I see

desert oar Jun 13, 2019, 5:24 PM

#

Whereas if they aren't independent, the whole expression falls apart

outer marsh Jun 13, 2019, 5:24 PM

#

The video I'm watching (https://www.youtube.com/watch?v=I_dhPETvll8) describes the event as the variable x_1 took on the value x_1 and the variable x_2 took on the value of x_2 etc.

desert oar Jun 13, 2019, 5:24 PM

#

Yeah, although usually you distinguish the variable from its realization with capital and lowercase letters

#

x_1 is the realization of X_1

outer marsh Jun 13, 2019, 5:25 PM

#

Ah okay

desert oar Jun 13, 2019, 5:25 PM

#

Meaning the event is "X_1 = x_1"

#

Which of course has zero probability if X is continuous...

outer marsh Jun 13, 2019, 5:25 PM

#

Oh yeah he also does that, just isn't that clear

#

But why is that 🤔

desert oar Jun 13, 2019, 5:26 PM

#

It's kind of just how probability theory is set up. that's why the event thing isn't technically correct in general

#

Think about a number line, the point "1" is infinitely small, because there are an infinite number of real numbers that are arbitrarily close to it

outer marsh Jun 13, 2019, 5:27 PM

#

Yeah

desert oar Jun 13, 2019, 5:27 PM

#

This stuff gets very esoteric very quickly, but suffice it to say that an infinitely small interval must have probability zero

outer marsh Jun 13, 2019, 5:27 PM

#

Well yes

desert oar Jun 13, 2019, 5:27 PM

#

This is where probability density comes in

#

A probability density is the derivative of the distribution function, right? Well, a derivative is basically a function of an infinitely small interval, that's how they are defined

outer marsh Jun 13, 2019, 5:29 PM

#

Yeah

desert oar Jun 13, 2019, 5:29 PM

#

So even though the event itself is infinitely small and has zero probability, the way the stuff works is you can just drop in the probability density instead

#

And that's where you get the usual expression of multiplying the probability density for each point in the data set

outer marsh Jun 13, 2019, 5:30 PM

#

Hmmm

#

I think I'll ask my math teacher tomorrow, I think it's better if I can talk to someone irl

#

But thanks for the help

lapis sequoia Jun 13, 2019, 5:41 PM

#

@desert oar the lecturer said to use this line plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x)), color = 'blue')

#

instead of this plt.plot(x, lin_reg2.predict(x_poly), color = 'blue')

#

coz he said that then we can use the model for other dataset

#

so was he saying that if we had training and test set instead of the current dataset where we dont split it.... then x_poly would already have been assigned to something'?

#

the code is this

#

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#importing dataset
dataset=pd.read_csv('Position_Salaries.csv')
x = dataset.iloc[:, 1:2].values
#if we give only an index value, i.e. 1, then it will return a vector rather than a matrix
#hence we give a range, i.e. 1:2, rather than 1
y = dataset.iloc[:,2].values

#polynomial linear regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2)
x_poly = poly_reg.fit_transform(x)
lin_reg2 = LinearRegression()
lin_reg2.fit(x_poly, y)

#plotting polynomial regression model
plt.scatter(x, y, color = 'red')
plt.plot(x, lin_reg2.predict(poly_reg.fit_transform(x)), color = 'blue')  #<----- this line
plt.title('truth vs bluff (without x_poly)')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.show()

#another model using x_poly
plt.scatter(x, y, color = 'red')
plt.plot(x, lin_reg2.predict(x_poly), color = 'blue')  #<---- and this line
plt.title('truth vs bluff (with x_poly)')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.show()

#

because no matter what the line is i get the same graph for both

desert oar Jun 13, 2019, 5:56 PM

#

the first one re-fits the poly reg model, the second one doesnt

lapis sequoia Jun 13, 2019, 6:03 PM

#

ok so if i had a dataset in which i had training and test set then would i understand this thing better?

#

also when i increase my degree the regression line keeps on improving but somewhere in q and a the lecturer said that if we increase the degree too much then i would overfit the model... how is that possible

desert oar Jun 13, 2019, 6:10 PM

#

i dont really understand your question about training and test sets

#

i also dont know why you would re-fit your model every plot...

#

err, oh. i see

#

maybe hes saying that if you want to use a different x, you can use the first one instead of hardcoding x_poly

#

that's an arbitrary distinction... just write a function if you need to reuse
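
A sketch of the "write a function" suggestion, using made-up data in place of `Position_Salaries.csv`. The helper name `predict_salary` is hypothetical. It calls `transform` rather than `fit_transform`, so the already-fitted `poly_reg` is reused on any new `x`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up stand-in for the levels and salaries in the CSV.
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 3.0 * x.ravel() ** 2 + 2.0

poly_reg = PolynomialFeatures(degree=2)
lin_reg2 = LinearRegression().fit(poly_reg.fit_transform(x), y)

def predict_salary(levels):
    """Predict for any new levels, reusing the fitted transformer and model."""
    levels = np.asarray(levels, dtype=float).reshape(-1, 1)
    return lin_reg2.predict(poly_reg.transform(levels))

prediction = predict_salary([6.5])
print(prediction)
```

For `PolynomialFeatures`, fitting only records how many input columns there are, so `fit_transform(x)` and the stored `x_poly` contain the same numbers. That is why both plots come out identical.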

#

as for the degree and overfitting... https://statisticsbyjim.com/regression/overfitting-regression-models/
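
A small NumPy-only sketch of the overfitting effect, on synthetic data (a noisy sine curve, chosen arbitrarily). The training error can only go down as the degree rises, because a higher-degree polynomial can always reproduce a lower-degree fit; the error on points the model has not seen is the number to watch.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Synthetic training points and separate held-out points.
x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=10)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, size=10)

results = {}
for degree in (1, 3, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(degree, train_mse, test_mse)
```

A curve that keeps "improving" on the same points it was fitted to is therefore not evidence of a better model; comparing against held-out data is what reveals overfitting.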

lapis sequoia Jun 13, 2019, 6:12 PM

#

Okkk..

#

ok thanks

lapis sequoia Jun 13, 2019, 10:04 PM

#

does anyone have any reference graphics for ML metrics

#

need a quick revision..

lapis sequoia Jun 13, 2019, 10:48 PM

#

guys any favorite books or courses to start data science

onyx granite Jun 13, 2019, 11:50 PM

#

@lapis sequoia https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics-ebook/dp/B01IBM7790/ref=pd_sim_351_1/141-3990481-6878629?_encoding=UTF8&pd_rd_i=B01IBM7790&pd_rd_r=ffff19b8-8e35-11e9-bd2f-f9fbdb80feb9&pd_rd_w=yWzzk&pd_rd_wg=ovmZi&pf_rd_p=a098ee4c-2e0f-4821-b463-d4b049053104&pf_rd_r=S1GCVJVAN1H2YD17XWBW&psc=1&refRID=S1GCVJVAN1H2YD17XWBW

#

prereq for this is that you know basic statistics, calculus, and matrices

#

but a decent beginner book

lapis sequoia Jun 14, 2019, 12:52 AM

#

Will check it out

lapis sequoia Jun 14, 2019, 5:42 AM

#

Im stuck

#

how do I get my df to have equal number of rows per group..

warm orbit Jun 14, 2019, 7:41 AM

#

could someone dumb down the kalman filter for me pls lol

#

so that i can implement it in python

lean ledge Jun 14, 2019, 8:45 AM

#

It looks at the state variable (the one you're trying to measure), and assumes its measurement, its changes, etc. are affected by noise that's Gaussian (follows a normal distribution)

#

And then it uses probability to predict the most likely state and standard deviation/variance given previous measurements, the current measurements and any "control" you put into it

#

@warm orbit

warm orbit Jun 14, 2019, 8:48 AM

#

i see

lean ledge Jun 14, 2019, 8:48 AM

#

It also assumes the current state is related to the control/previous state/measurement using a linear function

hazy hare Jun 14, 2019, 9:17 AM

#

hello guys, I've been stuck on a problem for 3-4 hours and haven't been able to fix it... when I try to convert my Reviews column (I wrote this code: data['Reviews'] = data['Reviews'].astype('float') ) I get this error message: "You have categorical data, but your model needs something numerical. See our one hot encoding tutorial for a solution." I tried One-Hot Encoding but got a different error... If somebody can help, I'll be glad, thank you

https://www.kaggle.com/berkeyilmaz/my-first-data-analysis

My First Data Analysis

Using data from Google Play Store Apps

lapis sequoia Jun 14, 2019, 10:08 AM

#

what is the error

hazy hare Jun 14, 2019, 10:23 AM

#

ValueError: could not convert string to float: '3.0M'

lapis sequoia Jun 14, 2019, 10:43 AM

#

says it's a string

#

is this from you trying to do one hot encoding?

#

what column does this value belong to
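
A sketch of how to locate the values that break `astype('float')` and one possible way to parse them. The sample strings are made up apart from `'3.0M'`, and `parse_count` is a hypothetical helper that treats a trailing `k` or `M` as a multiplier. Whether that reading is right for the real column is an assumption; the offending row may instead be malformed and better dropped.

```python
import pandas as pd

reviews = pd.Series(["159", "967", "3.0M", "87510"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
as_number = pd.to_numeric(reviews, errors="coerce")
bad_values = reviews[as_number.isna()]
print(bad_values)

def parse_count(text):
    """Hypothetical helper: turn '3.0M' or '12k' style strings into floats."""
    multipliers = {"k": 1e3, "K": 1e3, "M": 1e6}
    text = text.strip()
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)

cleaned = reviews.map(parse_count)
print(cleaned)
```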

hazy hare Jun 14, 2019, 11:11 AM

#

📎 unknown.png

wide gyro Jun 14, 2019, 1:59 PM

#

@desert oar you available? I counted the columns of my rows in dataframe, and while most have 14, some have 13 or less. I'm assuming it's telling me the ones less than 14 contain NaN values

#

However, isn't 0 considered NaN? If that's the case, I wanted to know if I could remove rows that are missing an input, but keep those with 0 as those are useful

desert oar Jun 14, 2019, 5:00 PM

#

what do you mean 'counted the columns of my rows'

#

and why would 0 be NaN? NaN stands for "not a number" (part of the IEEE floating point specification, used/abused in pandas to represent missing data)

desert oar Jun 14, 2019, 5:55 PM

#

what do you mean "counted"

wide gyro Jun 14, 2019, 5:56 PM

#

I ran print(df.count(axis='columns')) and it showed 14 for some, 13 for others

desert oar Jun 14, 2019, 5:57 PM

#

do you know what .count does?

#

i recommend checking out the docs. i don't think it does what you think it does

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.count.html

wide gyro Jun 14, 2019, 5:57 PM

#

"Count non-NA cells for each column or row."

#

Where am I getting the 13 and 14 from then?

#

My dataframe holds 14 values

desert oar Jun 14, 2019, 5:58 PM

#


```
The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
```

wide gyro Jun 14, 2019, 5:59 PM

#

I figured the ones that showed 13 were missing a value inside

desert oar Jun 14, 2019, 6:03 PM

#

exactly

#

cause thats what the function does..
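
A tiny made-up frame showing the difference: `count` skips `NaN` but counts `0`, so a row total below the number of columns points to missing values, not zeros.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 0.0, np.nan],
                   "b": [0.0, 2.0, 3.0]})

# Non-NA cells per row: zeros are counted, NaN is not.
per_row = df.count(axis="columns")
print(per_row.tolist())

# NaN is skipped by default in reductions, which is not the same as being 0:
# the mean of column "a" is taken over two values, not three.
col_sum = df["a"].sum()
col_mean = df["a"].mean()
print(col_sum, col_mean)

print(np.nan == 0)  # False
```

A sum that skips `NaN` happens to match treating it as 0, but the mean shows the difference: 0.5 here, rather than the 1/3 that a real zero would give.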

wide gyro Jun 14, 2019, 6:05 PM

#

Is there a difference between dropna inplace and subset?

desert oar Jun 14, 2019, 6:05 PM

#

they do completely different things

#

what do the docs say

wide gyro Jun 14, 2019, 6:05 PM

#

`inplace : bool, default False
If True, do operation inplace and return None.

`

#

subset : array-like, optional Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

#

so for subset, I would list the columns that I wanna check for missing values, and those according rows would be marked to be eliminated

#

I guess I don't really understand what inplace fully does

desert oar Jun 14, 2019, 6:51 PM

#

In place modifies the data frame, rather than creating a new data frame. It sounds like it'd be faster or more efficient but in practice it's not. There has also been talk of deprecating it, so don't bother with it

spark nimbus Jun 14, 2019, 7:35 PM

#

Live right now, in need of some help. Lacking some insight into what I'm doing wrong

desert oar Jun 14, 2019, 7:39 PM

#

@spark nimbus on stream? i cant join but if you describe here maybe i can help

spark nimbus Jun 14, 2019, 7:43 PM

#

so basically

#

I'm trying to play audio

#

when I play it all at once it works fine

#

but I want to play the next few samples every 20ms or so

#

and now it's all static-y

wide gyro Jun 14, 2019, 7:48 PM

#

is using dropna(subset=) realistic if I'm placing every column in the subset

desert oar Jun 14, 2019, 8:05 PM

#

@wide gyro what do you mean placing? if you omit subset= it uses every column by default
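
A short sketch of the two forms on made-up data. The `radio` and `mcc` column names come from the conversation; `signal` is a hypothetical extra column added to show that zeros survive. The result is assigned to a new variable instead of using `inplace`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"radio": ["GSM", None, "LTE"],
                   "mcc": [310.0, 262.0, np.nan],
                   "signal": [0.0, 5.0, 7.0]})

# Default: a missing value in ANY column drops the row.
all_cols = df.dropna()

# subset: only the listed columns are checked for missing values.
only_radio = df.dropna(subset=["radio"])

print(all_cols)
print(only_radio)
```

The first row has a `0.0` in `signal` and is kept in both results, since zero is an ordinary value and not a missing one.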

#

@spark nimbus is this a data science question?

#

not sure what you mean. is this about python? OBS?

wide gyro Jun 14, 2019, 8:07 PM

#

Oh so I don't need to add every column like subset='radio','mcc',etc.

#

didn't know that

desert oar Jun 14, 2019, 8:11 PM

#

yeah that one isn't explicitly mentioned in the docs, but the examples demonstrate how it's used

sand reef Jun 14, 2019, 8:16 PM

#

If I am not wrong, isn't NaN counted as 0 during computation?

spark nimbus Jun 14, 2019, 8:16 PM

#

@desert oar audio is closest to data science than anything else here

sand reef Jun 14, 2019, 8:18 PM

#

Say. I have learnt conv nets and all and rnns and all. What do I do now? Like what skills are needed for a decent job in this field? And is data science a completely different aspect all together? Or is it somewhat related?

#

Basically the coursera Machine Learning course and the Deep Learning Specialization.

desert oar Jun 14, 2019, 8:19 PM

#

@sand reef depends on the job

#

there arent many "junior ML engineer" roles out there afaik yet

#

@spark nimbus ok, unfortunately im not sure what the context for the problem is, nor am i an audio guy. good luck though

#

@sand reef also my career path has been very much not "machine learning oriented" -- so maybe i'm not the best to ask. i know that, honestly, i wouldn't hire a data scientist that's only done a couple online machine learning courses

#

i'd personally prefer someone with math and stats background, who can reason about data

#

data cleaning, missing values, basic statistical analysis, etc

#

and who knows how to code

#

if they have a math background i can teach them any of the fancy modeling they need to know

#

ideally someone who can write well and make good data visualizations too

#

maybe try a project now? something "end to end" where you have to choose a problem statement, get your own data, clean it yourself, come up with your own model, and then make some kind of report w/ your results

#

that's quite a bit of work but if i was hiring at the very basic junior level it would make me more interested

lean ledge Jun 14, 2019, 8:46 PM

#

@sand reef if you're going for a computer vision role, unless you're familiar and up to date with research, I don't think you'll find many to hire you. That means being familiar with ResNet, ResNeXt, AlexNet, VGG, GoogLeNet/Inception, U-Net, (Fast/Faster) R-CNN, plus knowing about GANs, NLP-based description models, etc. Computer vision has a fairly high barrier to entry, and you should definitely be familiar enough that you can pick up random papers from CVPR and understand them

#

If you're going for generic deep learning, well... You'll be disappointed because no one actually uses just deep learning that much. It's used in research, and for CV, NLP and then a few things here and there. Outside of that, it either works worse or there isn't enough data for it

#

The only reason it's popular as it is is because people like the sound of "neural networks" and because it's easier to learn without much maths.

prisma verge Jun 14, 2019, 11:18 PM

#

so, uh, i have no idea what i'm doing since i'm bad at ml
can anybody explain what should i do so it predicts continuation of csv file?

#

https://pastebin.com/JsD7meme

Pastebin

import numpy as np import pandas as pd from keras.models import...

#

📎 l.csv

#

1 and 2 mean wins (though 1 = safer, 2 = closer to losing), while 3 means a loss; 4, 5, 6 are teams; the leftmost column is the round number

#

i'm quite bad at ml but i just wanna try to predict some stuff

#

would be nice to improve knowledge of myself with this project
also, anyone got good keras books?

#

i just have no idea how to predict it

warm orbit Jun 14, 2019, 11:34 PM

#

@lean ledge is there an advantage to using a kalman filter over a linear regression

#

i think they both identify a linear trend with gaussian noise

lean ledge Jun 14, 2019, 11:53 PM

#

@warm orbit they don't do the same thing at all

#

Nothing alike

warm orbit Jun 14, 2019, 11:53 PM

#

yes kalman just predicts the next one right

lean ledge Jun 14, 2019, 11:57 PM

#

linear regression is finding an approximate A for a given
y=Ax
When y and x are known for a lot of examples

Kalman filter is finding y_(k+1) for a
x_(k+1) = Ax_k + Bu_k + Ω
y_(k+1) = Cx_(k+1) + μ
Where μ and Ω are Gaussian noise vectors
Given previous values of x, y and known A, B, C
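
A minimal scalar sketch of those equations, assuming the state and measurement are single numbers. Here `a`, `b`, `c` play the roles of A, B, C above, `q` is the variance of the process noise Ω, and `r` is the variance of the measurement noise μ. The default values and the function name `kalman_1d` are arbitrary choices for illustration.

```python
def kalman_1d(measurements, a=1.0, b=0.0, c=1.0, q=1e-4, r=0.25,
              x0=0.0, p0=1.0, controls=None):
    """Scalar Kalman filter: returns the state estimate after each measurement."""
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for k, z in enumerate(measurements):
        u = controls[k] if controls is not None else 0.0

        # Predict: push the estimate and its uncertainty through the model.
        x = a * x + b * u
        p = a * p * a + q

        # Update: blend the prediction with the new measurement z.
        gain = p * c / (c * p * c + r)
        x = x + gain * (z - c * x)
        p = (1.0 - gain * c) * p

        estimates.append(x)
    return estimates

# A constant signal measured 50 times; the estimate should settle near 1.0.
estimates = kalman_1d([1.0] * 50)
print(estimates[-1])
```

The gain decides how much each new measurement is trusted: a large `r` (noisy sensor) gives a small gain and a smoother, slower estimate.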

#

@warm orbit

#

I think you should really just study a bunch of maths

warm orbit Jun 14, 2019, 11:58 PM

#

lol

#

thanks

lean ledge Jun 14, 2019, 11:58 PM

#

Kalman filters are generally the kind of stuff you learn in a 3rd year electrical engineering (signal processing) class

#

They're the most basic form of Bayesian filters and still aren't very approachable

lapis sequoia Jun 15, 2019, 12:20 AM

#

that's what I told him yesterday..

queen vigil Jun 15, 2019, 2:21 AM

#

what's a good example to start learning about neural networks

#

i know the concept but i need a project/example to do it on

#

im thinking a game for the computer to play or processing images but idk

reef bone Jun 15, 2019, 2:37 AM

#

i think the MNIST handwritten digits one is pretty much canonical in terms of image classification
http://yann.lecun.com/exdb/mnist/
this is a good starter dataset imo, the samples are small which makes it feasible to quickly retrain and play with parameters of your network, and you can easily find others' implementations and see what they did differently
otherwise this is a good repository of commonly used datasets
https://archive.ics.uci.edu/ml/index.php
and of course you can check out kaggle too

queen vigil Jun 15, 2019, 2:41 AM

#

ive seen sentdex use mnist in his tensorflow tutorial series but i wanna try my own thing instead of just following the tutorial blindly

small ore Jun 15, 2019, 2:44 AM

#

Is NN needed for handwriting recognition or will just normal regression work?

reef bone Jun 15, 2019, 2:53 AM

#

regression is a problem, rather than an algorithm, and digit recognition is inherently a classification task

#

you could use logistic regression to classify with one vs all

#

after all, logistic regression is essentially a sigmoid function, and you often see the sigmoid used as an activation function in simple networks

#

so you can think of a simple NN as multiple sigmoids connected together, which allows them each to learn a distinct relationship and work together to solve a more complex problem
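
A sketch of that picture in NumPy, with arbitrary made-up weights (nothing here is trained): logistic regression is one sigmoid over a weighted sum, and a small network feeds several such sigmoids into another one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])  # one made-up input with 3 features

# Logistic regression: a single sigmoid over a weighted sum of the inputs.
w = np.array([0.4, -0.3, 0.1])
b = 0.05
p_logistic = sigmoid(w @ x + b)

# Tiny network: two hidden sigmoid units, then one output sigmoid.
W1 = np.array([[0.2, -0.5, 0.1],
               [0.7, 0.3, -0.2]])
b1 = np.array([0.0, 0.1])
hidden = sigmoid(W1 @ x + b1)

w2 = np.array([1.5, -2.0])
b2 = 0.3
p_network = sigmoid(w2 @ hidden + b2)

print(p_logistic, p_network)
```

For one-vs-all digit classification, ten such logistic models (one per digit) would each output a probability, and the largest one wins.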

sand reef Jun 15, 2019, 7:52 AM

#

So, if I did contribute in a research paper, will that boost me up?

prisma verge Jun 15, 2019, 10:31 AM

#

... honestly, what i'm doing wrong

📎 unknown.png

sand reef Jun 15, 2019, 10:33 AM

#

Well, whats the error?

prisma verge Jun 15, 2019, 10:33 AM

#

it gives absurd -16 loss

#

and predicts nan values

sand reef Jun 15, 2019, 10:33 AM

#

mmm

#

```
Y = dataset[:, 3]
```

#

You are using the Y values in the X values?

prisma verge Jun 15, 2019, 10:35 AM

#

uhhh, have no idea honestly

#

it doesn't seem to touch y values on slice

sand reef Jun 15, 2019, 10:36 AM

#

Well, according to the code, you are taking the first 10 columns into X

#

and the third column into Y

#

*11 columns for X

prisma verge Jun 15, 2019, 10:36 AM

#

huh

#

let me try

sand reef Jun 15, 2019, 10:37 AM

#

no, its 10 only, my bad xD

#

but yeah, the Y is being used to train?

prisma verge Jun 15, 2019, 10:37 AM

#

i'm just trying to finetune existing network to predict my csv

#

i have very little idea behind ml honestly

sand reef Jun 15, 2019, 10:39 AM

#

well, I can't seem to pinpoint the major issue here other than the labels being a part of the training data
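
A sketch of the leakage being described, on a made-up array. It assumes the CSV has 4 columns (suggested by `input_shape=(4,)`) and that column 3 is the label. A slice like `0:10` is silently clipped to the columns that exist, so it still includes the label column. The names `X_leaky` and `label_col` are illustrative.

```python
import numpy as np

# Made-up stand-in for df.values: 5 rows, 4 columns.
dataset = np.arange(20, dtype=float).reshape(5, 4)

label_col = 3

# The slice end is clipped to the array width: all 4 columns, label included.
X_leaky = dataset[:, 0:10]
Y = dataset[:, label_col]

# Features without the label column.
X = np.delete(dataset, label_col, axis=1)

print(X_leaky.shape, X.shape, Y.shape)
```

With the label removed from `X`, the network's first layer would take 3 inputs rather than 4 under these assumptions.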

prisma verge Jun 15, 2019, 10:40 AM

#

should i bring up csv here?

#

maybe it'll help?

sand reef Jun 15, 2019, 10:40 AM

#

i guess?

prisma verge Jun 15, 2019, 10:40 AM

#

📎 m.csv

sand reef Jun 15, 2019, 10:40 AM

#

maybe i'll try to run it and see

prisma verge Jun 15, 2019, 10:41 AM

#

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import to_categorical

df = pd.read_csv('m.csv')
dataset = df.values
X = dataset
Y = dataset
min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
print(X_scale)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

model = Sequential([Dense(32, activation='relu', input_shape=(4,)), 
                    Dense(32, activation='relu', input_shape=(4,)),
                    Dense(4, activation='sigmoid')])
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
                 batch_size=1, 
                 epochs=100,          
                 validation_data=(X_val, Y_val))
print(model.predict(X_scale))
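
One possible reading of the negative loss: binary cross-entropy assumes targets in [0, 1], while `Y = dataset` here holds raw values from 1 to 47. The sketch below uses the textbook per-sample formula, not Keras' exact implementation (which also clips probabilities), and the function name is just for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred):
    # Per-sample loss: -(y*log(p) + (1-y)*log(1-p)).
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

valid = binary_crossentropy(1, 0.999)    # target inside [0, 1]
invalid = binary_crossentropy(3, 0.999)  # raw value from the CSV used as a target
```

With a target of 3 the `(1 - y)` term flips sign, so pushing the prediction toward 1 keeps lowering the loss below zero instead of stopping at it.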

sand reef Jun 15, 2019, 10:43 AM

#

welp, gimme a sec to download sklearn

prisma verge Jun 15, 2019, 10:49 AM

#

any comments so far?

lapis sequoia Jun 15, 2019, 10:59 AM

#

what are you trying to do

#

why do your X and Y both point to dataset

prisma verge Jun 15, 2019, 11:06 AM

#

i'm trying to make nn that'd predict the contents of csv

#

and i have no idea, because i'm bad at ml actually and just trying to finetune existing network

#

before it was
X = dataset[:,0:10]
Y = dataset[:,3]

#

but that didn't work

#

though it still doesn't

#

also, @sand reef, i'm calling you!
since i didn't get very far by changing layers

sand reef Jun 15, 2019, 11:13 AM

#

mew?

#

ah yes

#

i am having an issue here

#

for some reason, its not importing scipy

#

even tho i have it installed

prisma verge Jun 15, 2019, 11:17 AM

#

huh

#

try http://colab.research.google.com

Google Colaboratory

#

or install anaconda since afaik it has this stuff

#

... or just stop wasting too much power to help me lol

sand reef Jun 15, 2019, 11:18 AM

#

well to install anaconda means to set up a lot of stuff lol

prisma verge Jun 15, 2019, 11:18 AM

#

i didn't use it tbh so don't know how hard it is

#

i'm just using colab since it provides free gpu

sand reef Jun 15, 2019, 11:18 AM

#

yeah thats there..

#

wellp, i'll try it out on colab

#

what i dont like about colab is the tensorboard issue

prisma verge Jun 15, 2019, 11:20 AM

#

have no idea what it is, works finely with keras for me

sand reef Jun 15, 2019, 11:20 AM

#

tensorboard?

prisma verge Jun 15, 2019, 11:22 AM

#

yeh

sand reef Jun 15, 2019, 11:23 AM

#

well, its basically to see the progress of your model and how its going

#

but tensorboard for some reason has some issues on google colab

prisma verge Jun 15, 2019, 11:23 AM

#

keras outputs it automatically on calling fit method

lapis sequoia Jun 15, 2019, 11:23 AM

#

can someone explain to me what a dimension is?

sand reef Jun 15, 2019, 11:24 AM

#

like length, breadth and height. Matrices can have those too.

lapis sequoia Jun 15, 2019, 11:25 AM

#

how is input_shape=(781,)
same as
input_dim=781

sand reef Jun 15, 2019, 11:26 AM

#

well, @prisma verge

#

there is one issue, its reading the head also, which has NaN in it
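
A minimal sketch of filtering out a header row or incomplete rows before training, using only the standard library. The sample text, its column names, and the helper `load_numeric_rows` are made up for illustration; the real m.csv may look different.

```python
import csv
import io

def load_numeric_rows(text):
    """Return only the rows in which every cell parses as a number."""
    rows = []
    for cells in csv.reader(io.StringIO(text)):
        try:
            rows.append([float(c) for c in cells])
        except ValueError:
            # Header text or an empty cell: skip the row instead of
            # letting it turn into NaN later.
            continue
    return [r for r in rows if r]

# Hypothetical file content: a header line and one row with a missing cell.
sample = "round,a,b,c\n47,1,3,1\n46,,3,2\n45,3,1,2\n"
clean = load_numeric_rows(sample)
```

Dropping rows is the bluntest option; whether it is appropriate depends on how much data would be lost.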

prisma verge Jun 15, 2019, 11:26 AM

#

that's the only problem?

sand reef Jun 15, 2019, 11:27 AM

#

@lapis sequoia (781, ) thats how one dimensional matrices are represented

#

meaning there are 781 elements in it

prisma verge Jun 15, 2019, 11:27 AM

#

removed nans

#

gonna try it now

sand reef Jun 15, 2019, 11:27 AM

#

okay

prisma verge Jun 15, 2019, 11:27 AM

#

how do i define x and y though?

#

i'm really not sure which should go in y and which should go to x

sand reef Jun 15, 2019, 11:27 AM

#

well, you need to figure out that by reading the csv file

lapis sequoia Jun 15, 2019, 11:27 AM

#

so input_dim takes number of elements present ?

sand reef Jun 15, 2019, 11:28 AM

#

close: input_dim takes the number of input features, while input_shape takes the shape of one input sample
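
As far as I know, Keras treats `input_dim=781` on a first Dense layer as shorthand for `input_shape=(781,)`. The sketch below does not call Keras; `shape_from_input_dim` is a hypothetical helper that only mirrors that convention so the two spellings can be compared.

```python
def shape_from_input_dim(input_dim):
    # Hypothetical helper mirroring the Keras shorthand:
    # input_dim=n is treated as input_shape=(n,).
    return (input_dim,)

input_shape = shape_from_input_dim(781)

n_axes = len(input_shape)      # how many dimensions one sample has
n_features = input_shape[0]    # how many values sit along that axis
```

So `(781,)` describes a single axis holding 781 values. The trailing comma is just Python's way of writing a one-element tuple.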

prisma verge Jun 15, 2019, 11:28 AM

#

csv represents 47 columns with 4 rows

#

i just wanna know that so i can know it for future projects

#

and get deeper into ml

lapis sequoia Jun 15, 2019, 11:28 AM

#

if possible can u give an example please :) ?

sand reef Jun 15, 2019, 11:28 AM

#

okay. meaning you have 4 examples?

prisma verge Jun 15, 2019, 11:28 AM

#

just i learn best by practice, haha

sand reef Jun 15, 2019, 11:29 AM

#

@lapis sequoia well, like, for example, you want to input an image

prisma verge Jun 15, 2019, 11:29 AM

#

welp, yeah
first one goes from 1 to 47, and the others have 1, 2, and 3 in "random" sequences every

#

because those are logs from one game which has 2/3 chances to win

#

1 and 2 means wins, 3 means lose

sand reef Jun 15, 2019, 11:29 AM

#

a gray image is of the shape (200, 200, 1)

prisma verge Jun 15, 2019, 11:29 AM

#

47,1,3,1
46,1,3,2
45,3,1,2
44,2,1,3
43,2,1,3
42,2,1,3
41,2,3,1
40,3,2,1
39,1,3,2
38,2,1,3
37,2,3,1
36,1,2,3
35,1,3,2
34,2,3,1
33,2,1,3
32,1,3,2
31,2,1,3
30,2,1,3
29,3,2,1
28,1,2,3
27,1,3,2
26,3,1,2
25,1,3,2
24,3,2,1
23,3,2,1
22,2,3,1
21,3,2,1
20,3,2,1
19,2,1,3
18,2,3,1
17,2,1,3
16,3,2,1
15,2,1,3
14,2,3,1
13,1,2,3
12,1,3,2
11,1,3,2
10,1,3,2
9,3,2,1
8,2,1,3
7,2,3,1
6,2,1,3
5,1,2,3
4,1,2,3
3,3,2,1
2,3,2,1
1,2,3,1

lapis sequoia Jun 15, 2019, 11:29 AM

#

so its dimension is 3?

prisma verge Jun 15, 2019, 11:29 AM

#

kinda like that it looks

sand reef Jun 15, 2019, 11:29 AM

#

yes, its a 3D matrix

#

a color image is (200, 200, 3)
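
The shapes mentioned so far can be compared side by side. In this small sketch `describe` is an illustrative helper, not a library function. It returns the number of axes and the total number of values for a shape tuple.

```python
from math import prod

def describe(shape):
    # Number of axes (dimensions) and total number of values.
    return len(shape), prod(shape)

vector = describe((781,))        # a flat feature vector
gray = describe((200, 200, 1))   # height, width, 1 channel
color = describe((200, 200, 3))  # height, width, 3 channels
```

Both images have three axes. Only the size of the last axis, the channel count, differs.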

prisma verge Jun 15, 2019, 11:30 AM

#

also, removing nans didn't help

sand reef Jun 15, 2019, 11:30 AM

#

so tell me something about your csv

#

does your csv have 4 features and 47 examples?

#

or 4 examples and 47 features?

prisma verge Jun 15, 2019, 11:31 AM

#

how should i know that?

#

i guess 4 features and 47 examples
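
One way to answer this without guessing is to count rows and columns, following the usual convention that each row is an example and each column a feature. The sketch uses the first five lines of the posted CSV, so it reports 5 examples rather than the full 47.

```python
import csv
import io

# First five lines of the posted CSV (the full file has 47).
text = "47,1,3,1\n46,1,3,2\n45,3,1,2\n44,2,1,3\n43,2,1,3\n"
rows = [[int(c) for c in line] for line in csv.reader(io.StringIO(text))]

n_examples = len(rows)      # one example per row
n_features = len(rows[0])   # one feature per column
shape = (n_examples, n_features)
```

Whether all four columns really count as features depends on which one ends up being the prediction target.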

sand reef Jun 15, 2019, 11:32 AM

#

well, you made the csv or know about its origin right?

prisma verge Jun 15, 2019, 11:32 AM

#

yeah, i do

lapis sequoia Jun 15, 2019, 11:32 AM

#

but in this yt video https://www.youtube.com/watch?v=VGCHcgmZu24
timestamp 9:05-9:15
what this dude says is kinda opposite can u explain to me if u dont mind?

YouTube

Data Talks

Sequential Model - Keras

Here we go over the sequential model, the basic building block of doing anything that's related to Deep Learning in Keras. (this is super important to unders...


prisma verge Jun 15, 2019, 11:32 AM

#

original table looks like that

📎 unknown.png

#

biggest team means lose, middle means close to lose but wins, and smallest means highest chances to win
then i've changed them to numbers and removed rounds

#

well, not removed rounds, removed word round

#

now i wanna make ai predict from that csv
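
A rough sketch of defining X and y for this table, assuming (purely for illustration) that the last column is the thing to predict and the other columns are inputs. The helper names are hypothetical, and `one_hot` does by hand roughly what a utility like `to_categorical` is typically used for.

```python
def split_features_target(rows, target_index):
    """Separate one column as the target; every other column is a feature."""
    X = [[v for j, v in enumerate(row) if j != target_index] for row in rows]
    y = [row[target_index] for row in rows]
    return X, y

def one_hot(labels, classes):
    # Turn each class label into a 0/1 vector, one slot per class.
    return [[1 if label == c else 0 for c in classes] for label in labels]

# First three rows of the posted CSV.
rows = [[47, 1, 3, 1], [46, 1, 3, 2], [45, 3, 1, 2]]
X, y = split_features_target(rows, target_index=3)
y_encoded = one_hot(y, classes=[1, 2, 3])
```

With three possible outcomes per team, a three-unit softmax output with categorical cross-entropy would be the more conventional pairing than sigmoid with binary cross-entropy, though that is a design choice and not something settled in the thread.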

sand reef Jun 15, 2019, 11:33 AM

#

I see

prisma verge Jun 15, 2019, 11:33 AM

#

because it'd be amazing experience to machine learning, and also very useful

#

i know that predictions won't predict reality, but i wanna make it at least as project for fun

lapis sequoia Jun 15, 2019, 11:34 AM

#

9:05*

sand reef Jun 15, 2019, 11:34 AM

#

oh that thing

#data-science-and-ml

Twitch.tv Top Channels Data Scrape