#data-science-and-ml


past pewter
#

There are some keywords

#

key phrases

#

But not a set number, and the order matters for when and where to split

unreal lily
#

Hey guys, not sure if this is the right place to ask for help, but here it goes: I want to make a small "app" to read data from a database. That I know how to do, but my problem is that I want to show it on a graphic, anyone free to help me?

lapis sequoia
#

!rank

arctic wedgeBOT
#

Iterating over range(len(...)) is a common approach to accessing each item in an ordered collection.

for i in range(len(my_list)):
    do_something(my_list[i])

The pythonic syntax is much simpler, and is guaranteed to produce elements in the same order:

for item in my_list:
    do_something(item)

Python has other solutions for cases when the index itself might be needed. To get the element at the same index from two or more lists, use zip. To get both the index and the element at that index, use enumerate.
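A minimal sketch of the two alternatives named above. The lists `names` and `scores` are made up for illustration.

```python
names = ["ada", "grace", "linus"]
scores = [90, 85, 70]

# enumerate yields (index, element) pairs, so range(len(...)) is not needed.
indexed = [(i, name) for i, name in enumerate(names)]

# zip pairs up the elements that share an index across several lists.
paired = [(name, score) for name, score in zip(names, scores)]

print(indexed)
print(paired)
```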

willow rose
#

Anyone here familiar with DVC? I've got an ML project (gonna be creating a CNN for a final), and we're wondering if it's worth learning to use versus just using regular git
I'm not familiar with some of the things they're pointing out in their Getting Started tutorial, so I'm a little lost with how we can integrate this with our project

mild glade
#

@steel palm

real wigeon
#

i need to write a script to download a pandas df

#

like i have a flask app, it queries mysql db, places the result in a pandas df

#

i then convert it to xls, because my goal is to have the user be able to download that xls file

austere swift
#

idk how flask works so idk how youd get them to download the file but you can convert the df to xls by using df.to_excel()

hollow ferry
#

Hello guys, is anyone here willing to help me make appending to a list from csv.reader faster? I profiled my code and 90% of the time is spent in an if-else block with type checks followed by appending to a list. It is kinda annoying as I have around 500k values

hearty token
#

@hollow ferry Maybe use pandas.read_csv() instead? Then do type conversions one column at a time, and export contents to lists if that's how you want them

hollow ferry
#

i am supposed to use csv.reader only :/ but i don't think it's a problem with csv.reader, i think i am just looping too much and appending too much

hearty token
#

Ah. Well what's the code?

hollow ferry
#
reader = csv.reader(
    TextIOWrapper(
        opened_csv,
        'iso8859-2'),
    delimiter=';',
    quotechar='"')
all_rows = list(reader)
for row in all_rows:
    for i, value in enumerate(row):
        if data_types[i] == float:
            try:
                region_data[1][i].append(
                    float(value.replace(',', '.')))
            except ValueError:
                region_data[1][i].append(np.nan)
        elif data_types[i] == int:
            try:
                region_data[1][i].append(int(value))
            except ValueError:
                region_data[1][i].append(-999999)
        else:
            region_data[1][i].append(value)
#

i have to do those replacements and type checks there... also data_types[] is a list with the desired types matching the column order

hearty token
#

What's with the TextIOWrapper? Also, you don't need to make a list of all rows, you can iterate over the reader:
for row in reader:

hollow ferry
#

oh i am opening csv inside zip files and without that wrapper it screamed some bytes error

#

yeah you were right with that unnecessary conversion to list, but still it wont make much difference

#

maybe around 5%

#

or my computer has warmed up 😄 and its no diff
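One option that stays within csv.reader is to choose a converter per column once, rather than testing the type for every cell, and to convert whole columns at a time. This is only a sketch: `load_columns`, `to_float` and `to_int` are hypothetical helpers, `float("nan")` stands in for `np.nan` to keep it standard-library only, and whether it is actually faster on the real data would need profiling.

```python
import csv
import io

def to_float(value):
    # Decimal commas become dots; unparsable cells become NaN.
    try:
        return float(value.replace(",", "."))
    except ValueError:
        return float("nan")

def to_int(value):
    # Same sentinel as the original snippet for unparsable integers.
    try:
        return int(value)
    except ValueError:
        return -999999

def load_columns(lines, data_types):
    # Pick one converter per column up front.
    converters = [
        to_float if t is float else to_int if t is int else str
        for t in data_types
    ]
    reader = csv.reader(lines, delimiter=";", quotechar='"')
    # zip(*reader) transposes rows into columns.
    # Assumes every row has exactly len(data_types) fields.
    return [
        [convert(v) for v in column]
        for convert, column in zip(converters, zip(*reader))
    ]

sample = io.StringIO("a;1,5;3\nb;x;oops\n")
columns = load_columns(sample, [str, float, int])
print(columns)
```

The result is one list per column, which matches the shape of `region_data[1]` in the snippet above.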

sweet owl
#

umm

#

how would you

#

do backpropagation for this

#
import numpy
from matplotlib import pyplot as plt

def NN(m1,m2,w1,w2,b):
  z = m1*w1+m2*w2+b
  return sigmoid(z)

def sigmoid(x):
  return 1/(1+numpy.exp(-x))

# start from random weights and bias
w1 = numpy.random.randn()
w2 = numpy.random.randn()
b = numpy.random.randn()
#

given that the input is a list of lists with two elements

#

??

final garnet
#
self.w_h -= self.eta * delta_w_h
self.b_h -= self.eta * delta_b_h
self.w_out -= self.eta * delta_w_out
self.b_out -= self.eta * delta_b_out
sweet owl
#

NN(b)

#

(b-target)^2

#

cost = (prediction-target)^2

final garnet
sweet owl
#

NN(3.5,4,w1,w2,b)

#

1

final garnet
#

z = dot(x, w) = dot(wT, x)
o = step(z)

#
dot = x1*w1+x2*w2
#

derivative = o * (1. - o)

sweet owl
#

[[3,1.5,1],

#

I can only type now

#

@final garnet I can only type now sorry

#

hmm

#

sorry

#

umm

#

sorry I have to go

#

I can still type

#

tho

#

@final garnet

final garnet
#

@sweet owl

from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
sweet owl
#

thanks
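For the single-neuron case asked about earlier (two inputs, a sigmoid, and cost = (prediction-target)^2), the chain rule gives the gradients directly. The sketch below is one way to write that down. The function names and the learning rate `lr` are assumptions, not something from the conversation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(m1, m2, w1, w2, b):
    return sigmoid(m1 * w1 + m2 * w2 + b)

def cost(m1, m2, w1, w2, b, target):
    return (predict(m1, m2, w1, w2, b) - target) ** 2

def gradients(m1, m2, w1, w2, b, target):
    pred = predict(m1, m2, w1, w2, b)
    # Chain rule: dcost/dz = dcost/dpred * dpred/dz,
    # where dpred/dz = pred * (1 - pred) for a sigmoid.
    dcost_dz = 2.0 * (pred - target) * pred * (1.0 - pred)
    # z = m1*w1 + m2*w2 + b, so dz/dw1 = m1, dz/dw2 = m2, dz/db = 1.
    return dcost_dz * m1, dcost_dz * m2, dcost_dz

def sgd_step(point, target, w1, w2, b, lr=0.1):
    # One gradient-descent update on a single training example.
    m1, m2 = point
    dw1, dw2, db = gradients(m1, m2, w1, w2, b, target)
    return w1 - lr * dw1, w2 - lr * dw2, b - lr * db

w1, w2, b = 0.3, -0.2, 0.1
before = cost(3, 1.5, w1, w2, b, 1)
w1, w2, b = sgd_step((3, 1.5), 1, w1, w2, b)
after = cost(3, 1.5, w1, w2, b, 1)
print(before, after)
```

Training would repeat `sgd_step` over every `[m1, m2, target]` row for many passes.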

half mountain
#

Hi, I am trying to web scrape data and this is my first project. Here is the HTML I am trying to scrape.
<tr>
<td class="posterColumn">
<span data-value="47" name="rk">
</span>
<span data-value="8.459750370323722" name="ir">
</span>
<span data-value="1.1610432E12" name="us">
</span>
<span data-value="1166711" name="nv">
</span>
<span data-value="-2.5402496296762784" name="ur">
</span>
<a href="/title/tt0482571/">
<img alt="The Prestige" height="67" src="https://m.media-amazon.com/images/M/MV5BMjA4NDI0MTIxNF5BMl5BanBnXkFtZTYwNTM0MzY2._V1_UY67_CR0,0,45,67_AL_.jpg" width="45"/>
</a>
</td>
<td class="titleColumn">
47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">
(2006)
</span>
</td>
I bolded the title I want to get. Here is my code:
def movieLinks(content, classTitle):
    table = content.find('table', {'class': classTitle})
    rows = table.find_all('tr')

    for row in rows:
        cells = row.find_all('td')
        if len(cells) > 1:
            movie_link = cells[1].find('a')
            print(movie_link.get('title'))

This is the output:
Christopher Nolan (dir.), Christian Bale, Hugh Jackman
I do not want this information though. I just want the movie title. Any idea of how to do this?

hollow sentinel
#

you using bs4?

#

this might be helpful

#

I've still been using Kaggle datasets lol

half mountain
#

I am following a tutorial and it had me using beautiful soup for a couple parts but not this part. I just don't know how to identify the title portion

hollow sentinel
#

idk haha maybe someone else knows

half mountain
#

I will watch these videos though. Maybe it will contain the answer. Thanks for the help

hollow sentinel
#

no problem

hollow sentinel
#

I'm getting this ValueError: could not convert string to float: 'Pentax Optio 430RS'

#

when i try to run this lm.fit(X_train,y_train)

#

i thought I dropped the model column

#
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#

I thought I dropped the models Idk why it's freaking out now

#

send help pls

chrome barn
#

@half mountain
```
from bs4 import BeautifulSoup

test = ''' <td class="titleColumn">
47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">
(2006)
</span>
</td> '''
title = []
parse = BeautifulSoup(test, "html.parser")
cells = parse.find_all('td')
for x in cells:
    title.append(x.get_text(separator=','))
```

#

it will get you all the text values including the title

#

if the structure of the whole html is the same you could easily remove the unwanted data so that in the end you only have left the titles
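A more direct route may be to read the text of the `<a>` tag itself: `.get('title')` returns the `title="..."` attribute (the cast list), while `.get_text()` returns what sits between `<a>` and `</a>`. The sketch below assumes bs4 is installed. The table class `chart` and the trimmed HTML are placeholders, not the real page.

```python
from bs4 import BeautifulSoup

html = '''<table class="chart"><tr>
<td class="posterColumn"><a href="/title/tt0482571/"></a></td>
<td class="titleColumn">
47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">(2006)</span>
</td>
</tr></table>'''

def movie_titles(soup, class_title):
    table = soup.find('table', {'class': class_title})
    titles = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) > 1:
            link = cells[1].find('a')
            # Text inside the link, with surrounding whitespace removed.
            titles.append(link.get_text(strip=True))
    return titles

titles = movie_titles(BeautifulSoup(html, 'html.parser'), 'chart')
print(titles)
```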

hollow sentinel
#

if anyone figures it out lmk

chrome barn
#

@hollow sentinel because you're trying to convert a string to a float, so instead maybe try to use a LabelEncoder, for example, to convert string values to numerical values so they can be used in your model

hollow sentinel
#

idk how to do that

chrome barn
hollow sentinel
#

@chrome barn what does it mean to normalize values

chrome barn
hollow sentinel
#

i'm confused on how to use it lmao

#
ValueError: bad input shape (1038, 13)
#
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(cameras)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
chrome barn
#

what is the shape of x and y

#

also do df['camera'] = le.fit_transform(df['cameras'])

hollow sentinel
#

y is (1038,)

#

X is (1038,12)

chrome barn
#

and what is the full error you get now

hollow sentinel
#

ValueError: bad input shape (1038, 13)

#
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df["camera"] = le.fit_transform(cameras)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape 
#

i don't get the df["camera"] = le.fit_transform(cameras) line

#

cameras is the entire dataframe

chrome barn
#

which column was giving the error before about the string to float error

#

that is the column that you use the label encoder on

hollow sentinel
#
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit_transform(cameras["Model"])
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape
#

it was the model column

#

but i still have the same error

#

ValueError: could not convert string to float: 'Pentax Optio 430RS'

chrome barn
#

le.fit_transform(cameras["Model"]) this doesn't do anything on its own, you have to assign the result back to the dataframe

#

cameras['model_label'] = le.fit_transform(cameras["Model"])

#

then drop the Model column
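Put together, the two steps look roughly like this. The three-row frame is invented for illustration and assumes scikit-learn is installed. Note that integer codes impose an arbitrary order on the model names, which a linear model will treat as meaningful.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

cameras = pd.DataFrame({
    "Model": ["Pentax Optio 430RS", "Canon A10", "Pentax Optio 430RS"],
    "Price": [179.0, 139.0, 179.0],
})

le = LabelEncoder()
# fit_transform returns a new array; it has to be stored to have any effect.
cameras["model_label"] = le.fit_transform(cameras["Model"])
# drop also returns a new frame, so the result is assigned back.
cameras = cameras.drop(columns=["Model"])
print(cameras)
```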

hollow sentinel
#

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

#

i dropped the model column before in my code

#

should i delete that line to drop it there

chrome barn
#

what is the code that you dropped it with?

hollow sentinel
#

cameras.drop(["Model"],axis=1)

chrome barn
#

it didn't drop it

#

cameras = cameras.drop(["Model"],axis=1)

#

or cameras.drop(["Model"],axis=1, inplace=True)

hollow sentinel
#

AttributeError: 'numpy.ndarray' object has no attribute 'drop'

chrome barn
#

your dataframe is probably not a pandas dataframe object anymore but has been converted to a numpy array

hollow sentinel
#

uhhhhhhhhh

chrome barn
#

you're using jupyter notebook to work in?

hollow sentinel
#

yes

chrome barn
#

reload the workbook and start at the top again

#

so all variables will be reset again

hollow sentinel
#

KeyError: "['Model'] not found in axis"

#

it's spelled correctly

#

tf

chrome barn
#

how does the whole code look like because this way it is hard to understand what you are doing

hollow sentinel
#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
cameras = pd.read_csv("camera_dataset.csv")
cameras.head()
cameras.tail()
cameras.isnull().sum()
cameras= cameras.drop(["Model"],axis=1, inplace=True)
sns.countplot(x=cameras["Max resolution"],data=cameras)
sns.jointplot(x="Max resolution", y="Low resolution", data=cameras)
sns.lmplot(x="Max resolution", y="Low resolution", data=cameras)
sns.heatmap(cameras.corr())
sns.pairplot(cameras)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
cameras["model_label"] = le.fit_transform(cameras["Model"])
cameras.drop(["model"],axis=1)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
#

sorry it's long

chrome barn
#

there are 3 options on how to do a drop in a dataframe

cameras = cameras.drop(["Model"],axis=1)
cameras.drop(["Model"],axis=1, inplace=True)
cameras.drop(columns=['Model'], inplace=True)
#

cameras= cameras.drop(["Model"],axis=1, inplace=True) replace with cameras.drop(["Model"],axis=1, inplace=True)

#

also you're trying to remove the model twice

#

first in row 9 and then later again in row 18

#

you can only drop a column once

#

also since the model column is dropped remove the whole label encoded

#

encoder

#

because you got the string error before because you didn't remove the model column properly from the dataframe

hollow sentinel
#

KeyError: "['Model'] not found in axis"

#

i have that from when I tried to remove it in the first place

chrome barn
#

cameras.drop(["Model"],axis=1, inplace=True)

#

use that not cameras= cameras.drop(["Model"],axis=1, inplace=True)

#

remove the cameras = part
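The difference between the forms can be checked on a toy frame. This is a sketch with made-up data, showing why an unassigned `drop` changes nothing and why combining assignment with `inplace=True` replaces the frame with `None`.

```python
import pandas as pd

cameras = pd.DataFrame({"Model": ["A", "B"], "Price": [100.0, 200.0]})

# 1. Without assignment the original frame is left unchanged.
cameras.drop(["Model"], axis=1)
still_there = "Model" in cameras.columns

# 2. With inplace=True the frame is modified and the call returns None,
#    so `cameras = cameras.drop(..., inplace=True)` would set cameras to None.
copy = cameras.copy()
result = copy.drop(["Model"], axis=1, inplace=True)

# 3. Assigning the returned frame is the other working form.
cameras = cameras.drop(columns=["Model"])
print(still_there, result, list(cameras.columns))
```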

hollow sentinel
#

cameras.drop(["Model"],axis=1, inplace=True)

#

that's what I'm using

#

nope i'm still getting KeyError: "['Model'] not found in axis"

chrome barn
#

is the dataset from a public source?

hollow sentinel
#

yep kaggle

chrome barn
#

can you link the url to the kaggle dataset that you're using so i know for sure that we are looking at the same dataset

hollow sentinel
chrome barn
#

give me a moment

austere swift
#

@hollow sentinel i had this issue before, try doing cameras.drop([cameras.columns[0]], axis=1, inplace=True)

chrome barn
#

did you reload your notebook again?

#

because probably you're trying to delete the model column while it is already deleted

austere swift
#

could be that

hollow sentinel
#

KeyError: 'Model'

chrome barn
#
import pandas as pd
cameras = pd.read_csv('camera_dataset.csv')
cameras.drop(["Model"],axis=1, inplace=True)
hollow sentinel
#

which means it's no longer there

chrome barn
#

works for me drops the model column

hollow sentinel
#

should i close the notebook and reopen it?

austere swift
#

when i had the issue it was because the name of the column had some sort of whitespace or encoded character, so when i called it on the column name from df.columns it worked
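If stray whitespace in a header is the suspect, printing the column names as a list shows it, and stripping them makes the plain name usable. The CSV text here is a hypothetical example with a trailing space after `Model`.

```python
import io
import pandas as pd

# Hypothetical CSV whose first header has a trailing space.
raw = io.StringIO("Model ,Price\nA,100\nB,200\n")
cameras = pd.read_csv(raw)

raw_names = cameras.columns.tolist()
print(raw_names)  # the list repr makes stray whitespace visible

# Normalise the header names, then drop by the clean name.
cameras.columns = cameras.columns.str.strip()
cameras = cameras.drop(columns=["Model"])
```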

chrome barn
#

he probably just needs to reload the workbook

#

or execute the first lines again

austere swift
#

yeah his issue is probably that he already removed the column

chrome barn
#

yes

hollow sentinel
#

oh i think i got it lmao

#

the key error was bc i was trying to do a seaborn countplot with the Model column

#

F

#

ayyyyy it works thank you guys

#

y'all are the best

chrome barn
#

your initial error with the string to float was caused by not removing the Model column properly in your code

hollow sentinel
#

yes

#

I have trouble with .drop

#

sometimes it doesn't work properly

chrome barn
#

did you also remove the label encoder in your code

#

cameras = cameras.drop(["Model"],axis=1)
cameras.drop(["Model"],axis=1, inplace=True)
cameras.drop(columns=['Model'], inplace=True)

#

remember choose 1 for dropping in the future

hollow sentinel
#

got it

#

and yes the label encoder is gone

#

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

#

UHHHHHHH

austere swift
#

theres probably nans in there

hollow sentinel
#

.isnull().sum() returns how many NANs there are right

austere swift
#

yeah

chrome barn
#

there are a few nan's not that many

austere swift
#

yeah if theres only a few you can just drop them

hollow sentinel
#

yeah

#

.dropna?

austere swift
#

df.dropna(inplace=True)

#

yep

chrome barn
#

or fillna

austere swift
#

well that would be bad in this case

#

cus you cant really average the columns and just assume the specs of a camera

#

but since there arent many then dropna wouldnt affect it much
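A small sketch of the count-then-drop approach discussed above, on an invented frame:

```python
import numpy as np
import pandas as pd

cameras = pd.DataFrame({
    "Weight": [180.0, np.nan, 320.0],
    "Price": [179.0, 139.0, np.nan],
})

missing_per_column = cameras.isnull().sum()  # NaN count for each column
cleaned = cameras.dropna()                   # keeps only rows with no NaN at all
print(missing_per_column)
print(len(cameras), "->", len(cleaned))
```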

hollow sentinel
#

that makes sense

chrome barn
#

it will go outside the scope but you could look at the model and see if the nan values have a comparable camera type to make a prediction from for the nan

#

but in this case it won't matter to drop them

hollow sentinel
#

what is a heatmap and a pairplot supposed to show

#

like the darker the color is in the heatmap means ...

austere swift
#

its a scale of the values

#

iirc darker is higher value

chrome barn
#

also to take into account next time daspecito you also have 0.0 values in your columns for example in weight

#

common sense says the weight of a camera is not 0, but now you're putting it in the model

#

0.0 then probably means it was also a nan value, but somebody before you already set those nan values to 0

#

so in your current value you have nan classified as empty and 0.0

#

value=dataset

hollow sentinel
#

alright

#

what does weight mean tho

chrome barn
#

like what it weighs a human is 100kg like that

hollow sentinel
#

oh you mean like a column in the dataframe

#

i mean like weight in general in machine learning

chrome barn
#

oh no i am talking about your dataset

#

and since you're doing linear regression the 0.0 values in your dataset that don't make sense are essentially nan values and will affect the model

#

but that is for next time just try to get the model/code to work for now
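One way to act on that later is to turn the implausible zeros back into NaN so they get handled like any other missing value. The values below are made up; the column name is taken from the dataset's column list.

```python
import numpy as np
import pandas as pd

col = "Weight (inc. batteries)"
cameras = pd.DataFrame({col: [180.0, 0.0, 320.0], "Price": [179.0, 139.0, 399.0]})

# Treat physically impossible zeros as missing values.
cameras[col] = cameras[col].replace(0.0, np.nan)
# Then drop (or otherwise handle) the rows where the weight is missing.
cleaned = cameras.dropna(subset=[col])
print(cleaned)
```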

craggy belfry
#

anyone have a favorite or obscure pandas tricks? I just learned about read_clipboard() and to_clipboard() the other day. Curious what other cool and lesser known tricks are out there

river oasis
#

there was a whole post of obscure tricks including those on reddit r/learnpython a couple weeks back

rigid phoenix
#

can someone explain the following line of code. it will create a list with the moving average, x being the list and n the number of values averaged: pd.Series(x).rolling(window=n).mean().iloc[n-1:].values

neat basin
rigid phoenix
#

@neat basin do you have an easy explanation for rolling window calculations?

neat basin
#

I'm pretty new to pandas myself, maybe one of the seasoned people can explain it as I haven't used it myself

rigid phoenix
#

okay thanks for your help

velvet thorn
#

@neat basin do you have an easy explanation for rolling window calculations?
@rigid phoenix basically

#

it's like an overlapping groupby

#

so like say you have 10 rows

#

and your window size is 3

#

and you want to calculate the rolling mean

#

you'll get a result with 8 rows

#

first row is the mean of original row 1, 2, 3, second row is the mean of original row 2, 3, 4, and so on
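The 10-row, window-3 example can be reproduced with the line from the question. pandas itself returns 10 values with NaN in the first n-1 positions, because those windows are incomplete. The `.iloc[n-1:]` slice is what trims the result to 8.

```python
import pandas as pd

x = list(range(1, 11))  # 10 values: 1..10
n = 3

rolled = pd.Series(x).rolling(window=n).mean()
# The first n-1 entries are NaN; slicing them off leaves len(x) - n + 1 values.
moving_average = rolled.iloc[n - 1:].values
print(rolled.tolist())
print(moving_average)
```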

rare portal
#

Hi, I was wondering if I could get some tips on how to merge a dataframe based on id and date but with a twist.

I need the same dataframe to have rain data from the nearest station, and if there are dates with no rain data then I need to fill in those rows with data from a secondary station.

I've tried combine_first, mixing merge with isna() and a lot of other stackoverflow stuff but no luck so far.

Below I have some example code that will hopefully help.

  import pandas as pd

  level = pd.DataFrame(
    {
      "date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05"],
      "id": ["asset1", "asset1", "asset1", "asset1", "asset1"],
      "level": [0.1, 0.25, 1.2, 1.3, 0.5]
    }
  )

  df2 = pd.DataFrame(
    {
      "date": ["2020-10-01", "2020-10-02", "2020-10-04"],
      "id": ["usgs1", "usgs1", "usgs1"],
      "rain": [0.1, 0.2, 0]
    }
  )

  df3 = pd.DataFrame(
    {
      "date": ["2020-10-01", "2020-10-03", "2020-10-05"],
      "id": ["nws1", "nws1", "nws1"],
      "rain": [0.5, 1.0, 0.89]
    }
  )

  rain = pd.concat([df2, df3])

  rel_table = pd.DataFrame(
    {
      "id": ["asset1"],
      "nearest_rg": ["usgs1"],
      "secondary_rg": ["nws1"]
    }
  )


  solution = pd.DataFrame(
    {
      "date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05"],
      "id_level": ["asset1", "asset1", "asset1", "asset1", "asset1"],
      "level": [0.1, 0.25, 1.2, 1.3, 0.5],
      "nearest_rg": ["usgs1", "usgs1", "usgs1", "usgs1", "usgs1"],
      "secondary_rg": ["nws1", "nws1", "nws1", "nws1", "nws1"],
      "id_rg": ["usgs1", "usgs1", "nws1", "usgs1", "nws1"],
      "rain": [0.1, 0.2, 1.0, 0, 0.89]

    }
  )
  display(
      "Level",
      level,
      "Rain",
      rain,
      "Relation Table",
      rel_table,
      "Solution",
      solution
  )

neat basin
#

this would probably be better for a help channel

velvet thorn
#

@rare portal

#
intermediate = (
    level.merge(rel, on='id')
    .merge(rain.rename(columns={'id': 'nearest_rg'}), how='left', on=['date', 'nearest_rg'])
    .merge(rain.rename(columns={'id': 'secondary_rg'}), how='left', on=['date', 'secondary_rg'])
)
intermediate.drop(columns=['rain_x', 'rain_y']).assign(rain=intermediate['rain_x'].combine_first(intermediate['rain_y']))
#

I renamed rel_table to rel

#

it's a bit messy since I just did it but you can probably clean it up a little

rare portal
#

@velvet thorn Thanks! Looks interesting. I'll try it out and report back soon.

edit: Reporting back and it works like a charm!
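A slightly tidied variant of the same idea, which also fills in the `id_rg` column from the expected solution. This is a sketch: the helper `gauge` and the `rain_near` / `rain_sec` column names are invented here, and `rel` is the renamed `rel_table`.

```python
import pandas as pd

level = pd.DataFrame({
    "date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05"],
    "id": ["asset1"] * 5,
    "level": [0.1, 0.25, 1.2, 1.3, 0.5],
})
rain = pd.DataFrame({
    "date": ["2020-10-01", "2020-10-02", "2020-10-04",
             "2020-10-01", "2020-10-03", "2020-10-05"],
    "id": ["usgs1"] * 3 + ["nws1"] * 3,
    "rain": [0.1, 0.2, 0.0, 0.5, 1.0, 0.89],
})
rel = pd.DataFrame({"id": ["asset1"], "nearest_rg": ["usgs1"], "secondary_rg": ["nws1"]})

def gauge(column, suffix):
    # Rain table keyed by the relation column, with a suffixed rain column.
    return rain.rename(columns={"id": column, "rain": f"rain_{suffix}"})

merged = (
    level.merge(rel, on="id")
    .merge(gauge("nearest_rg", "near"), how="left", on=["date", "nearest_rg"])
    .merge(gauge("secondary_rg", "sec"), how="left", on=["date", "secondary_rg"])
)
# Prefer the nearest gauge; fall back to the secondary one only where it is NaN.
use_nearest = merged["rain_near"].notna()
merged["rain"] = merged["rain_near"].combine_first(merged["rain_sec"])
merged["id_rg"] = merged["nearest_rg"].where(use_nearest, merged["secondary_rg"])
result = merged.drop(columns=["rain_near", "rain_sec"])
print(result)
```

A reading of 0 from the nearest gauge is kept as real data, since only NaN triggers the fallback.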

neat basin
#

@velvet thorn the real MVP today

#

So, was learning seaborn recently and something that is really bothering me is that they discontinued the .annotate() method and the integrated display for statistical significance values. Have they replaced this with anything so I can still have my plots display p-value or is this functionality just completely gone?

hollow sentinel
#

@neat basin @velvet thorn is always a MVP

neat basin
#

damn right

hollow sentinel
#

hahaha

hollow sentinel
#

UHHHHHHHHHHHH

#

i think that means it's a bad model

neat basin
#

yeah, I'm fairly certain linear regression isn't supposed to look like that

hollow sentinel
#

hahaha 😅

#

so we have an answer then you can't predict price from Release date Max resolution Low resolution Effective pixels Zoom wide (W) Zoom tele (T) Normal focus range Macro focus range Storage included Weight (inc. batteries) Dimensions

neat basin
#

maybe try eliminating the outliers at the 5k and 8k Y test

hollow sentinel
#

hm

#

how do you do that

#

alright i'll look into it

neat basin
#

you can use .set_xlim()

#

so try setting the axis to .set_xlim([0,2000]) just to see if the model is working at all

hollow sentinel
#

what do you call .set_xlim() on

#

the model?

hollow sentinel
#

hmmmm

neat basin
#

so just do plt.set_xlim()

hollow sentinel
#

AttributeError: module 'matplotlib.pyplot' has no attribute 'set_xlim'
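The error arises because `set_xlim` is a method of an Axes object, not of the `pyplot` module. The module-level equivalent is `plt.xlim`. A minimal sketch with made-up numbers follows. The `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

y_test = [120, 300, 450, 5000, 8000]      # made-up prices
predictions = [150, 280, 500, 900, 1100]  # made-up model output

fig, ax = plt.subplots()
ax.scatter(y_test, predictions)
ax.set_xlim(0, 2000)  # Axes method; plt.xlim(0, 2000) acts on the current axes

# Axis limits only hide points from view; filtering actually removes outliers.
kept = [(y, p) for y, p in zip(y_test, predictions) if y <= 2000]
print(kept)
```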

neat basin
#

hmmmm

hollow sentinel
#

maybe it's bc i didn't put an index

neat basin
#

this is what I get for playing phasmophobia and trying to do python at the same time

hollow sentinel
#

hahahaha

#

thanks for trying to help me tho

#

much appreciated

neat basin
#

no problem

#

I'm not entirely paying attention so probably someone who isn't just running off 2 brain cells can help

hollow sentinel
#

alright thank you

#

haha yeah i'm new to this stuff I just decided to practice linear regression

neat basin
#

it's pretty neat

hollow sentinel
#

right

neat basin
#

though seaborn is better for statistical plotting

hollow sentinel
#

definitely

#

i heard matplotlib gives you more control

#

I think seaborn is prettier

neat basin
#

it does, but there is only so much control someone needs

hollow sentinel
#

lmplots from seaborn are great for linear reg

neat basin
#

oh yeah

hollow sentinel
#

this is more like what you want to see from a linear regression

neat basin
#

oh yeah

hollow sentinel
#

yeah I could've used the max resolution to predict the min resolution

#

but i wanted to do price

#

It might not be a regression type problem

#

actually i take that back idk

unreal lily
#

Hey, I'm trying to do something for my school project. I had to choose between tkinter and matplotlib or pyqt and pyqtgraph and I chose the second option. I managed to get the mysql connection working and make the graphic read data from there but now my problem is how to update the data since I'm going to use an ESP32 to import the data to the database. Basically I want to make a live updating graphic. Any idea how I can do that?

hollow sentinel
#

maybe that's helpful idk I've never tried that before

unreal lily
#

I think pyqtgraph is more suitable for my project

hollow sentinel
#

oh that's a graphing library i'm not familiar with

#

are you allowed to use seaborn?

#

i like seaborn lol

unreal lily
#

I can use whatever I want, but one guy here told me that those two might be better for me

#

Since I'm a beginner in python

hollow sentinel
#

oh ok

#

since you said you wanted to do PyQt5

unreal lily
#

I'm open to any tool I could use since the main goal of this project is to learn 😅

hollow sentinel
#

that might be good

unreal lily
#

I followed the second link, in fact, I used part of the code

hollow sentinel
#

nice

unreal lily
#

And the mysql part I did myself since I already worked with mysql before

neat basin
#

do all the plots look like you're still running DOS for pyqtgraph?

unreal lily
#

Works like a beauty... My problem is the "self update" part

#

do all the plots look like you're still running DOS for pyqtgraph?
@neat basin I didn't understand your question, sorry

neat basin
#

it was a joke because the plot looks like it's being rendered on a submarine radar

unreal lily
#

Oh... xD

#

Looks like it

hollow sentinel
#

lmao

unreal lily
#

Now it's all I can see 😂

neat basin
#

if it works and no one else needs to see that it looks like someone had a stroke when opening MS paint then you're good

hollow sentinel
#

why i prefer seaborn lol

unreal lily
#

I'll take a look at seaborn

neat basin
#

seaborn is built for statistical plotting, what you using it for here?

unreal lily
#

Looks very complex for a beginner like me <-<

#

Yeah, I need a dynamic plot

neat basin
#

seaborn is actually stupid easy, but yeah you're doing fine for dynamic plotting

hollow sentinel
#

it's just sns.whatever graph you want()

#

that's why I like it

neat basin
#

still have no idea how to do annotations now that they got rid of the function

unreal lily
#

It looks like it's hard to find someone that worked with pyqtgraph already ._.

neat basin
#

they don't display p-values by default anymore and that irks me

hollow sentinel
#

F

neat basin
#

rip, maybe ask a helper?

unreal lily
#

F indeed

hollow sentinel
#

try going to a help channel

unreal lily
#

Hum...

#

I'm sure it's easy to do, but I never worked with it before

hollow sentinel
#

just go to python help: available and click a channel underneath it

unreal lily
#

Occupied

hollow sentinel
#

uhhhhhhhhh

#

help-cobalt is open

unreal lily
#

And how I use it? xD

hollow sentinel
#

just ask your question

neat basin
#

there is an entire channel showing you how to use it

hollow sentinel
#

yes

#

it's called how to get help

unreal lily
#

Thank you guys, I'll see what I can do

heavy light
#

Where is a good place to learn about neural networking?

#

I learn better if I learn how the components work and how to change them rather than being told to make a certain project and then using those skills to develop my own thing

wheat seal
#

hi i need some help with google colab and YOLO object detection

velvet thorn
#

I learn better if I learn how the components work and how to change them rather than being told to make a certain project and then using those skills to develop my own thing
@heavy light like the base theory?

#

I can reco a book or two

wheat seal
#

when i train the darknet model my google colab webpage completely freezes after a few iterations and i cant do anything and the page shows as not responding

velvet thorn
#

hi i need some help with google colab and YOLO object detection
@wheat seal fair warning though, that's a relatively advanced topic

#

not really sure how many people will be able to help you

#

I would but I need to go soon

heavy light
#

Essentially, yes, but I don't want so basic and esoteric that I don't understand what's going on

velvet thorn
#

Essentially, yes, but I don't want so basic and esoteric that I don't understand what's going on
@heavy light try Deep Learning With Python

#

by Francois Chollet

#

he made keras

heavy light
#

I'm not familiar with keras though I'll take your word for it

velvet thorn
#

I'm not familiar with keras though I'll take your word for it
@heavy light have you worked with any DL libraries?

#

when i train the darknet model my google colab webpage completely freezes after a few iterations and i cant do anything and the page shows as not responding
@wheat seal the notebook itself freezes?

#

like you stop getting output?

wheat seal
#

yes

velvet thorn
#

that's unusual

#

I'm not really sure how Google Colab works but

#

are you hitting memory limits or something?

heavy light
#

I have not, in fact this is one of my first ventures out of the basic libraries not counting one I had with libtcod while trying to make a roguelike

velvet thorn
#

I have not, in fact this is one of my first ventures out of the basic libraries not counting one I had with libtcod while trying to make a roguelike
@heavy light what's your mathematical background like

wheat seal
#

im not sure

velvet thorn
#

hm

wheat seal
#

is it because of the output that darknet gives?

velvet thorn
#

my best guess is that you're running out of memory or some other resource...?

wheat seal
#

because it gives a lot of output

velvet thorn
#

but I can't say with any certainty

wheat seal
#

ok

velvet thorn
#

not being a Google Colab user

wheat seal
#

thanks

velvet thorn
#

if that's the case

#

try a less verbose mode

#

and see if it works?

#

because it gives a lot of output
@wheat seal to test this

wheat seal
#

ok ill do that

heavy light
#

I'm in basic algebra but I can understand more complex stuff

#

Don't fuck around in school is my lesson

velvet thorn
#

I'm in basic algebra but I can understand more complex stuff
@heavy light before you get into deep learning

#

I think linear algebra, discrete mathematics, statistics, and calculus would all be good to know

heavy light
#

Hm okay

bitter harbor
#

to add to that, a lot of the stats involved depend on your model and i/o

#

so what you'll need to learn will vary from project to project

heavy light
#

Okay

#

Well, where can I find more advanced projects? All I've been seeing is the basic "make the computer pick a number between 0 and 1 that's closest to 2" or something

bitter harbor
#

imo, prediction with real world variables (pop., economy, etc, etc) and image/voice recgn (voice is definitely a bit more science oriented and a bit more advanced)

heavy light
#

Good because I don't really want image or voice recognition in whatever I choose to develop

bitter harbor
#

why not?

wheat seal
#

Well, where can I find more advanced projects? All I've been seeing is the basic "make the computer pick a number between 0 and 1 that's closest to 2" or something
@heavy light wait people actually made something like that

#

pick a number from 0 to 1 thing?

bitter harbor
#

most common example is having a list, e.g. [0, 1, 1, 1] = 1, and predicting the result

#

I haven't seen any other basic examples actually

heavy light
#

It's just not the kind of thing I wanna do, as I'm more into chatbots and essentially computers you can communicate with and teach via text input rather than, like, verbal input

wheat seal
#

btw @velvet thorn i am clearing the output every few mins of the training session i started before i asked for help here and it works great now thanks

heavy light
#

I haven't seen any other basic examples actually
@bitter harbor that's the one ye

bitter harbor
#

the project idea that got me into this was a voice recognition program that would parse speech into code/text/commands

#

NLP's a bit more advanced definitely

wheat seal
#

avg loss at 420 GWseremePeepoHappy at iteration 113

bitter harbor
#

haha funny cause 4+2=6 and 6 sounds like sticks

wheat seal
#

lmao

bitter harbor
#

It's just not the kind of thing I wanna do, as I'm more into chatbots and essentially computers you can communicate with and teach via text input rather than, like, verbal input
The difficult thing with chatbots is they have to be able to recognize context

heavy light
#

Yeah

#

That's what I'm interested in doing actually

#

I wanna see the struggles of designing a chat bot that actually works

bitter harbor
#

@wheat seal is colab giving you any sort of error

wheat seal
#

nop

#

now my loss is at 3 im so happy

bitter harbor
#

so it isn't timing out anymore?

wheat seal
#

no

#

if i clear the output every 5 mins then it works great

bitter harbor
#

I mean if it works it works ¯\_(ツ)_/¯

#

Considering that and 'common' issues it seems like you're just running out of mem

wheat seal
#

ye prolly

#

btw whats a good avg loss for a object detector model?

bitter harbor
#

I'm not too sure, I'd assume it would vary depending on the model, but I found this in a YOLOv2 article:

"0.451929 avg is the average loss error, which should be as low as possible. As a rule of thumb, once this reaches below 0.060730 avg, you can stop training."
wheat seal
#

Ok thanks

woven jacinth
#

@bitter harbor

supple pond
#

@woven jacinth look at the 2nd line after the import statement. You have accidentally put r in front of the file path

velvet thorn
#

@woven jacinth ...it says the file's not found

#

@woven jacinth look at the 2nd line after the import statement. You have accidentally put r in front of the file path
@supple pond p sure that is correct

#

if you don't use raw strings for paths

#

you need to double backslash for directory separators
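
To illustrate the raw-string point, here is a minimal sketch. The path `C:\new\table.csv` is made up for the example and is not the path from the screenshot.

```python
# Hypothetical Windows path, used only for illustration.
broken = "C:\new\table.csv"     # "\n" and "\t" are read as newline and tab
escaped = "C:\\new\\table.csv"  # doubled backslashes give literal backslashes
raw = r"C:\new\table.csv"       # raw string: backslashes are kept as written

print(escaped == raw)  # True
print(broken == raw)   # False
```

Forward slashes or `pathlib.Path` are another way to sidestep the escaping problem on Windows.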

modern quail
velvet thorn
#

@lapis sequoia did you change the column names properly?

#

that's the only possible cause of that error

lapis sequoia
#

Ehh, it's weird

#

I am getting a plot I plotted earlier when I try to plot the RPM, dunno what's going on

#

with this error
"Models must be owned by only a single document, ColumnDataSource(id='1209', ...) is already in a doc"

#

lmao, odd bug. After searching for a couple mins, the solution was "reload the page"

past pewter
#

Any idea what this caret shape is saying. He's talking about joint probabilities and marginalizing. I've never seen the notation before though.

#

I'm pretty sure the correct notation would be p(s1 = sunny, s0 = sunny) no caret needed unless it is pointing to something else. Like marginal probability is equal to the sum of the joint probabilities

velvet thorn
#

isn't that "given"

#

but that should be

past pewter
#

I think given is |

velvet thorn
#

I don't have the symbol

#

yeah, that

past pewter
#

And I don't think given would work here, since p(x,y) = p(x|y)p(y), so p(s1) = sum(p(s1|s0)p(s0)) for all s0 would be the correct one rather than what he has here
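
A small numeric check of the marginalization identity being discussed, p(s1) = sum over s0 of p(s1|s0)p(s0) = sum over s0 of p(s1, s0). The probabilities below are invented for the sketch; only the identity matters.

```python
# Made-up numbers for a two-state weather chain.
p_s0 = {"sunny": 0.6, "rainy": 0.4}      # p(s0)
p_s1_given_s0 = {                         # p(s1 | s0), keyed by s0 first
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}
states = ("sunny", "rainy")

# Joint: p(s1, s0) = p(s1 | s0) * p(s0)
joint = {
    (s1, s0): p_s1_given_s0[s0][s1] * p_s0[s0]
    for s0 in states
    for s1 in states
}

# Marginal: p(s1) = sum over s0 of p(s1, s0)
p_s1 = {s1: sum(joint[(s1, s0)] for s0 in states) for s1 in states}

print(p_s1)
```

The marginal sums to 1, as a probability distribution should.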

lapis sequoia
#

@velvet thorn also, the code you gave me did not work, sadly

mild topaz
#
```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\recDoc1.py", line 266, in post
    if labels == predictions:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()```
#

plz ping me if u have any suggestions or solutions
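
That `ValueError` is what NumPy raises when an elementwise comparison is used directly in an `if`. A minimal sketch of the usual ways around it, assuming `labels` and `predictions` are NumPy arrays (the real objects in `recDoc1.py` are not shown, so the data here is a stand-in):

```python
import numpy as np

# Stand-in data; the real `labels` and `predictions` are not shown in the log.
labels = np.array([1, 0, 1])
predictions = np.array([1, 0, 0])

# `labels == predictions` compares element by element and returns an array,
# which `if` cannot reduce to a single True/False on its own.
elementwise = labels == predictions

all_match = bool(elementwise.all())               # True only if every pair matches
any_match = bool(elementwise.any())               # True if at least one pair matches
same_array = np.array_equal(labels, predictions)  # also copes with differing shapes

if same_array:
    print("all predictions match")
else:
    print("mismatch at positions", np.flatnonzero(~elementwise).tolist())
```

Which of `.all()`, `.any()`, or `np.array_equal` is right depends on what the check at line 266 is meant to decide.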

trail kite
#

I have this data, which is weekday frequency. I want it plotted simply: just the weekdays with their frequency (for example, Monday is 36446)

I also don't get this histogram, like what is 2.00 ?

chrome barn
#

you need to swap the axes around. The 2.00 is a count of weekday names: two weekdays, Friday and Saturday, each have a little over 30k entries, so that bin is plotted with frequency 2.00

#

same for tuesday and wednesday little over 34k+ each

tawdry lagoon
#

Hi guys, I have a table that looks like this (attached). I want to change the 'quantity' part of the MultiIndex into a sub-level on the columns. How can I do this?

velvet thorn
#

@velvet thorn also, the code you gave me did not work, sadly
@lapis sequoia huh why not

#

I tried it on a small dataset and it did

#

what happened

tawdry lagoon
#

aha figured it out

#

flatten the df

#

df2.pivot_table(index=cols, columns='quantity')

#

ez

lime ermine
#

i need help converting strings to utf8 binary

#

all the methods i use leave out 0's and it doesnt work

indigo oyster
#

how can i make operations for every row in df like column1 / column2?

chrome barn
#

there are several options easiest will be:

df['column3'] = df['column1'] / df['column2']
indigo oyster
#

oh thanks

past pewter
#

Turns out the symbol means join/and. So it's the same as p(x,y)

trail kite
#

you need to swap the axes around. The 2.00 is a count of weekday names: two weekdays, Friday and Saturday, each have a little over 30k entries, so that bin is plotted with frequency 2.00
@chrome barn

I'm halfway through understanding this, if you explain this a little bit more I will get it buddy

#

like why I have 2 friday?

#

that's exactly what I don't get.

I want it to be simple just 7 bars

chrome barn
#

@trail kite you don't have 2 Fridays in the graph. Look closely at the graph and what it tells you

#

what is the horizontal axis in the graph referring to....

trail kite
#

ah okay, it should be reverse then, right? as you said

chrome barn
#

i don't know if it is fixed but yes currently the amounts you have are horizontal and need to go to the vertical axis

trail kite
#

you know what sucks? when I chain functions, when hover on them, I can't see the function docs, like parameters and such (vscode - jupyter)

#

which parameters do I give it to reverse it?

chrome barn
#

it looks like you're trying to use the pandas plotting function. I don't use it to create my graphs, so look in the pandas plotting documentation

trail kite
#

ok man thanks for the help

#

what do you use?

chrome barn
#

usually the altair package

hollow sentinel
#

today I just learned about the UCI Machine Learning Repository

#

I think I like Kaggle more, since on UCI I can't see what other people did with the data set

lapis sequoia
#

@velvet thorn i printed out the table before and after, the code went through but the table remained the same

unreal lily
#

@hollow sentinel I managed to make the pyqtgraph update if there is a change in the database, works like I wanted/needed it ❤️

#

(Sorry for the ping)

hollow sentinel
#

@unreal lily haha no problem man happy for you

heady hatch
#

Hey guys question on dealing with large json file.

So how do you deal with a json file that has a weird format in it?

The json file I'm working with has it in the format of

{
  "random_key": [
    rest of the records
  ]
}
hollow sentinel
#

idk what to do with json other than json loads/json dumps

heady hatch
#

hahaha @hollow sentinel , love your enthusiasm.

After talking to a few people, it seems like there's no way around it unless I can get the file without that random key.

I'm currently trying to do a streaming method to see if I can somewhat get around it.
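
If the file fits in memory, one option is to unwrap the single top-level key without knowing its name. This is only a sketch under that assumption; `"random_key"` and the records are invented, and a truly large file would still need a streaming parser.

```python
import json

# Stand-in for the real file contents; the key and records are made up.
raw_text = '{"random_key": [{"id": 1}, {"id": 2}]}'

data = json.loads(raw_text)  # with a real file: json.load(open_file)

# The wrapper object has exactly one key, so take its value without naming it.
# The tuple unpacking raises ValueError if there is more than one key.
(records,) = data.values()

print(len(records))  # 2
```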

hollow sentinel
#

@heady hatch haha i know a little bit of JSON files from freshman year, but I've been sticking with good ol CSVs from Kaggle

heady hatch
#

Oh, I don't want to come off discouraging. But do be careful not to get too conditioned to Kaggle.

It's not the best representation of Data Science and work.

#

I love Kaggle, but it's a quick slice of what the work could be like.

#

Part of the issue is the data is often really clean.

#

But it is a good place for understanding the different techniques that other people use.

hollow sentinel
#

@heady hatch yes i'm learning Selenium and bs4 so I can scrape my own data like a big boi

heady hatch
#

Big boi.

#

I have faith in you, you're a big boi now.

#

btw learn Pyppeteer too.

hollow sentinel
#

thanks dude you were with me when I first started this stuff lol

#

Pyppetter?

heady hatch
hollow sentinel
#

got it

#

where do you normally scrape your stuff from @heady hatch

heady hatch
#

Not to give a cheeky answer, but depends on where I need it from.

hollow sentinel
#

amazing

heady hatch
#

:^)

hollow sentinel
#

built different

heady hatch
#

To give you an example. Recently I wanted to do a project on folklores and fables.

#

So I looked for websites that hosted that kind of text data.

hollow sentinel
#

very cool

#

you did NLP?

heady hatch
#

I try to mainly focus on NLP.

#

But I'm also working with computer vision due to work.

hollow sentinel
#

I want to learn it but I want to focus on the basics first

#

i wanna be comfortable using the basic machine learning algorithms

heady hatch
#

Get your basics in.

hollow sentinel
#

do so many linear regressions i can do it in my sleep

heady hatch
#

hahaha

#

Algorithm is one thing, but pay attention to what the model is doing.

#

Such as overfitting, underfitting, validation, data distribution.

hollow sentinel
#

yeah i keep having myself refer to old code i'm not sure if that's good or bad

heady hatch
#

It's a point of reference until you build your muscles.

hollow sentinel
#

yeah sometimes I open up Portilla's notes

#

they're helpful

heady hatch
#

Another thing to learn is don't worry so much about getting the right and wrong answer.

hollow sentinel
#

haha yes

heady hatch
#

DS/ML is super malleable.

#

And I think it's heavily empirically driven.

#

I mean there's a lot of research behind it too.

#

But you'll probably get most of your experience from playing with the data and models than reading about it.

hollow sentinel
#

haha yes my ML research professor paid for the udemy course i'm taking now

heady hatch
#

Oh!

hollow sentinel
#

he's very kind

#

but i'm not at that college anymore so it's very awkward

heady hatch
#

It's okay, he's a forever teacher and you're a forever student.

#

College is just formality. Pay it back to him with good ML work.

hollow sentinel
#

i'm always gonna be a student hahaha it's machine learning

heady hatch
#

:^)

hollow sentinel
#

@heady hatch do you recommend any books to learn machine learning

#

although the books tend to put me to sleep

heady hatch
#

hahaha

#

If books put you to sleep, I'd stick with MOOC.

#

There's Google Crash Course.

hollow sentinel
#

once i finish the Portilla course I wanna do the Ng course

#

Ng is more focused on the theory behind machine learning

#

Portilla is more execution

heady hatch
#

I love Ng's course.

hollow sentinel
#

that's the seal of approval I need

#

and then I'll do the google course after bc i'm built different

#

and mostly bc i hate myself but that's for another day

real wigeon
#

what date time format is this (and timezone I guess more specifically)

#

2020-10-26 19:59:52

slender nymph
#

Can I use R with python?

heady hatch
#

I think that's the "yyyy-MM-dd HH:mm:ss" format. Can't tell the timezone without more information.

hollow sentinel
real wigeon
#

i think it's UTC

#

the format

hollow sentinel
#

very cool

real wigeon
#

just trying to figure out how to handle these date time values with python and mysql

#

im querying a mysql db and trying to change the way the user inputs time data

hollow sentinel
#

very nice

real wigeon
#

i already have it set to the default syntax that mysql uses, but obviously i don't want the user to have to follow mysql, i want python to convert it so that mysql can read it

hollow sentinel
#

yeah uh maybe @heady hatch knows

#

i'm rusty on SQL

heady hatch
#

To clarify you want python to convert it to insert into MySQL?

real wigeon
#

the other way, im pulling data from mysql

#

the user input currently is set to match mysql syntax. but that's not really intuitive for the user

slender nymph
#

Ty @hollow sentinel

real wigeon
#

since it is yyyy-mm-dd hh:mm:ss

#

im trying to get it to mm-dd-yyyy hh:mm (user time zone)

#

all users will be est

heady hatch
#

To convert it into EST, that will require manual computation. Or maybe there's a library.

real wigeon
#

theres def a lib somewhere

heady hatch
#

But datetime should be able to do what you're looking for.

#

Along with converting it to EST.
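
A rough sketch of what that could look like with only the standard library. It assumes the MySQL values are stored in UTC (the log only says "i think it's UTC") and uses a fixed UTC-5 offset for EST, so it ignores daylight saving time. The function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. 2020-10-26 19:59:52
USER_FORMAT = "%m-%d-%Y %H:%M"      # e.g. 10-26-2020 14:59

# Fixed UTC-5 offset; this deliberately ignores daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")


def mysql_utc_to_user(value: str) -> str:
    """Convert a MySQL-style UTC timestamp to the user-facing EST string."""
    utc_time = datetime.strptime(value, MYSQL_FORMAT).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(EST).strftime(USER_FORMAT)


def user_to_mysql_utc(value: str) -> str:
    """Convert user input in EST back to a MySQL-style UTC timestamp."""
    local_time = datetime.strptime(value, USER_FORMAT).replace(tzinfo=EST)
    return local_time.astimezone(timezone.utc).strftime(MYSQL_FORMAT)


print(mysql_utc_to_user("2020-10-26 19:59:52"))  # 10-26-2020 14:59
print(user_to_mysql_utc("10-26-2020 14:59"))     # 2020-10-26 19:59:00
```

If the users actually follow US Eastern wall-clock time, `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) can replace the fixed offset so that daylight saving is handled.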

real wigeon
#

thx

marble prairie
#

I'm optimizing a TensorFlow project I found online and so far I've improved its speed by over 600%, but I'm not sure where to stop. Is 0.5 seconds to identify a couple of letters in a small picture considered slow for CNNs?

hollow sentinel
#

haha tensorflow idek what that is

#

it's deep learning & neural nets right

marble prairie
#

Yup

hollow sentinel
#

yeah i'm not up to that part on Udemy yet

#

haha

#

it's like 5 hours long

marble prairie
#

It was written by someone who didn't even know list comprehension 💀 but somehow made an impressive cnn

hollow sentinel
#

idek what list comprehension is

#

lmao

#

cough NOOB

marble prairie
#

basically one line optimized for loops
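
For anyone else who has not met them yet, a tiny illustration with made-up data: the loop and the comprehension below build the same list.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# List comprehension: the same result in a single expression.
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_comp)  # [4, 16, 36]
```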

hollow sentinel
#

yeah i just googled it

heady hatch
#

Hard to say if it's good or not.

Because the tf project could be badly written and you've just optimized it. In regards to 0.5 seconds to identify couple letters, I don't know if there's a benchmark for recognition latency, but it would be useful if there's some kind of baseline for comparison.

Maybe even to the project's constraint.

Because let's say you need the identification to take less than 0.1 seconds. Then 0.5 seconds is pretty quick relative to a human, or even to the original project, but for the result you're trying to achieve it's still not feasible.

#

On the other hand, I will give you kudos for optimizing projects. Great work.

marble prairie
#

Hm I see what you're saying
With the end goal of using it with async the 0.1 would of course be perfect but in the short time I posted that question I printed some durations of different functions, and alas it's the run method which takes the complete bulk of the duration

gray phoenix
#

Hi I was wondering if someone could help me out with the regex library.

new_seo = pd.read_csv(path + r".*us-" + str(date) + r".*")
the string should be grabbing something like the below:

organic.Positions-us-20201026-2020-10-27T18_26_04Z.csv

When I run the code, I keep getting the file is not found

heady hatch
#

I think it might be because it's reading path + .*....

#

in python, r'string' is raw string.

#

It's often used in regex to not have to escape characters.

#

I think you might be thinking of glob.

#

You can make it into a Path object and then use Path.glob(f'*us-{str(date)}*'), maybe something like that.
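
A sketch of that glob idea, run against a temporary folder so it is self-contained. The folder and the decoy file are invented; `date` stands in for the variable from the question.

```python
from pathlib import Path
import tempfile

date = 20201026  # stands in for the `date` variable from the question

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)  # stands in for `path`
    # Create an empty file named like the one in the question, plus a decoy.
    (folder / "organic.Positions-us-20201026-2020-10-27T18_26_04Z.csv").touch()
    (folder / "organic.Positions-uk-20201026-2020-10-27T18_26_04Z.csv").touch()

    # glob uses shell-style wildcards, where "*" matches any run of characters.
    matches = sorted(folder.glob(f"*us-{date}*"))
    names = [match.name for match in matches]

print(names)
```

With real data, the first match could then be passed to `pd.read_csv(matches[0])`; `read_csv` itself does not expand patterns, which is why the original call reported that the file was not found.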

gray phoenix
#

ok thanks i'll read into the glob library

hollow sentinel
#

learning neural networks now

#

jesus how long does tensorflow take to install

drifting hemlock
#

tell me how it goes, I want to get into tensorflow soon I hope

hollow sentinel
#

@drifting hemlock no problem

#

I'm just going for a general understanding of machine learning by doing the udemy course

#

I'll come back and do personal projects later

drifting hemlock
hollow sentinel
#

if you want a general overview of machine learning Python for Data Science and Machine Learning Bootcamp by Jose Portilla is pretty good

#

it's on Udemy

#

also Andrew Ng on coursera I've heard a lot of good things about

#

there's also mini Kaggle courses you can take

bitter harbor
hollow sentinel
#

maybe i'll look at it after I finish the Ng course

vague bear
austere swift
#

so how come when i do this, a shows up as a tensor with a value of 0.85, but when i do a.item() it returns 0.8500000238418579

#

there should be no like hidden super tiny decimals because the function error() just returns the differences between each value in x and y

#

i'm just testing it in the repl

#

and idk why its not exactly 0.85

glacial silo
#

whats data science?

austere swift
lapis sequoia
#

statistics machine learning basically just analyzing data or something i dont know i aint good at it

#

I need some help on building ML pipeline

austere swift
#

I figured it out it was just the precision, when i used double precision it worked
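
The same effect can be reproduced without any tensor library. This sketch assumes the tensor was a 32-bit float (PyTorch's default), which is a guess from the `.item()` call since the original code is not shown.

```python
import struct

# Round-trip 0.85 through a 32-bit float and back to Python's 64-bit float.
as_float32 = struct.unpack("<f", struct.pack("<f", 0.85))[0]

print(as_float32)          # 0.8500000238418579
print(as_float32 == 0.85)  # False: the 32-bit value is a coarser approximation
```

0.85 has no exact binary representation, so the 32-bit version is just the nearest value that fits in fewer bits; the tensor display rounds it, while `.item()` shows all the digits.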

heady hatch
#

Interesting.

I was trying to follow along and trying to replicate it in tf. I wonder if it's just a thing in pt.

#

Or maybe I'm doing something wrong.

#

@lapis sequoia What do you need help with?

austere swift
#

me just now realizing how i didnt need the function at all and i could just do sum(abs(x-y))

heady hatch
#

We're all learning new things.

austere swift
#

lol its always the basics that i just overlook

lapis sequoia
#

@heady hatch
I've researched the basic steps of building an ML pipeline but I'm still not quite sure, so let me break it down into multiple questions if possible. For data preparation, what is the best practice? Do people usually build this step as a microservice and then send the data to the next step? Are there any frameworks / libraries available already?

heady hatch
#

Depending on the scale of the project and how it's broken down. Some people even have different definitions for ML pipelines.

#

So one question I have is what is yours.

#

Or if you're unsure here is how I structure my projects.

I usually set up two pipelines, one for data and one for ml.

data pipelines deal with extraction, transformation, and load.

then from the ML pipeline, it takes the data and gets it ready to be modeled. for text and images, it would be preprocessed, shuffled, split, batched etc.

Then sent it off for training.

lapis sequoia
#

👀

heady hatch
#

Similarly with tabular data.

lapis sequoia
#

How would unsupervised ml pipeline be different to supervised ?

heady hatch
#

Depending on the unsupervised problem.

lapis sequoia
#

Clustering

heady hatch
#

One thing that comes to mind is algorithm difference.

#

which then makes the data prep different.

#

Oh I guess something that comes to mind is testing and debugging.

#

I'm not quite sure how you'll test unsupervised machine learning pipelines.

lapis sequoia
#

For me there are 4 major steps in ML pipeline.
1.) Data Preparation, which would define Train/Test and Featurization
2.) Build and Train the model, including hyper-parameter tuning, testing and validation.
3.) Deployment
4.) Monitoring

Am I thinking too narrow ?

#

I was thinking using mlflow to abstract 2 and 3 if possible

heady hatch
#

That sounds like the basics of it, but there are a lot to think about too.

#

Such as experiment tracking.

lapis sequoia
#

Agree. Do you break down part of the pipeline as its own microservice or just integrate it in large code base

heady hatch
#

I've never used mlflow but maybe for things like tracking and hyperparam search. But it also depends if I can build my own.

#

like for data preparation, if the data is couple gig, it would be a bit of hassle to send it to a microservice and wait for it to return.

Unless you have some other definition for microservice.

lapis sequoia
#

Make sense

#

Right now I am still trying to grasp the idea of building a basic ML pipeline and integrate it to my personal projects, so I am really new to this area

heady hatch
#

Lots of exploration, what I'm saying is probably not best practice.

lapis sequoia
#

i see

heady hatch
#

I'm a big fan of building stuff on my own before using libraries.

lapis sequoia
#

btw, are they all implemented in Python? I know Tensorflow has support in Java, but do you see people using it on Java?

heady hatch
#

Within my circle, it's all python for ML.

#

I think it's largely because of the library support.

lapis sequoia
#

make sense

#

How did you serve the model?

heady hatch
#

so usually it would be packaged up to have it as a backend.

#

If the model is too big, then we'll use a distilled version.

lapis sequoia
#

Will building your model in keras decrease the performance of the model? I know you can use purely tensorflow for it

heady hatch
#

I don't see any decrease in performance.

#

I usually use tf + keras, or in v2, just tf since it swallowed keras.

lapis sequoia
#

right

heady hatch
#

I usually use keras because of the abstraction.

#

Because I think calculating stuff myself will introduce a lot of human errors.

lapis sequoia
#

Yea that's why I prefer Keras too it makes my life lazy

heady hatch
#

hahaha

lapis sequoia
#

Do you need to use Pandas for featurization

heady hatch
#

For text and images, I use pandas for visualization and exploration.

#

Maybe for tabular data.

#

But sometimes it gets too big, and I have to work with Dask.

#

To answer your question, no.

lapis sequoia
#

I see.
Probably I have enough idea to continue researching on my own on building the pipeline.
Thanks for the insight.

heady hatch
#

Yea definitely.

#

I was just going to touch up on when I build classical ml pipelines

#

I use sklearn or library pipelines.

lapis sequoia
#

What are the library pipelines you would recommend?

heady hatch
#

oh sorry I meant like how SpaCy has their own pipeline objects.

#

same with sklearn.

#

Instead of writing one with Pandas.

lapis sequoia
#

NGL SpaCy has strong features for data preparation

heady hatch
#

Maybe there are ones specifically just for pipelining.

the reason why I use the ML library pipeline is for robustness.

Let's say
You want to just fit it on training data, and transform it on train and test.

when you do it yourself, you might fit it on both training and testing.

which is why pipelines from respective libraries are useful, since they take care of it for you.
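
To make the fit-on-train-only point concrete, here is a toy sketch in plain Python. `SimpleStandardizer` is a made-up stand-in, not a class from any library; library pipelines such as scikit-learn's `Pipeline` enforce the same pattern for you.

```python
from statistics import mean, pstdev


class SimpleStandardizer:
    """Toy stand-in for a library scaler; not a real library class."""

    def fit(self, values):
        # Learn the statistics from the data passed in, and nothing else.
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [10.0, 12.0, 14.0, 16.0]
test = [100.0, 120.0]

scaler = SimpleStandardizer().fit(train)  # fit on the training split only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)      # reuse the training statistics

print(scaler.mean_)  # 13.0
```

Fitting on train and test together would let information about the test set leak into the preprocessing, which makes evaluation look better than it should.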

lapis sequoia
#

right, that's why I found out mlflow which can make your life lazy on training, testing and deploying your model

heady hatch
#

Does MLFlow allow you to add parts from other services?

#

Such as for hyperparameter searches.

#

I should try using MLflow. hahaha

#

It would clear things up.

lapis sequoia
#

but yeah, SpaCy or NLTK is used for data preparation right? Because it has all the features which abstract transforming texts to vectors

heady hatch
#

They can, I believe yes.

Previously I did it by hand, I'll look into how I can use SpaCy or NLTK.

Part of the reason is also because models require the data to be in certain format but SpaCy will make it only in SpaCy format.

#

Like I don't know if they produce attention masks and segment masks.

vivid wren
#

what would be the best way with numpy to take a 7 value rolling average of a list to create another list of the same length?
This is the code I'm currently using:

def smooth_increases(increases_list):
    vals = []
    new_vals = []
    right = 0
    while True: 
        vals.append(increases_list[right])
        if len(vals) > 7: vals.pop(0)  # keep only the last 7 values
        new_vals.append(int(mean(vals)))
        right += 1
        if right == len(increases_list):
            return new_vals
#

increases_list is just a list of ints
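
One possible NumPy version, as a sketch. It assumes the intent is a trailing 7-value window where the first few entries average over whatever values are available so far, truncated to int like the loop does. The `window` parameter is an addition for illustration.

```python
import numpy as np


def smooth_increases(increases, window=7):
    """Trailing moving average; early entries average over the values seen so far."""
    values = np.asarray(increases, dtype=float)
    if values.size == 0:
        return []
    # Running totals with a leading zero, so that
    # csum[i + 1] - csum[j] == sum(values[j:i + 1]).
    csum = np.concatenate(([0.0], np.cumsum(values)))
    end = np.arange(1, values.size + 1)
    start = np.maximum(end - window, 0)
    means = (csum[end] - csum[start]) / (end - start)
    return means.astype(int).tolist()


print(smooth_increases([1, 2, 3], window=2))  # [1, 1, 2]
```

If the partial windows at the start are not needed, `np.convolve(values, np.ones(7) / 7, mode="valid")` is shorter, but it returns a shorter array than the input.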

idle root
#

does anyone know if this book is worth reading ? "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"

i finished andrew ng's course in ML and i want to keep learning.
what do you guys think ?

lone osprey
#

Yeah, I am reading it

#

It is one of top books I think

lapis sequoia
#

It is a really good introductory book with plenty of references to other valuable articles.

lone osprey
#

Try mit deep learning course too

#

@idle root

idle root
#

@lone osprey which one is it ? i think there are few courses

#

this one ? MIT 6.S191

lone osprey
#

I didn't see it properly

#

I will check and see

hollow sentinel
#

@idle root you can also try Harvard's Machine Learning course on edx. IBM offers a Data Science course on Coursera too. If you want to strengthen your fundamentals you can try Python for Data Science and Machine Learning by Jose Portilla. The Google Crash Course on machine learning looks interesting too

#

wait guys

#

is coursera free

#

or do you have to finish it in a week and they charge money

#

ohhhh you need to pay for a certificate that's lame

#

if you don't understand deep learning clap your hands

hollow sentinel
#

can anyone explain why the MinMaxScaler is needed? Portilla just used it in his lecture and I don't get why

keen bear
#

anyone good with machine learning?

hollow sentinel
#

just ask your question lol

keen bear
#

really?

#

people just answer?

bitter harbor
#

I can't remember exactly why you need to standardize, but minmaxscaler scales each feature separately
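
For reference, min-max scaling maps each feature to the range 0 to 1 with (x - min) / (max - min), computed per column. The usual motivation is that features on very different scales would otherwise contribute unevenly during training. A sketch with invented housing-style numbers:

```python
# Two made-up features on very different scales: square footage and bedrooms.
rows = [
    [1000.0, 2.0],
    [2500.0, 3.0],
    [4000.0, 5.0],
]

# Work column by column: each feature gets its own min and max.
columns = list(zip(*rows))
mins = [min(col) for col in columns]
maxs = [max(col) for col in columns]

scaled = [
    [(value - lo) / (hi - lo) for value, lo, hi in zip(row, mins, maxs)]
    for row in rows
]

print(scaled[0])  # [0.0, 0.0]
print(scaled[2])  # [1.0, 1.0]
```

After scaling, both columns span the same range, so neither dominates just because its raw numbers are bigger.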

#

yes if they want to

keen bear
#

ok.
So, from my understanding, a neural network receives inputs, and then calculates the dot product of the inputs and the weights. Then it adds the result by the biases. Then it repeats at the next set of weights and biases until it reaches the end, which is the output. Is that correct or am I missing anything?

bitter harbor
#

it repeats the weights/biases for each layer in the network not just for the next 'set'

keen bear
#

so it multiplies and adds all the weights and biases in the network?

bitter harbor
#

not all together no

keen bear
#

so, it multiplies the weights by the input, then adds the biases, then multiplies the result by the next set of weights, and then adds the next set of biases, until it reaches the output?

hollow sentinel
#

lol i'm learning neural networks rn and i don't get it

keen bear
#

same

bitter harbor
#

it takes the dot product of the input and the weights, adds the biases, and that result becomes the layer's output

#

then it repeats with the previous layer's output * the next set of weights (plus the next biases) for each layer

#

there's also an activation function involved but my knowledge is kinda lacking I haven't done it in a while

keen bear
#

ik, the activation function really gets me stuck

bitter harbor
#

I'm pretty sure each layer's result gets pushed through the activation function to keep it within [0,1]/[-1,1] (depending on the function)
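
A minimal sketch of the forward pass being discussed: each layer takes the dot product of its inputs and weights, adds a bias, applies an activation function, and hands the result to the next layer. The layer sizes, the weights, and the choice of sigmoid are all made up for illustration.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One layer: per neuron, dot(inputs, weights) + bias, then the activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(i * w for i, w in zip(inputs, neuron_weights)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


# Made-up network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

inputs = [1.0, 1.0]
hidden = dense_layer(inputs, hidden_weights, hidden_biases)  # feeds the next layer
output = dense_layer(hidden, output_weights, output_biases)

print(hidden, output)
```

Training is the separate step of adjusting those weights and biases; the forward pass above only computes an output from values that are already set.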

keen bear
#

then it repeats with the previous layer's output * the next set of weights (plus the next biases) for each layer
@bitter harbor this part is also kinda confusing

#

is there any links you recommend?

bitter harbor
#

I'd suggest watching 3b1b's series' on networks/calculus/linear alg

#

to get the basics

hollow sentinel
#

Python for Data Science and Machine Learning is good too

#

it has a 5 hour section on neural networks

#

idk how new you are to this material tho

bitter harbor
#

I've only watched 3b1b, the rest of what I learnt was through research so I can't confirm

hollow sentinel
#

lol I might be able to do research in December

#

so I'm studying up now

#

taking as many machine learning courses as possible

bitter harbor
#

I've found it's sometimes easier to take a super complicated model and pick it apart until you understand each component individually

keen bear
#

ok

hollow sentinel
#

i don't know which one i like more neural nets or NLP

bitter harbor
#

nlp doesn't seem fun

hollow sentinel
#

how come

#

I feel like I never understood NLP properly since i haven't done an actual project in it yet

bitter harbor
#

I mean nvm syntactic analysis, which is its own challenge, lexical semantics (meaning of individual words in context) seems like an absolute pain

keen bear
#

your words seem like zarglar to me

hollow sentinel
#

what's zarglar

keen bear
#

a fictional alien language

hollow sentinel
#

star trek?

keen bear
#

no, came from a video from jabrils

hollow sentinel
#

oh lmao

#

NLP is hard but neural nets are even harder imo

#

but i just started looking at it

#

so ofc it's gonna be hard

bitter harbor
#

a lot of modern nlp is built on nn's

hollow sentinel
#

oh

#

I WILL NEVER ESCAPE NN

#

lmao

bitter harbor
#

as an example of lexical semantics: detecting which homonym is used in speech depending on context

keen bear
#

ok

#

\

hollow sentinel
#

cool

#

can we all agree that NN is hard lmao

keen bear
#

neural networks?

bitter harbor
#

and then you get into tone and intention

hollow sentinel
#

yes

keen bear
#

yes

bitter harbor
#

they aren't so bad until you get into super specific models

hollow sentinel
#

i don't even want to look at the math behind it

#

my brain is gonna die

bitter harbor
#

you should probably learn the math before anything else lol

hollow sentinel
#

i tried

bitter harbor
#

it's apparently a lot harder to understand what's going on

hollow sentinel
#

without the math

#

yeah i know

#

I'll try doing a project on it first

bitter harbor
#

lmao that's not going to help

hollow sentinel
#

ugh

#

white flag waved

#

sike

keen bear
#

what level math is neural networks?

bitter harbor
#

high

keen bear
#

like what age?

#

when will I be actually able to learn this in school?

hollow sentinel
#

cOlLeGe

#

or late high school

bitter harbor
#

well I'm a first year uni student and I'm pretty sure I can't take linear alg until my 3rd year

#

well it's a 300 course

#

I'm not sure about when multivariable optimization is taught

hollow sentinel
#

i never took lin alg lmao i switched to business

#

whoops

keen bear
#

you guys are way older than me

bitter harbor
#

hey I'm not old :{

hollow sentinel
#

:((((

bitter harbor
#

3b1b literally taught me lin alg tho

keen bear
#

i didn't mean that

#

old

hollow sentinel
#

i'll look at his videos

bitter harbor
#

legitimately take notes

hollow sentinel
#

@keen bear we're joking

keen bear
#

k

hollow sentinel
#

yes

keen bear
#

i hope I can learn it though

hollow sentinel
#

dw everyone is here to help

#

I started machine learning maybe 3 weeks ago and the whole channel helped me

#

guys what is feature engineering explained simply is it the same as cleaning data?

#

Portilla keeps saying it and idk

bitter harbor
#

it's pulling features from data with domain knowledge, in other words, knowing what variables present themselves in the env you're working with
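
A small illustration of the idea, with made-up purchase records: the raw timestamp is hard for a model to use directly, but domain knowledge (say, that weekends behave differently) suggests deriving columns from it. The function and field names are invented for the sketch.

```python
from datetime import datetime

# Made-up raw records: a timestamp string and a purchase amount.
raw_rows = [
    {"timestamp": "2020-10-26 19:59:52", "amount": 12.5},
    {"timestamp": "2020-10-31 08:15:00", "amount": 40.0},
]


def add_time_features(row):
    """Derive columns a model can use directly from the raw timestamp."""
    moment = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        **row,
        "hour": moment.hour,
        "day_of_week": moment.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": moment.weekday() >= 5,
    }


features = [add_time_features(row) for row in raw_rows]
print(features[1]["is_weekend"])  # True
```

Cleaning data fixes what is already there (missing values, bad types); feature engineering adds new columns like these, so the two overlap in practice but are not the same thing.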

#

so manual optimization?

hollow sentinel
#

idek

bitter harbor
#

same I just looked at the wiki page

lapis sequoia
#

Is it still recommended to use the 1st version of tensorflow with the following commands: `import tensorflow.compat.v1 as tf` and `tf.disable_v2_behavior()`?

bitter harbor
#

why would you do that?

lapis sequoia
#

I've been following along with ML with Scikit and Tensorflow

hollow sentinel
#

nice

lapis sequoia
#

and to make the code work I need to convert to the previous version.

bitter harbor
#

try to find a different course then
there's plenty with tf2

#

kinda the same concept as learning py 2.x to learn python

lapis sequoia
#

haha, kind of 😄

bitter harbor
#

idk if it's to the same extreme tho

lapis sequoia
#

it teaches you the fundamentals tho'.. there is a page with the mappings from the old version to the new one. It's quite annoying to go back and forth.

bitter harbor
#

ya again i'd suggest finding a different one

#

if it's just fundamentals it'd probably be a good idea to learn the up to date fundamentals

lapis sequoia
#

yeap, I will as soon as I finish this one >.> 100 more pages to go

hollow sentinel
#

there's a good tensorflow course on udemy

#

there might be more on coursera or edx idk i haven't looked

lapis sequoia
#

is it worth it, I'd rather go for the hands-on-ml2

bitter harbor
#

I've heard good things about it

lapis sequoia
#

I'll look into it, thanks 😄

hollow sentinel
#

no problem

bitter harbor
#

casual 4.6 rating

hollow sentinel
#

that's pretty good

#

I might do that after the Ng course

#

or I could do the google crash course on machine learning

heady hatch
#

I'm curious. I've never taken these courses other than Ng. What do they teach?

hollow sentinel
#

which course?

heady hatch
#

Just in general. Like let's say the tf ones.

hollow sentinel
#

pretty sure they teach how to implement tensorflow like I saw part of the course and they were teaching how to code a stock trading bot

heady hatch
#

Do they teach it from scratch as in instead of doing a dense layer, they show you by dot product this and add this, you'll get this other thing?

hollow sentinel
#

idk about that since I haven't taken it yet/don't understand NNs

#

I've just heard good things

heady hatch
#

hahaha sorry to put you on the spot.

hollow sentinel
#

nah no problem

heady hatch
#

Ahh, please update me. I'm curious.

hollow sentinel
#

I just started learning NN today

#

noooooob

heady hatch
#

I wonder what knowledge I'm lacking since most of my knowledge comes from reading notebooks and playing with code.

bitter harbor
#

from the first minute or so of the course it seems like it goes through basic + advanced stuff

heady hatch
#

If you don't mind me asking, what's basic and what's advanced?

bitter harbor
#

well like the basics so the lin alg, calc, stats stuff like that

#

advanced would be specifics of models, where they should be used, etc, etc

heady hatch
#

Ahh okay okay, ty ty.

hollow sentinel
#

I should take an algorithm and data structure course too

#

Bc I never took that class I only took my intro CS class

bitter harbor
#

Ya that's what Im doing next semester

#

in java tho

bitter harbor
#

@lapis sequoia that course actually came with a reason not to code along and I feel like the last point is relevant

lapis sequoia
#

has anyone used optuna here?

alpine creek
#

Hello all!

#

New to the server.

#

Has anyone been working with "dask" here?

lapis sequoia
#

what course

#

?

keen bear
#

I've only watched 3b1b, the rest of what I learnt was through research so I can't confirm
@bitter harbor 3b1b really helped, I just watched the series. Thanks a lot!

tidal path
#

I wanna submit this exam and be done once and for all with exams for at least the next 10 days

#

First i made a java chat app (sockets + web + android + desktop), then they told me to learn python in 10 days in order to do finals in probability and statistics, and now i am burnt out. Meanwhile i need to finish my paid projects so i can actually pay the uni...life is hard these days

sudden delta
#

that means data in the csv file has different types in the same column, either clean up the data or specify the dtype in read_csv()
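
A small sketch of the "specify the dtype" option, assuming the warning is pandas' mixed-types warning. The CSV text and the column name code are invented; a file this small would not actually trigger the warning, which usually shows up on large files that pandas parses in chunks.

```python
import io

import pandas as pd

# Hypothetical CSV where the "code" column mixes numbers and text.
csv_text = "id,code\n1,1\n2,2\n3,x\n"

# Telling read_csv the dtype up front means pandas does not have to guess,
# so every value in the column comes back as a string.
df = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
codes = df["code"].tolist()
```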

#

also it's just a warning so you might not need to do anything if you aren't using any data from that column

hollow sentinel
#

when you define X and y when you're doing neural nets why do you have to put .values

#

I don't remember them doing that for lin reg, log reg, k nearest neighbors, k means clustering, etc.

#

does neural nets just need a .values?

#

and why is Portilla using a MinMaxScaler

#

he's just built different ig

hollow sentinel
#

I seriously don't get it

#

is it because neural nets only accept numpy arrays?
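
For context, a minimal sketch of what .values does: it hands back the underlying NumPy array instead of a pandas object. The column names are invented, and whether a given library needs the conversion depends on the library and its version, so treat the "neural nets need arrays" idea as an assumption to check rather than a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical housing data.
df = pd.DataFrame(
    {"sqft": [800, 1200, 1500], "beds": [2, 3, 3], "price": [100.0, 150.0, 180.0]}
)

# .values returns a plain NumPy array (no column labels, no index).
X = df[["sqft", "beds"]].values

# .to_numpy() does the same job and is the spelling the pandas docs recommend.
y = df["price"].to_numpy()
```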

heady hatch
#

@hollow sentinel
Regarding scalers, it's really context dependent.

So understanding it like a tree.
-> Do you need a scaler?
-> Standard or MinMax or some other scaler like log?
-> Standard if you need it on a z scale, minmax if you need it on a 0 to 1 scale.
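
The last branch of that tree, sketched in plain NumPy. This is the same arithmetic that scikit-learn's StandardScaler and MinMaxScaler apply per column; the helper names and the toy matrix are made up, and the sketch assumes no column is constant (that would divide by zero).

```python
import numpy as np


def standard_scale(x):
    """Z-score each column: subtract the mean, divide by the std."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def min_max_scale(x):
    """Squash each column into the range 0 to 1."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)


# Two features on very different scales.
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

X_std = standard_scale(X)  # each column now has mean 0 and std 1
X_mm = min_max_scale(X)    # each column now runs from 0 to 1
```

In a real project the mean/std or min/max should come from the training split only and then be reused on the test split.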

#

Regarding the .values and nn, could you give more information on what you're referring to?

hollow sentinel
#

dang I can't find where he used .values in the video

#

it's not in his coding examples

#

F

#

what is an epoch lol

heady hatch
#

An epoch is a pass, forward + backward propagation.

hollow sentinel
#

over the training data

heady hatch
#

Over the network.

hollow sentinel
#

oh portilla says over the training set

#

but i'll take your word for it

heady hatch
#

hahaha I think Portilla and I are probably referring to the same thing, looking at it differently.

#

You can configure the dataset such that it might never encounter certain values of the training data.
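
To tie the two views together: in the usual definition, one epoch is one full pass over the training set, and every mini-batch inside it gets a forward pass plus a backward (gradient) step. A toy sketch with a one-weight model; the data, learning rate, and function names are all invented for illustration.

```python
def iterate_minibatches(data, batch_size):
    """Yield consecutive slices of the training set."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def train(data, epochs, batch_size, lr=0.005):
    w = 0.0      # single weight for the model y = w * x
    steps = 0    # number of forward + backward passes performed
    for _ in range(epochs):                      # one epoch = every sample seen once
        for batch in iterate_minibatches(data, batch_size):
            # Forward pass: predictions w * x. Backward pass: gradient of the
            # mean squared error with respect to w.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
            steps += 1
    return w, steps


# 10 samples from y = 2x; a batch size of 4 gives 3 updates per epoch.
data = [(float(x), 2.0 * x) for x in range(1, 11)]
w, steps = train(data, epochs=50, batch_size=4)
```

So 50 epochs here means 150 forward + backward passes, which is probably why the two descriptions sound different.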

hollow sentinel
#

can you explain overfitting simply

heady hatch
#

Yea to understand overfitting and underfitting, you'll need to understand bias + variance.

but I'll use a more relatable example.

bitter harbor
#

this is why fundamentals are important

heady hatch
#

Imagine yourself as an algorithm.

You've come to learn from the data that cookies taste like chicken.

#

Thus in the future, you'll predict cookies tasting like chicken.

#

However you, as a human, know this isn't really true.

#

You have overfitted on the data where cookies taste like chicken.

hollow sentinel
#

ok cool

#

sorry @bitter harbor i fall asleep doing the math behind machine learning

#

I've been doing it tho

heady hatch
#

I think that's one of the articles that's been linked quite often regarding bias and variance and how it relates to overfitting and underfitting.

hollow sentinel
#

alright I'll check it out

heady hatch
#

In context of ML, overfitting is when the model gets too complex while under is not enough complexity.
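
A quick way to see the complexity point, sketched with polynomial fits in NumPy. The sine-plus-noise data, the seed, and the degrees are arbitrary choices for illustration. Training error can only go down as the degree goes up, while the error on held-out points typically gets worse once the model starts fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_fn(x):
    return np.sin(np.pi * x)


# Small noisy training set and a separate held-out set.
x_train = np.linspace(-1, 1, 12)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# degree 1: too simple (underfit); degree 9: very flexible (prone to overfit).
results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
```

Comparing the two numbers in each results entry (train error, test error) is the basic check: a big gap between them is the usual sign of overfitting.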

hollow sentinel
#

hm

#

I think the Ng course does the fundamentals more

heady hatch
#

That's how you've been describing Portilla and Ng right?

hollow sentinel
#

yep

heady hatch
#

Portilla is more applicable while Ng is more theoretical.

hollow sentinel
#

I probably should've done Ng first

#

but that's ok

heady hatch
#

Eh.

#

At least now you have applicable context when learning the theory.

#

Because unless you have a good grasp of the math foundation, I think learning theory is kinda rough.

#

But if you can supplement it with applicable experience, it might not be as bad.

hollow sentinel
#

it's pinned here that columbia's ML course is better than Ng's

heady hatch
#

Because now when Ng says this, you can think about how in your previous projects you might have encountered such a thing.

hollow sentinel
#

yep

heady hatch
#

I wouldn't get too sticky about the course quality and etc.

As long as they're able to teach it and you can follow along while doing your own research, I think it's a mission accomplished.

#

Because you want to spend time on the ML stuff, not on course ratings.

hollow sentinel
#

i just want to understand ML

heady hatch
#

My advice is play and experiment.

hollow sentinel
#

yeah I'll probably end up doing multiple courses

heady hatch
#

All the theory and books you read is just a piece of the puzzle.

hollow sentinel
#

machine learning books scare me

#

I like courses more

#

layer my neural nets like lasagna

#

what

#

nothing

balmy junco
#

hi guys

#

having some really really weird issues with dataframes and series

hollow sentinel
#

what's the question

balmy junco
#

so i am simply trying to change the value of a row and column. pretty easy, right?

#

but heres whats happening

#

it just doesnt change

#

when i created a variable to copy that value over

#

and tried to change it

#

it successfully changed

#

so a little confused

#

if my issue is unclear, i can show a little code

hollow sentinel
#

uhhhhh

#

sure you can send code

balmy junco
#
symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]] = (float(numHigherChanges/numDays))
#

that's what I am trying to do

#

I also tried this

series = symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]] 
series[0] = float(numHigherChanges/numDays)
#

The latter worked for the series, but not for the symbolsDf when I tried

symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]][0] = (float(numHigherChanges/numDays))
#

I also read about some method called set_values, but apparently the interpreter says that series doesn't have a method like that

#

I am aware that the values are tuples

#

And I might want to consider replacing the entire tuple

#

It's just a weird concept to me

heady hatch
#

Some clarification, what's the latter you're referring to?

#

The line with the series?

balmy junco
#

the latter is the second block of code

#

the second approach to changing the values which was done in both the second and third blocks of code

heady hatch
#

What does your data look like?

balmy junco
#

it is a huge dataset lol

#

but hold on

velvet thorn
#

you need

#

to use .loc

balmy junco
#

ahhh okay

#

i tried that before and it wasnt workin but i will give it another go

velvet thorn
#

no

#

I mean, you need to change how you index

balmy junco
#
= (float(numHigherChanges/numDays))
velvet thorn
#

you have chained indexing

#

not like that

balmy junco
#

Ohhhhhh

#

You mean like

velvet thorn
#

sec

balmy junco
#

[,]

#

Like that type of indexing?

#

or more like

velvet thorn
#

okay

#

so

balmy junco
#
= (float(numHigherChanges/numDays))
velvet thorn
#

no

#

okay I assume symbolsDf[SYMBOL]==symbol is a row mask

balmy junco
#

okay I assume symbolsDf[SYMBOL]==symbol is a row mask
@velvet thorn yes

velvet thorn
#

what is changeLbls[dateIndx][percentIndx]?

balmy junco
#

it's a list of labels

#

so i have int indexes to get the correct index

#

so it just comes out to be a specific column

#

ill simplify

#

symbolsDf.loc[symbolsDf[SYMBOL]==symbol][columnName]

velvet thorn
#

then that should be df.loc[row_indexer, column_indexer]
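
A small sketch of the difference, using a made-up frame (the symbol and pct_higher names are placeholders, not the real columns). Chained indexing assigns into a temporary copy, so the original frame stays unchanged; a single .loc call with both indexers writes into the frame itself.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for symbolsDf.
df = pd.DataFrame(
    {"symbol": ["AAA", "BBB", "CCC"], "pct_higher": [0.0, 0.0, 0.0]}
)
mask = df["symbol"] == "BBB"  # boolean row mask

# Chained indexing: df[mask] builds a copy, and the assignment lands on that
# copy. pandas warns about this; the warning is silenced here to keep the
# demo output quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask]["pct_higher"] = 0.75
chained_result = df.loc[mask, "pct_higher"].iloc[0]  # still the old value

# One .loc call with row and column indexers assigns in place.
df.loc[mask, "pct_higher"] = 0.75
loc_result = df.loc[mask, "pct_higher"].iloc[0]
```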

#

not sure if I mentioned this before but I would suggest you use snake case names for your variables

balmy junco
#

ahh

velvet thorn
#

and strongly suggest you have spaces around your operators

balmy junco
#

perfect

#

that seemed to do it

#

thank you

fierce swallow
#

Hi

hollow sentinel
#

hello

heady hatch
#

Hey guys, any quick way to get groups with their values in Pandas?

ie.

dataframe.groupby -> return a dictionary with key and values of their groups and respective values.

#

I was able to find

df.groupby.groups

But it ends up being in index form instead of their actual values.
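
One possible way to get values instead of index labels, sketched on a made-up frame (the group and value column names are placeholders): iterate over the groupby object, which yields each key together with its sub-frame.

```python
import pandas as pd

# Hypothetical data: three groups with a few values each.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b", "c"], "value": [1, 2, 2, 3, 3]}
)

# Iterating a groupby yields (key, sub-frame) pairs, so the values of any
# column can be pulled out per group.
groups_as_values = {
    key: frame["value"].tolist() for key, frame in df.groupby("group")
}
```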

hollow sentinel
#

haha as if I understand Pandas well enough to help 😦

#

that might be helpful

#

idk

#
sns.boxplot(x="loan_status", y="loan_amnt",data=df)
#

anyone know why this is so slow

#

sike it's completely fine

velvet thorn
#

Hey guys, any quick way to get groups with their values in Pandas?

ie.

dataframe.groupby -> return a dictionary with key and values of their groups and respective values.
@heady hatch why do you want that?

heady hatch
#

@velvet thorn I need to compare across groups to count intersections. Any suggestions on how to go about it?
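
A sketch of one way to count intersections across groups, assuming "intersection" means values shared by two groups. The frame and column names are invented, and for a large number of groups the pairwise loop grows quadratically.

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: three groups with a few values each.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b", "c"], "value": [1, 2, 2, 3, 3]}
)

# One set of values per group.
value_sets = {
    key: set(series.tolist()) for key, series in df.groupby("group")["value"]
}

# Size of the overlap for every pair of groups.
overlap_counts = {
    (left, right): len(value_sets[left] & value_sets[right])
    for left, right in combinations(sorted(value_sets), 2)
}
```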