#data-science-and-ml


velvet thorn
#

but I would do everything differently

smoky fractal
#

It's really just a proof of concept program at this point

#

what would you do differently?

velvet thorn
#

btw, not directly functionality related, but

#

snake case is preferred for Python

#

@smoky fractal maybe something like this:

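# price changes over the last `bars` rows (assumes 'close' holds raw prices)
diff = symbol_data['close'].diff().tail(bars)

# groups are keyed by sign: -1 (down), 0 (flat), +1 (up); needs all three present
down_mean, _, up_mean = diff.groupby(np.sign(diff)).mean().sort_index()
rsi = 100 - 100 / (1 + up_mean / abs(down_mean))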
drifting hemlock
#

Has anyone used a Data Discovery/Catalog service like Amundsen?

smoky fractal
#

@velvet thorn I'm still a bit confused about the ups/downs, is that a continuous value at each point or is it just one point that gets returned?

#

because initially I had them as variables but now they are columns in the dataframe

velvet thorn
#

I think what you mean to ask is - is that a Series or a single value?

#

and the answer is the former

#

wait, go back

#

you mean in yours?

smoky fractal
#

Yes that's exactly what I was asking, thanks. And those three lines you posted do the same thing as my code? Just trying to

velvet thorn
#

or mine

smoky fractal
#

in my original code, I had them as single values. Now I have them as series.

velvet thorn
#

oh wait actually I need to clarify your algorithm

#

RS is supposed to have the value of the mean increase divided by the value of the mean decrease, right

smoky fractal
#

yeah

velvet thorn
#

okay, then I need to change my code

#

sec

smoky fractal
velvet thorn
#

okay, I edited the code

#

in my original code, I had them as single values. Now I have them as series.
@smoky fractal at that point

#

they should be single values

#

because they are means

smoky fractal
#

so what I think I am doing now is taking the RSI at every point and putting it in that column. At the end of my code now I am returning symbol_data['RSI'][-1] to get the most recent value

#

Because while yes it is a mean it is still helpful to see it plotted over time as a series

velvet thorn
#

so what I think I am doing now is taking the RSI at every point and putting it in that column. At the end of my code now I am returning symbol_data['RSI'][-1] to get the most recent value
@smoky fractal RSI given the last X points?

smoky fractal
#

yes the function takes in the bars variable as the amount of points to consider

velvet thorn
#

yeah, but that returns a single value, right

#

the function

smoky fractal
#

Yes the function returns a single value. To give some context, this utility will be a filter to decide whether to buy a stock or not. If RSI >=X, do/don't buy. So I always want the most recent value

#

But in other contexts I might want to plot the RSI over time to spot trends

velvet thorn
#

@smoky fractal then look into window functions
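
For the plot-over-time case, a rolling window gives one RSI value per bar instead of a single number. A minimal sketch, assuming `close` is a pandas Series of raw closing prices and that simple rolling means of gains and losses are acceptable (zeros count toward the mean here, and many RSI implementations use Wilder's smoothing instead). `rolling_rsi` is a made-up name, not something from the code discussed above:

```python
import pandas as pd


def rolling_rsi(close: pd.Series, bars: int = 14) -> pd.Series:
    """RSI per bar, using simple rolling means over `bars` price changes."""
    delta = close.diff()
    gains = delta.clip(lower=0)        # up moves, 0 elsewhere
    losses = (-delta).clip(lower=0)    # down moves as positive numbers, 0 elsewhere
    avg_gain = gains.rolling(bars).mean()
    avg_loss = losses.rolling(bars).mean()
    rs = avg_gain / avg_loss           # inf when the window has no down moves
    return 100 - 100 / (1 + rs)


prices = pd.Series([10.0, 11.0, 10.0, 11.0, 10.0, 11.0])
rsi = rolling_rsi(prices, bars=4)
latest = rsi.iloc[-1]  # the single "most recent" value for a buy/don't-buy filter
```

The whole Series can be stored as a column for plotting, while `.iloc[-1]` still gives the latest value for the filter.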

candid merlin
#

I was brainstorming ideas for a Python module to write, but I got stuck.

#

what kind of python module do you guys think should have been already available?

hasty grail
#

Is this related to data science?

serene scaffold
#

My advisor asked me to help one of her students install tensorflow on windows 10; he was getting errors related to Windows not being able to find C++ files

#

For some reason I'm able to install tensorflow but idk what I installed that enables me to do that.

bitter harbor
#

is it the cpp build tools?

serene scaffold
#

could be. I asked him to install visual studio and that didn't work.

bitter harbor
#

it's separate from vs
you still have to download the build tools

serene scaffold
#

I see

bitter harbor
serene scaffold
#

I'm not referring to the IDE so I may be using the wrong term

#

I'm not sure why they throw around the word "visual" so much

bitter harbor
#

wasn't visual cpp ms's version of the language or smthing

serene scaffold
#

C# wasn't enough for them?

#

wow

bitter harbor
#

or maybe it was c# im not sure

#

it was one of the c's

undone flare
#

So I want to use data from MySQL what library is good for that?

#
```python
from sqlalchemy import create_engine
engine = create_engine("mysql:///:memory:")
```
Would something like this work?
bitter harbor
undone flare
#

related to pandas

#

I want to read that using pandas

bitter harbor
#
```python
import pandas as pd
from sqlalchemy import create_engine

sqlEngine = create_engine('mysql+pymysql://*', pool_recycle=3600)
dbConnection = sqlEngine.connect()
frame = pd.read_sql("select * from whatever", dbConnection)

pd.set_option('display.expand_frame_repr', False)

dbConnection.close()
```
± whatever options you need
winged stratus
#

Hey guys, does anyone have a small classification dataset? I want to build a neural network just using numpy and the MNIST one seems a bit much for me

bitter harbor
winged stratus
#

yeah, i searched on kaggle but not really sure which one i should practice on

bitter harbor
#

well what are you using it for

winged stratus
#

some keywords to search for will be helpful

#

well what are you using it for
@bitter harbor i want to build a small neural network

bitter harbor
#

for..?

winged stratus
#

classification, i just learned them so i want to practice

bitter harbor
#

classification of what tho?

winged stratus
#

any classification dataset

undone flare
#
```python
sqlEngine       = create_engine('mysql+pymysql://*', pool_recycle=3600)
dbConnection    = sqlEngine.connect()
frame           = pd.read_sql("select * from whatever", dbConnection)

pd.set_option('display.expand_frame_repr', False)

dbConnection.close()
```
± whatever options you need

@bitter harbor umm what is pymysql?

#

and pool_recycle?

bitter harbor
#

idk that was the first thing that came up with the google search
I haven't worked with sql like at all
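
On the two open questions: in a SQLAlchemy URL such as `mysql+pymysql://...`, the part before the `+` names the database dialect and the part after it names the driver package (PyMySQL is a pure-Python MySQL client), while `pool_recycle=3600` asks the connection pool to replace connections older than 3600 seconds. The sketch below shows only the pandas side, and it swaps in Python's built-in SQLite as a stand-in so it runs without a MySQL server. The table name and rows are made up for illustration:

```python
import sqlite3

import pandas as pd

# Stand-in database: in-memory SQLite instead of a real MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("AAA", 10.5), ("BBB", 20.0)],
)
conn.commit()

# read_sql runs the query and returns the result set as a DataFrame
frame = pd.read_sql("SELECT * FROM prices", conn)
conn.close()
print(frame)
```

With MySQL the query call should look the same; only the connection would come from an engine built with the `mysql+pymysql://` URL.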

undone flare
#

ok

winged stratus
#

should i just use a normal breast cancer dataset?

undone flare
#

@winged stratus you want a data set?

winged stratus
#

i practiced logistic regression on that

#

@winged stratus you want a data set?
@undone flare yeah

bitter harbor
#

why not just use the mnist set?

undone flare
#

for what type of operations you wanna practice on that

winged stratus
#

@undone flare a neural network

#

i really dont know how to build them, so im practicing

bitter harbor
#

we understand that, it's just that 'classification' is a pretty broad term

#

why not just use the mnist set?

winged stratus
#

mnist has something like 784 inputs per training example right?

#

it may take quite a while to train

#

im just looking for something small and simple

#

it's ok ill find one

#

thanks for your time guys

undone flare
#

I haven't worked with neural networks so idk what type of dataset is good for that

bitter harbor
#

that's not a lot of inputs tbh
the one you sent arnav has 110811

quiet pine
#

hi im new to python/numpy and i was trying to understand how to represent different probability functions via numpy

#

rn im confused as to how i can alter the probability of np random (if its possible)

velvet thorn
#

@quiet pine what exactly do you want to do?

quiet pine
#

i want to return 1 or 0 given a probability ratio

#

basically implement a bernoulli rv

#

ik random returns [0, 1)? i believe

velvet thorn
#

np.random.binomial

#

alternatively, np.random.choice

#

(but the former would be more appropriate)

quiet pine
#

ah yeah im trying to do it w rand specifically because i want to learn how to implement these probabilities

velvet thorn
#

ah, okay

#

so in that case

#

the output of np.random.rand is uniformly distributed, right

#

in the range [0, 1)

quiet pine
#

ye

velvet thorn
#

so think about this.

#

what's the probability

#

that the output will be >= 0.7?

quiet pine
#

.3? no

velvet thorn
#

yup

#

so now

#

let's say your probability of success is 0.3

#

wouldn't you say that the above calculation

#

could represent a Bernoulli RV?

quiet pine
#

yes, altho i mean i want to influence the probability

#

by p

velvet thorn
#

yup

#

so in that case

#

we set 0.7

#

as an arbitrary bound

#

but it doesn't have to be 0.7, right?

#

or rather, 0.7 and 0.3 are arbitrarily chosen

#

to put it another way...given a uniform distribution in the range [0, 1), and a number p also in that range, what is the probability that a randomly drawn value will be >= p?

#

think about that and relate it to the nature of a Bernoulli RV

quiet pine
#

if the value > p then it has probability 1-p and if its less than p it has probability p?

#

or hm wait let me write this out before i come to a conclusion 1sec

#

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
^
P(X <= 0.6) = 0.6, P(X >= 0.4) = 0.6 right okay.

#

ok so if its less than or equal to the value then it represents the probability

#

ohh okay i got it now thanks 👍

#

lol that was confusing for how simple it is

velvet thorn
#

ye yw 👋
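
The idea worked out above can be sketched in a few lines: a uniform draw on [0, 1) falls below `p` with probability `p`, so comparing the draw against `p` yields a Bernoulli sample (the `>= 0.7` example is the same trick with success probability 0.3). The function name `bernoulli` and the fixed seed are choices made for this sketch only:

```python
import numpy as np


def bernoulli(p, size, rng=None):
    """Draw `size` samples that are 1 with probability p and 0 otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)          # uniform on [0, 1), like np.random.rand
    return (u < p).astype(int)    # P(u < p) == p


samples = bernoulli(0.3, size=100_000, rng=np.random.default_rng(0))
print(samples[:10], samples.mean())  # the mean should land close to 0.3
```

Outside of learning exercises, `np.random.binomial(1, p)` gives the same distribution directly.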

grave frost
#

Anyone know why pytorch checkpoint is recognized as a folder in my Ubuntu and a file in my VM? And if checkpoint is in the form of a folder, how do I load it?

#

The checkpoint is supposed to be 20 gigs, but I believe it lost the correct format during the download phase

#

Im gonna try to compress it and see if that works

boreal summit
#

For those who use VS Code to write Python, I just discovered my IntelliSense is case sensitive and won't come up unless you use the exact case for the word you need. Any workaround for this?

undone flare
#

I have a dataset and it has a column called 'CC Exp Date'
and it has dates like 01/20, 04/22, 23/25.... in different rows
How many people have a credit card that expires in 2025?
so
I tried using regex
but I failed miserably lol

#

nvm got it 👍
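
For anyone with the same question, a regex is not strictly needed. A minimal sketch, assuming the column holds `MM/YY` strings (the sample rows below are invented):

```python
import pandas as pd

df = pd.DataFrame({"CC Exp Date": ["01/20", "04/22", "11/25", "03/25", "07/21"]})

# MM/YY strings: a card expires in 2025 when the text ends with "/25"
expires_2025 = int(df["CC Exp Date"].str.endswith("/25").sum())
print(expires_2025)  # 2 for the sample rows above
```

Summing the boolean mask counts the `True` entries, so no explicit loop is required.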

grave frost
#

For those who use VS Code to write Python, I just discovered my IntelliSense is case sensitive and won't come up unless you use the exact case for the word you need. Any workaround for this?
@boreal summit Use Kite

grave frost
#

Can anyone confirm whether a 20Gb checkpoint would necessarily use 20Gb RAM or is there a way to reduce the memory taken by the loading checkpoint? I have 8GB mem on my system but can use Kaggle/Colab for 16Gb too.

boreal summit
#

@grave frost thanks, I'll give it a try.

lapis sequoia
#

can anyone link me to some good ai tutorials using python

prisma isle
#

I need to use scipy's optimise functions to perform gradient descent.
Only issue is, my function is calculated by finding a linear combination of several large matrices

#

Large enough that I can't store them all in memory, so they're numpy memmapped

#

However, there's still a significant memory usage spike while calculating each function value

#

Is there a way to make sure it won't exceed my memory while running? Also, can I keep storing the iteratively computed values to disk, so that in case it does fail, I can at least restart closer to the minimum?
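
One way to cap the spike is to build the linear combination a block of rows at a time, so only a small slice of each memmapped matrix is in RAM at once. This is a sketch under assumptions: all matrices share one 2-D shape, and `linear_combination` and `block_rows` are names invented here, not part of SciPy or NumPy:

```python
import numpy as np


def linear_combination(coeffs, matrices, block_rows=1024, out=None):
    """Compute sum(c_i * A_i) one block of rows at a time.

    `matrices` can be np.memmap objects; each step touches only
    `block_rows` rows of one matrix. `out` may itself be a memmap.
    """
    if out is None:
        out = np.zeros(matrices[0].shape, dtype=np.float64)
    else:
        out[...] = 0
    for start in range(0, out.shape[0], block_rows):
        stop = start + block_rows
        for c, m in zip(coeffs, matrices):
            # np.asarray pulls just this slice into memory
            out[start:stop] += c * np.asarray(m[start:stop])
    return out


A = np.arange(12.0).reshape(4, 3)
B = np.ones((4, 3))
combined = linear_combination([2.0, -1.0], [A, B], block_rows=3)
```

For restarting after a failure, `scipy.optimize.minimize` accepts a `callback` argument that is called with the current parameter vector after each iteration, so saving that vector with `np.save` inside the callback and passing it back as `x0` later is one option worth trying.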

undone flare
#

does legend() have default loc set to 0?

autumn imp
open stratus
#

hey... i'm just getting into machine learning (TensorFlow for now) do i need to get anaconda for that or can i just use my usual python 3.8 with tensorflow?

#

i already have multiple versions of python i'd rather not install more... is anaconda a requirement for machine learning or is it optional??

hollow sentinel
#

optional but recommended bc jupyter notebook is great @open stratus

glacial rune
#

I have a list of dictionaries:

[{'store': 'a', 'buy': '1.1312', 'sell': '1.1518'}, 
{'store': 'b', 'buy': '1.1315', 'sell': '1.1517'}, 
{'store': 'c', 'buy': '1.1316', 'sell': '1.1518'},
etc.]

all of the buys and sells are strings. What is the most performant way to convert them all to floats? I made a try_float method but iteratively that takes quite a while

#

if anyone has any performant data structures they'd recommend for processing a list of dictionaries that would be great - I'm trying to see if I can use a numpy array

pearl vine
#

One possibility is to write a dict wrapper that applies a float conversion as the 'buy' and 'sell' values are accessed.
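
If the conversion only has to happen once, a plain comprehension is usually a reasonable baseline before reaching for other data structures. A minimal sketch, assuming the rows fit in memory and that unparseable values should become `None` (the third row is altered here to show that case):

```python
def try_float(value):
    """Return float(value), or None when the text is not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


rows = [
    {"store": "a", "buy": "1.1312", "sell": "1.1518"},
    {"store": "b", "buy": "1.1315", "sell": "1.1517"},
    {"store": "c", "buy": "n/a", "sell": "1.1518"},
]

# Copy each dict and overwrite only the two numeric fields
converted = [
    {**row, "buy": try_float(row["buy"]), "sell": try_float(row["sell"])}
    for row in rows
]
```

Building new dicts leaves the original strings untouched, which helps when comparing before and after.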

lapis sequoia
#

i can scrape text no problem but i couldnt scrape and download an mp3 file

#

i just wanna download first class="speaker exafile fas fa-volume-up hideOnAmp"

prisma isle
#

optional but recommended bc jupyter notebook is great @open stratus
@hollow sentinel you can get jupyter without anaconda too

hallow orbit
#

so I'm trying to make a financial option tool, and I need to decide between using any of:

  • Hidden Markov Models
  • Naive Bayes
  • Bayesian Networks
  • Markov Networks

If I'm inputting past sets of observations, such as price points, bid/ask prices, strike prices, etc, and I input the next day's return on investment for different combinations of those observations, and I want to make the tool predict a return on investment given a unique set of observations, what model should I use? I'm currently going with Bayesian Networks because they're good at inference, but I'm not totally confident.

hollow sentinel
#

@prisma isle my bad haha

fallow prism
#

boys i need help, i want to fix spelling errors in spanish text and i don't know how to start, does somebody have an idea?

grave frost
#

@fallow prism Do you want to use ML for that or is any other tool good enough?

#

Does anyone know how to load a big Pytorch checkpoint (20Gigs) without taking 20~ish Gb RAM? I only have 16G

fallow prism
#

@fallow prism Do you want to use ML for that or is any other tool good enough?
@grave frost yes, i want to use it for ML, specifically NLP

molten hamlet
#

Hi guys,
any idea if I could do this in one line?

hue = np.mean(hsv[:, :, 0])
saturation = np.mean(hsv[:, :, 1])
value = np.mean(hsv[:, :, 2])

hue, sat, val = np.mean(hsv, ....)

pale thunder
#

hue, sat, val = (np.mean(hsv[:, :, n]) for n in range(3)), with this few elements it does not matter that you are using a python comp

#

other than that, maybe np.moveaxis with some clever axis arg for the np.mean

molten hamlet
#

mean returns matrices if fed an axis, or a scalar if not
I thought I would find some numpy solution

#

it does not matter that you are using a python comp
@pale thunder
hey, can you elaborate on that compiler? what u had on mind?

pale thunder
#

that was short for comprehension

molten hamlet
#

ah right

#

I just checked, and you can iterate natively on last axis, so for matrix in hsv 🙂

#

i was wrong
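
For the one-liner asked about earlier, `mean` accepts a tuple of axes, so averaging over the two spatial axes leaves one value per channel. The array below is a small stand-in for a real `(height, width, 3)` HSV image:

```python
import numpy as np

hsv = np.arange(24, dtype=float).reshape(2, 4, 3)  # stand-in for an HSV image

# Average over height and width, leaving one mean per channel
hue, sat, val = hsv.mean(axis=(0, 1))

# Iterating an array walks its FIRST axis, so move channels to the front first
channel_means = [channel.mean() for channel in np.moveaxis(hsv, -1, 0)]
```

The second form is only there to show why a plain `for matrix in hsv` does not iterate over channels.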

hallow orbit
#

so I'm trying to make a financial option tool, and I need to decide between using any of:

  • Hidden Markov Models
  • Naive Bayes
  • Bayesian Networks
  • Markov Networks

If I'm inputting past sets of observations, such as price points, bid/ask prices, strike prices, etc, and I input the next day's return on investment for different combinations of those observations, and I want to make the tool predict a return on investment given a unique set of observations, what model should I use? I'm currently going with Bayesian Networks because they're good at inference, but I'm not totally confident.

heady hatch
#

Have you considered time series model?

hallow orbit
#

no, what's that?

heady hatch
#

But also I think a clarification on your problem will be helpful too.

From my understanding, you want to predict a return on investment based on some past data.

hallow orbit
#

yes

heady hatch
#

So time series models learn from past data to predict future patterns.

hallow orbit
#

ah

#

are there any other names for time series models?

#

i'm using pomegranate, and it doesn't list time series models as an option

#

it's possible that pomegranate doesnt support it though

heady hatch
#

I'm not too sure what pomegranate is, but you can look up ARIMA models.

hallow orbit
#

ok

#

do you know any ML libraries that implement time series?

#

o nvm, found one that looks good

#

oh wait I did a dumb, when I agreed with "you want to predict a return on investment based on some past data", I interpreted past data to mean a set of observations that has just been collected, not data from a significant time ago. the thing being predicted is totally independent of past data @heady hatch

heady hatch
#

Oh I see.

#

If that's the case you can try many more algorithms. If you think the relationship is linear or can be transformed into linear, you can try linear models. If not then try some tree based or ensemble models.

#

I think I should clarify what you mean by totally independent of the past data.

#

As in there's no relationship or no time relationship?

#

Because it would be hard to do a prediction on features that don't have any relationships at all.

hallow orbit
#

oh

#

there's no time relationship

#

which of these would (most likely) be the best though:

  • Hidden Markov Models
  • Naive Bayes
  • Bayesian Networks
  • Markov Networks

(the person I'm doing this for wants to stick to these models)

wintry atlas
#

Hi all,

I am running the following code:

import math
from scipy import stats

o=float(input("Enter Odds(O):"))
r=float(input("Provide ROI(R):"))

s=abs(math.sqrt(abs(r*(o-r))))
print("\nStandard Deviation(S.D.)="+str(s)+"")

n=float(input("Enter n:"))

t=(math.sqrt(n)*(r-1))/s

print("\nT-score="+str(t)+"\n")

p=round((stats.t.sf(t,n))*100,3)

print("\nP-value="+str(p))
#

for which I'm entering:
Enter Odds(O):4.76

Provide ROI(R):0.1163

Standard Deviation(S.D.)=0.734889318196965

Enter n:8854

T-score=-113.14951036753682

P-value=100.0

#

I just can't quite understand the p-value here
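
A likely reading of that result: `stats.t.sf(t, df)` is the upper-tail probability P(T > t), and with t around -113 almost the whole distribution lies above it, so the value is essentially 1, printed as 100.0 after the `*100`. The sketch below puts the three common tail choices side by side. It assumes a one-sample t-test with `df = n - 1`, which the original code does not state, and `t_test_p_values` is a name made up for this example:

```python
from scipy import stats


def t_test_p_values(t, n):
    """Return (upper-tail, lower-tail, two-sided) p-values for a t-score."""
    df = n - 1                                  # assumed: one-sample test
    upper = stats.t.sf(t, df)                   # P(T > t)
    lower = stats.t.cdf(t, df)                  # P(T < t)
    two_sided = 2 * stats.t.sf(abs(t), df)      # P(|T| > |t|)
    return upper, lower, two_sided


upper, lower, two_sided = t_test_p_values(-113.14951036753682, 8854)
print(upper, lower, two_sided)
```

Which of the three is the right one depends on the hypothesis being tested, which the message does not say.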

spice cedar
#

Hello, a quick question.
I have a DataFrame with a Time vector, where the time has been given as
00:04
00:08
00:11
and so on, which is an object datatype.
How do i change this to a normal time vector, like 4,8,11, etc.
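
One possible approach, assuming the strings are `HH:MM` and the goal is total minutes as integers (if they are actually `MM:SS`, the arithmetic would need to change). The column names are invented for this sketch:

```python
import pandas as pd

df = pd.DataFrame({"Time": ["00:04", "00:08", "00:11", "01:02"]})

# Split "HH:MM" into two integer columns, then combine into minutes
parts = df["Time"].str.split(":", expand=True).astype(int)
df["minutes"] = parts[0] * 60 + parts[1]
print(df["minutes"].tolist())  # [4, 8, 11, 62]
```

The last row is included to show that values past the first hour are handled too.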

heady hatch
#

@hallow orbit hard to say without knowing the relationship of your features.

But I would start with naive bayes since that's relatively simple in capturing relationship in a probabilistic way.

mental timber
#

Can someone help me to understand Random Forests Classifiers?

heady hatch
#

Do you have any specific questions?

mental timber
#

ah yes. I want to use random forest to predict results from a dataset

mental timber
#

oh ty 😄

midnight rain
heady hatch
#

Hey @midnight rain , have you worked much with rf?

I would love to get your opinion on outliers and imbalanced data with rf.

midnight rain
#

ive done a bit of work with isolation forests

mental timber
#

Ty for the info. I'll look into it

midnight rain
#

but im not a datascientist im a machine learning engineer

heady hatch
#

Ahh.

midnight rain
#

so i do more support than primary modeling

#

if you want to use a RF i really recommend trying an isolation forest

heady hatch
#

Do you mind if I ask you regarding your responsibility as a mle?

midnight rain
#

they work very well and i think they tend to work much better than SVMs in production environments

heady hatch
#

Ahh.

midnight rain
#

sure whats up

heady hatch
#

What's your responsibility like as an mle?

#

Because I've come across a wide definition and would love to add yours to my knowledge base.

midnight rain
#

mmm right now im integrating data science models into a large project we are working on

#

i take the jupyter notebooks from the data scientists and then i turn them into a production ready model by optimizing the code as much as possible and adding production ready error handling etc.

#

im also managing the data pipelines for productionizing the models.

#

and im working on a large project in Neo4J to create an interface for us to query and get insights out of the data produced from the models and our other data scraping

heady hatch
#

Oh that's pretty cool.

Do you add monitoring and testing/debugging for the models?

midnight rain
#

i dont do any monitoring at my current job yet

#

eventually i'll add a feedback loop from our BI work, but thats further down the product pipeline

hallow orbit
#

@heady hatch thanks!

heady hatch
#

Are you the sole/few engineers on your team?

hallow orbit
#

oh wait you're still in the convo, sorry for ping

heady hatch
#

Yea no problem Theelx.

#

Btw pantsforbirds, thank you so much for the information.

It's really nice to hear what other people are doing and working with so I can evaluate myself and see where I fit in.

midnight rain
#

yeah im the only MLE on the team right now

heady hatch
#

Ahh.

midnight rain
#

its a smaller VC firm that im working at now

heady hatch
#

Oh, what are deadlines like?

midnight rain
#

we are semi researching right now so not terrible

hallow orbit
#

oh that looks cool

heady hatch
#

That's pretty insane.

#

I wonder how they get the data in real time.

midnight rain
#

i have no idea the latency on it

hallow orbit
#

they might have a bunch of different scrapers set up for each news site

midnight rain
#

the data scale is so insane

hollow sentinel
#

is there like a machine learning project idea generator somewhere

#

I'm getting bored

molten hamlet
#

generator?

hollow sentinel
#

yeah that spits out ideas to do

#

there's one like that online with videogames

molten hamlet
hollow sentinel
#

lol don't think i'm at that level yet

#

I'm having a hard time staying motivated to do the Ng course

molten hamlet
#

its not ML actually ;D

hollow sentinel
#

oh my bad then

molten hamlet
#

have you been on open ai gym ?

hollow sentinel
#

no what's that

molten hamlet
#

place with many environments

graceful glacier
#

any recommendations for a SQL IDE?

molten hamlet
#

😕

hollow sentinel
#

i need to practice cleaning data

molten hamlet
#

I think regularization is that term

#

but I do not know it

#

😦

hollow sentinel
#

lol well i suck at it

#

My best idea is to find kaggle datasets and clean them

molten hamlet
#

I downloaded some fruits from kaggle and just classified it

hollow sentinel
#

oh that's cool

#

lol idk how to do that yet

candid lodge
#

hi @velvet thorn

velvet thorn
#

okay so about resizability of numpy arrays

#

in the general case, you cannot resize them, because the memory of a numpy array must be contiguous

#

so it's best (IMO) to treat them as static.

#

however, @serene scaffold is actually right in that you can technically resize them with the .resize method.

olive dove
#

I agree

candid lodge
#

so what is the best alternative of vector<T> in C++ for python?

velvet thorn
#

but you reaaaaaaally shouldn't do that because of views and stuff

candid lodge
#

I am trying to make a tile map

#

which requires a 2D layout

#

and store values

olive dove
#

They can start with a list, append to that, then switch to np array right

candid lodge
#

but list is slow right

velvet thorn
#

but you reaaaaaaally shouldn't do that because of views and stuff
@velvet thorn because say you have an array b that is a view into an array a; if you resize a, the behaviour of b is undefined.

olive dove
#

Or you could np.full numbers

velvet thorn
#

so what is the best alternative of vector<T> in C++ for python?
@candid lodge why do you need resizability?

#

if it's a tile map

candid lodge
#

i need a map that is resizable?

velvet thorn
#

what kind of calculations

#

are you doing

#

as in

#

why not just create a new array

#

copying the values
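
The create-and-copy idea could look like this for a 2-D tile map. It is a sketch only: `grow_map` and `fill_value` are hypothetical names, and it assumes the old tiles should stay anchored at the top-left corner:

```python
import numpy as np


def grow_map(tiles, new_shape, fill_value=0):
    """Return a new 2-D array of new_shape with the old tiles copied in."""
    new_tiles = np.full(new_shape, fill_value, dtype=tiles.dtype)
    rows = min(tiles.shape[0], new_shape[0])
    cols = min(tiles.shape[1], new_shape[1])
    new_tiles[:rows, :cols] = tiles[:rows, :cols]  # keep the overlapping region
    return new_tiles


tiles = np.array([[1, 2], [3, 4]])
bigger = grow_map(tiles, (3, 4), fill_value=-1)
print(bigger)
```

Because a fresh array is returned, any views into the old array keep pointing at the old data rather than ending up in an unclear state.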

candid lodge
#

oh

velvet thorn
#

so what np.append does is create a new array

candid lodge
#

so np.append(array, values)?

velvet thorn
#

with the passed values added to the end

#

!e

import numpy as np

a = np.array([1, 2, 3])
print(a)
print(np.append(a, [4, 5]))
print(a)
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | [1 2 3]
002 | [1 2 3 4 5]
003 | [1 2 3]
velvet thorn
#

you can see that a does not change, because append returns a new array

#

this is unlike the behaviour of native Python list.append

candid lodge
#

ohh

#

that's so alike of vector<T> in C++

#

it creates a new object

velvet thorn
#

you can append inplace to vectors in C++, right?

candid lodge
#

yes

velvet thorn
#

yeah, you can't for numpy arrays (in the general case)

#

so they're really more like Python lists, except faster

#

and statically typed

candid lodge
#

ohh okay thank you

#

do you know how to make a numpy array with its size initialised

#

when created

velvet thorn
#

uh

#

you want an array of zeroes?

candid lodge
#
a = np.array([0, 0, 0, 0, 0, 0, 0, 0])
#

yes

velvet thorn
#

!e

import numpy as np

a = np.zeros((3, 5))
print(a)
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | [[0. 0. 0. 0. 0.]
002 |  [0. 0. 0. 0. 0.]
003 |  [0. 0. 0. 0. 0.]]
velvet thorn
#

there you go

candid lodge
#

not 0 but what if different?

velvet thorn
#

what do you mean

#

ndarray.fill

candid lodge
#

what are the parameters?

velvet thorn
#

you can check the docs

candid lodge
#

alright

velvet thorn
#

or np.full, if you want a new array

#

ndarray.fill is inplace
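
To make the difference concrete: the in-place fill is a method on the array itself (`ndarray.fill`), not a top-level NumPy function, while `np.full` builds a new array. A short sketch with arbitrary shapes and values:

```python
import numpy as np

tiles = np.full((3, 5), 7)      # new 3x5 array, every cell set to 7
print(tiles)

scratch = np.zeros((2, 3))
result = scratch.fill(1.5)      # overwrites scratch in place
print(scratch, result)          # result is None, scratch now holds 1.5 everywhere
```

So `np.full` suits creating a map with a starting value, and `.fill` suits resetting an existing one.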

candid lodge
#

ohh

velvet thorn
#

@serene scaffold btw sorry to ping you but was just wondering - why did you say arrays could be resized?

#

like were you thinking of the same thing I was or was there something else

prime girder
#

Hey I'm new here 👋
What happens here?

velvet thorn
#

Hey I'm new here 👋
What happens here?
@prime girder we talk about data science

serene scaffold
#

I figured that if you can change the data in an array without creating a new object, then you can also change the size.

velvet thorn
#

ah, okay

serene scaffold
#

@prime girder we talk about data science
@velvet thorn and we don't talk about fight club

velvet thorn
#

thanks for explaining

#

@velvet thorn and we don't talk about fight club
@serene scaffold yes but we ALSO don't talk about fight club

prime girder
#

I feel like there is a lot of context I am missing

velvet thorn
#

but yeah, we discuss data science/machine learning/statistics/etc. and the Python libraries incidental thereto here

prime girder
#

Well I dabble in those

#

Most goes way over my head

serene scaffold
#

@prime girder let me pull up the rules for you

#

&rules

arctic wedgeBOT
#

Python Discord Rules
We have a small but strict set of rules on our server. Please read over them and take them on board. If you don't understand a rule or need to report an incident, please send a direct message to @sonic vapor!
Rule 1
Do not talk about fight club.
Rule 2
DO NOT TALK ABOUT FIGHT CLUB.
Rule 3
Listen to and respect staff members and their instructions.
Rule 4
This is an English-speaking server, so please speak English to the best of your ability.
Rule 5
Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious or inappropriate. Do not help with ongoing exams. Do not provide or request solutions for graded assignments, although general guidance is okay.
Rule 6
No spamming or unapproved advertising, including requests for paid work. Open-source projects can be shared with others in #python-discussion and code reviews can be asked for in a help channel.

prime girder
#

Well there goes my gameplan to get files to hack people with

candid lodge
#

what is the better code of this

#
```python
test = [5, 2, 6, 3]

counter = 0
for e in test:
    print(e)
    print(counter)
    counter += 1
```
#

i want to keep track of the index

autumn locust
#
test = [5, 2, 6, 3]

for i, value in enumerate(test):
  print(value)
  print(i)
#

@candid lodge

hollow sentinel
#

WHY ARE WE SCREAMING

serene scaffold
#

@lapis sequoia please don't disturb the developers.

boreal summit
#

Guys, each time I try to run GridSearchCV, I always get this invalid parameters error. Even though I'm certain all the hyperparameters are spelled and labelled correctly. I use VS Code.

#

Never mind about this again, just found out I spelt neighbours wrongly. I used neighbours instead of neighbors

prime girder
#

Never mind about this again, just found out I spelt neighbours wrongly. I used neighbours instead of neighbors
@boreal summit
Amen to that I hate American spelling

#

Cuz a "u" is soo hard to type

boreal summit
#

@prime girder I've been wondering why the code couldn't run even after doing everything right for the past 2 days. I'm a bit relieved now.

#

Thanks.

heady hatch
#

@velvet thorn I think I'm beginning to understand why databases prefer atomic values.

Working with data structures inside columns is a pain.

potent fern
#

Hi

#

Anyone know how to focus on a specific object out of multiple objects?

velvet thorn
#

@velvet thorn I think I'm beginning to understand why databases prefer atomic values.

Working with data structures inside columns is a pain.
@heady hatch by a lot

#

I mean there are databases that do well with those

#

just not SQL

potent fern
#

Hii.. Anyone familiar with pyzbar? 😞

#

I am getting an error...

While I'm trying to implement this code:

#

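import cv2
import numpy as np
from pyzbar.pyzbar import decode  # a bare `import pyzbar` does not expose decode()

cap = cv2.VideoCapture(0)
cap.set(3, 640)  # property 3: frame width
cap.set(4, 480)  # property 4: frame height

while True:
    success, img = cap.read()
    if not success:
        break  # no frame could be read from the camera
    for barcode in decode(img):
        print(barcode.data)  # raw bytes
        mydata = barcode.data.decode('utf-8')
        print(mydata)

    cv2.imshow('Result', img)
    cv2.waitKey(1)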
undone flare
#

Hey where can I see all the datasets available in seaborn?

smoky bobcat
#

someone can help me with standardisation process? explain how it's done?
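
In short, standardisation subtracts each feature's mean and divides by its standard deviation, so every column ends up with mean 0 and standard deviation 1. A minimal NumPy sketch, assuming a 2-D array with one row per sample (`standardize` is a name chosen here, and constant columns are left at 0 to avoid dividing by zero):

```python
import numpy as np


def standardize(X):
    """Scale each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns: avoid division by zero
    return (X - mean) / std


X = np.array([[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]])
Z = standardize(X)
print(Z)
```

When there is a train/test split, the mean and standard deviation are normally computed on the training data only and then reused for the test data.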

mild topaz
#

i have code which saves an image and then does further execution of code
now i do not want to save this image, i want to directly do further execution of code
how can i do this?
my code here
with open("imagetosave2.png", "wb") as test_img:
    test_img.write(image_data)
test_img = image.load_img("imagetosave2.png", target_size = (64, 64))
here i do not want to save img2.png

#

ping me when u have ans

fierce shadow
#

@mild topaz whats image data consisting of?

#

numpy arrays?

#

or what?

#

btw is this channel about data science or for machine learning as well?

mild topaz
#

see my code here
image_data = base64.b64decode(data["image"])
print(type(image_data))
data = io.BytesIO(image_data)
try:
    test_img = Image(io.BytesIO(image_data))
except Exception as e:
    logger.debug({"status": "invalid", "message": "Provide valid base64 string"})
    return {"status": "invalid", "message": "Provide valid base64 string"}
test_img = open("img2.png", "rb")
image_data = test_img.read()
test_img.close()
test_img = image.load_img("img2.png", target_size = (64, 64))

#

@fierce shadow

#

btw is this channel about data science or for machine learning as well?
both

fierce shadow
#

never worked with those base64 stuff... but I am pretty sure you might have to use PIL.Image

#

it has many functions to convert images

mild topaz
#

see here i am not converting any image @fierce shadow
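
One way to skip the temporary file is to open the decoded bytes straight from memory with Pillow. This is a sketch under assumptions: it uses `PIL.Image.open` on an `io.BytesIO` buffer rather than the `image.load_img` call from the snippet, and `load_image_from_base64` is a made-up helper name:

```python
import base64
import io

from PIL import Image


def load_image_from_base64(b64_string, target_size=(64, 64)):
    """Decode a base64 string and return a resized RGB image, no file on disk."""
    image_data = base64.b64decode(b64_string)
    img = Image.open(io.BytesIO(image_data))   # read from memory, not a path
    return img.convert("RGB").resize(target_size)


# Build a tiny PNG in memory just to exercise the helper
buffer = io.BytesIO()
Image.new("RGB", (10, 20), (255, 0, 0)).save(buffer, format="PNG")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")

img = load_image_from_base64(encoded)
print(img.size, img.mode)
```

The returned object is a regular PIL image, so `np.asarray(img)` should give the pixel array for whatever step came after the file load.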

marsh chasm
#

hi! I'm learning some supervised learning stuff and i have a project to find the best classifier for some data; i'm trying svm's rn and I find that the poly kernel takes a long time; I can't seem to find why some kernels take longer than others; it seems to be data dependent too since my friend with a different data set had the same problem but for her the linear kernel was the one that took a long time to run; is there a reason why?

lapis sequoia
#

I'm trying to work with some JSON data but can't figure out how to gather all the information and then use it, for example in a function that just gets all the JSON sections without me specifying the real key name, like this one: 2020-11-11

{
"status": 200,
"type": "stack",
"data": {
"2020-11-11": {
"total_cases": 166707,
"deaths": 6082,
"recovered": 0,
"critical": 129,
"tested": 2431770,
"death_ratio": 0.03648317107260043,
"recovery_ratio": 0
},
"2020-11-10": {
"total_cases": 162240,
"deaths": 6057,
"recovered": 0,
"critical": 92,
"tested": 2431770,
"death_ratio": 0.0373335798816568,
"recovery_ratio": 0
},
"2020-11-09": {
"total_cases": 146461,
"deaths": 6022,
"recovered": 0,
"critical": 92,
"tested": 2431770,
"death_ratio": 0.04111674780316944,
"recovery_ratio": 0
},
"2020-11-08": {
"total_cases": 146461,
"deaths": 6022,
"recovered": 0,
"critical": 92,
"tested": 2431770,
"death_ratio": 0.04111674780316944,
"recovery_ratio": 0
},
"2020-11-07": {
"total_cases": 146461,
"deaths": 6022,
"recovered": 0,
"critical": 92,
"tested": 2431770,
"death_ratio": 0.04111674780316944,
"recovery_ratio": 0
},
"2020-11-06": {
"total_cases": 146461,
"deaths": 6022,
"recovered": 0,
"critical": 92,
"tested": 2431770,
"death_ratio": 0.04111674780316944,
"recovery_ratio": 0
},
"2020-11-05": {
"total_cases": 141764,
"deaths": 6002,
"recovered": 0,
"critical": 90,
"tested": 2242469,
"death_ratio": 0.0423379701475692,
"recovery_ratio": 0
},
"2020-11-04": {
"total_cases": 137730,
"deaths": 5997,
"recovered": 0,
"critical": 73,
"tested": 2242469,
"death_ratio": 0.04354171204530603,
"recovery_ratio": 0
}
}
}
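
One way to walk every date without naming it is to loop over the keys of "data". A minimal sketch, assuming the response has already been fetched as text; `raw`, `payload` and `collect` are made-up names, and the sample is trimmed to two of the dates above:

```python
import json

# Trimmed copy of the response above: two dates, three fields each.
raw = """
{
  "status": 200,
  "type": "stack",
  "data": {
    "2020-11-11": {"total_cases": 166707, "deaths": 6082, "critical": 129},
    "2020-11-10": {"total_cases": 162240, "deaths": 6057, "critical": 92}
  }
}
"""

payload = json.loads(raw)


def collect(payload, field):
    """Return {date: value} for one field, whatever the date keys happen to be."""
    return {date: stats[field] for date, stats in payload["data"].items()}


deaths_by_date = collect(payload, "deaths")

# Date strings in YYYY-MM-DD form sort chronologically as plain text.
for date, stats in sorted(payload["data"].items()):
    print(date, stats["total_cases"], stats["deaths"])
```

The same loop works unchanged when new dates show up, because nothing in it refers to a specific key.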
quick helm
#

is there anyone know something about huggingface and text classification with electra?

smoky bobcat
#

how much test size is suggested? 0.3?

#

while doing train test split

boreal summit
#

@marsh chasm could be that your data is high dimensional. You could reduce the dimensionality using PCA or some other dimensionality reduction technique.

#

Also, Your data might be too complex for the model you're using to train it.

cerulean spindle
#

@marsh chasm you could definitely try PCA, but you should also check to see if there are a lot of zeros in your dataset. Sometimes these zeros are treated as a placeholder or null value. You could use the following code to check:

print(np.sum(data == 0)/(data.size))

If this results in a large %, you should consider using the TruncatedSVD dimensionality reduction technique.

mortal pendant
#

With textgenrnn, is it possible to continue training a pre-trained model (possibly with more data)? So, like, I'll train with a dataset with 10000 datapoints for 5 epochs one day, and then the next day I can continue to train with that same data (possibly with now 10200 datapoints) from the hdf5 file for another 5 epochs to get even better results?

marsh chasm
#

@marsh chasm could be that your data is high dimensional. You could reduce the dimensionality using PCA or some other dimensionality reduction technique.
@boreal summit yeah I'll try PCA thanks

#

@marsh chasm you could definitely try PCA, but you should also check to see if there are a lot of zeros in your dataset. Sometimes these zeros are treated as a placeholder or null value. You could use the following code to check:

print(np.sum(data == 0)/(data.size))

If this results in a large %, you should consider using the TruncatedSVD dimensionality reduction technique.
@cerulean spindle ok cool! Thanks so much

cerulean spindle
#

@marsh chasm Are you using MNIST dataset?

marsh chasm
#

No I'm using the Wisconsin breast cancer data set

#

On kaggle

cerulean spindle
#

oh ok

azure holly
#

Does anyone here mess with Tensorflow? Pretty much learned what I can from the entire Python Crash Course book and was wanting to move into ML. Only been doing Python for like 8 months. Should I learn about something else before Tensorflow and ML or just go straight into it?

heady hatch
#

Are you familiar with ML foundation? Such as train, validation, test split, overfitting, underfitting, imbalanced datasets, model evaluation, optimization, different kinds of ml problems, data cleaning, transformation, selection, etc etc?

#

Along with mathematical foundation such as linear algebra, probabilities, statistics, and calculus?

#

Or you can also dive straight in and go with a top down approach instead of a bottom up.

#

Ultimately depends on your learning style and how much you are willing to adapt.

#

There's fastai which teaches it in top down perspective and you learn the models as tools and then learning how to take it apart.

raw vigil
#

Does anyone have any good datasets for chatbot?

#

Please ping me

#

thanks

livid temple
#

I've been using pandas/python/jupyter for many years now, but i recently saw an R notebook and it looked really really clean/easy. Can someone explain to me what benefit R might have to someone who already knows python/pandas/jupyter well?

cerulean spindle
#

I believe R is a more statistically minded approach, but I'm not sure.

livid temple
#

my initial thoughts were that it looked cleaner, but python is maybe more granular?

torpid cave
#

Anyone here who knows both R and Python?

#

I have something I have been doing in R for quite a while but implementing it in Python is a hassle

hearty jewel
#

def children(data):
if data=0:
return 'childless'
if 1 <= data =<3:
return '1-3 children'
if data > 3:
return '4+ children'

#

im getting a syntax error with this

#

can anyone help lol

#

says data=0 is syntax error

torpid cave
#

data == 0

hearty jewel
#

ty

#

yes

torpid cave
#

= is for assignment, == is for comparison

#

nww

hearty jewel
#

now im getting a new syntax error

#

for the =<3

torpid cave
#

welp

#

it is badly written

#

Haha

hearty jewel
#

im noob

#

lol

torpid cave
#

no worries

hearty jewel
#

whats wrong with the <= 3

torpid cave
#

You should do

#

if data >= 1 and data <= 3

hearty jewel
#

it wouldnt be and right

#

would be &>?

torpid cave
#

and

#

actually

hearty jewel
#

if data>=1 and data<=3:

torpid cave
#

Let me check, I have been coding in R

#

and got the syntax confused for both

hearty jewel
#

i got it

#

it worked

#

thanks bro

#

โค๏ธ

torpid cave
#

def children(data):
    if data == 0:
        return 'childless'
    if data >= 1 and data <= 3:
        return '1-3 children'
    if data > 3:
        return '4+ children'
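
The `1 <= data =<3` line from the first attempt was close: Python does allow chained comparisons, as long as the operator is spelled `<=`. A small sketch of that variant (the parameter name `count` and the `None` fallback for unexpected input are my own additions, not part of the original function):

```python
def children(count):
    """Bucket a number of children into a label."""
    if count == 0:
        return 'childless'
    if 1 <= count <= 3:  # chained comparison: same as count >= 1 and count <= 3
        return '1-3 children'
    if count > 3:
        return '4+ children'
    return None  # negative or otherwise unexpected input


labels = [children(n) for n in (0, 2, 5)]
print(labels)
```

With pandas this can be applied to a whole column via something like `df['children'].apply(children)`, assuming the column holds whole numbers.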

hearty jewel
#

u a god

torpid cave
#

Keep on working on Python

hearty jewel
#

i will one day become a god like you

torpid cave
#

I am not a god haha

#

Just learn that comparison syntax and you should be fine

#

I just wished someone helped me with my issue, I am overcomplicating my code

heady hatch
#

@torpid cave I'm not familiar with R, but might be able to help you translate.

What are you trying to do in terms of code?

torpid cave
#

I have one dataframe with responses, and another dataframe with keys

#

I just need to translate responses to keys

#

My initial approach (works in R) was:
df.apply(lambda x: df2[df2['key'] == x]['code'].item())

heady hatch
#

Hm could you give me an example of the dataframes?

torpid cave
#

let me get a repex

#

one sec

hearty jewel
#

for column in insurance.columns:
pivot=insurance.pivot_table('charges', index=column)
display(pivot)
pivot.plot.bar(stacked=False)

#

oscar im getting an error

#

with the new columns we just made

torpid cave
#

one sec @hearty jewel

hearty jewel
#

that code worked with all columns except the new columns

#

ValueError: Grouper for 'charges' not 1-dimensional

heady hatch
#

Do you have two columns named "charges"?

hearty jewel
#

No

torpid cave
#
df = pd.DataFrame(dict(
    Sample1 = [5,2,10,2,2],
    Sample2 = [5,5,5,10,10]))

df2 = pd.DataFrame(dict(
    Keys = ['A','B','C'],
    Values = ['5', '2', '10'] ))
#

@heady hatch

#

What I am trying to do is just convert df into df2 letters

heady hatch
#

so change all the 5 to 'A'?

torpid cave
#

yep

heady hatch
#

Your apply makes sense.

torpid cave
#

I am doing this Frankenstein

def TranslateList(list1):
    def LookValue(value):
        value = str(value)
        value = info[1][info[1]['IDNumber'] == value]['Sample'].item()
        return value
    
    translation = []
    for item in list1:
        translation.append(LookValue(item))
    return(translation)

def CreateTranslatedTable():
    # I am writing this now

#

But a one-liner should do it

#

Not sure why I can't get it right

#

@hearty jewel what are you trying to do? I have some more time before I start work

heady hatch
#

So another way of doing is to creating a dictionary to index into.

df2_dict = df2.set_index('Values')['Keys'].to_dict()

df1.apply(lambda x: df2_dict[x])
torpid cave
#

Let me try

#

I never think about dictionaries

heady hatch
#

This is assuming that the values to keys mapping is unique.

torpid cave
#

AH yes

#

I control the data input

heady hatch
#

Yea let me know how that goes.

torpid cave
#

TypeError: 'Series' objects are mutable, thus they cannot be hashed

#

damn it

#

haha

#

My df is quite complex, let me check the code

#

But I think a dict should be the way

heady hatch
#

Hmm does your values or keys have Series object?

torpid cave
#

not sure

#

I think I got it

heady hatch
#

Nice nice nice.

torpid cave
#

Nevermind

heady hatch
#

Oh hahaha

#

Do you mind sharing the structure of your dataframe?

torpid cave
#
info[0][['Sample1','Sample2','Sample3']].apply(lambda x: info_dict.get(x))
#

Tried that

heady hatch
#

Like an actual structure.

#

Oh hm.

torpid cave
#

so info is a list of dataframes

#

df 0 is the one I am using

#

df[0]

#

and the columns with the keys are the ones I am interested

heady hatch
#

To double check, info_dict is okay? Like you can create it and index into the hashes.

torpid cave
#

so info_dict is

hearty jewel
#

i want to pivot between each column and charges

torpid cave
#

info_dict = info[1].drop(['Title'],axis=1).set_index('IDNumber')['Sample'].to_dict()

hearty jewel
#

and show a graph

hearty jewel
#

in a for loop

heady hatch
#

And print info_dict for yourself, is that what you're expecting?

torpid cave
#

print(info_dict)
{'329': 'A', '587': 'A', '433': 'B', '274': 'B'}
heady hatch
#

And that's what you want?

torpid cave
#

Yep

#

key to value

#

I might just solve it and post it to SO

heady hatch
#

Cool. Now hmm for info.

You said info is a list of dataframes?

torpid cave
#

yes

#

info is a list that contains 2 dataframes

#

I just did it like that to have some order, I dont like having so many variables in the code

#

Should not affect anything

heady hatch
#

OH

#

You probably need applymap.

torpid cave
#

@hearty jewel maybe try showing me what output you are looking for. I could help with data manipulation.
For graphs I use ggplot in R and the plots I did in Python are just copy/paste so I won't be able to help there

#

@heady hatch how is that?

#

nvm I will just read the docs

heady hatch
#

applymap is elementwise, apply is series.

torpid cave
#

oh

#

damn

heady hatch
#

And I think the error is tripping up when you do dict.get(SeriesObject).

torpid cave
#

fck

#

That was it

#

works

#

hahahaha

heady hatch
#

🙂

#

Yea sorry in my head I was thinking you're working with one column.

torpid cave
#

I thought, I should not be coding something this simple that hard

#

Why is Python not as simple as R

#

many thanks, you just won lots of internet points

heady hatch
#

Hope your journey is swell from here on.

torpid cave
#
info[0][['Sample1','Sample2','Sample3','Diff']].applymap(LookValue)

Works like a charm
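
For anyone reading later, here is a sketch of the same idea on the small `df` / `df2` example posted earlier. It assumes the stored values are strings (as in `df2`), so the frame is converted with `astype(str)` first. The names `lookup` and `translated` are made up. It uses `apply` plus `Series.map` rather than `applymap`, since I believe newer pandas releases rename `applymap` to `DataFrame.map`:

```python
import pandas as pd

df = pd.DataFrame(dict(Sample1=[5, 2, 10, 2, 2], Sample2=[5, 5, 5, 10, 10]))
df2 = pd.DataFrame(dict(Keys=['A', 'B', 'C'], Values=['5', '2', '10']))

# Lookup table from the stored value (a string here) to its letter.
lookup = df2.set_index('Values')['Keys'].to_dict()

# DataFrame.apply hands each column over as a Series;
# Series.map then performs the per-element dictionary lookup.
translated = df.astype(str).apply(lambda column: column.map(lookup))
print(translated)
```

Values that are missing from the lookup come back as NaN instead of raising, which makes gaps in the key table easy to spot.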

lapis sequoia
#

Hey guys trying to train my gan getting a weird traceback

#

Can someone help me figure out what's going in

#

*on

#

The traceback would infer it's a type error but I've just loaded up type8 numpy arrays into tf.dataset.from tensor slices or whatever

#

Should I upload my h5py files and convert the numpy array to type 64 or whatever it's saying is appropriate

hasty grail
#

You should train your models with float32 inputs in general. If you're using images, it's recommended that you rescale them to the [0, 1] range.

lapis sequoia
#

For gans they recommend you normalise between -1 and 1 tho

#

Most documentation I've seen does that

#

Which is what I've done

#

All my data is between those numbers

#

Anyway I think I'm gonna try reshape with numpy to float32 cos they're float64

hasty grail
#

That would be called type casting, not reshaping. [-1, 1] would also work, I personally don't have much experience with GANs

lapis sequoia
#

My bad

#

It is asking me to cast it to a supported type

#

TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'> to Tensor. Contents: <BatchDataset shapes: ((1, 256, 256, 1), (1,)), types: (tf.float64, tf.int32)>. Consider casting elements to a supported type.

hasty grail
#

you're dealing with a dataset, which doesn't have a dtype of its own

#

you should map the dataset with a function that casts its elements to the correct dtype

#
# Suppose your dataset is the variable `ds`
cast_ds = ds.map(lambda x, y: (tf.cast(x, tf.float32), y))
lapis sequoia
#

I mean I cast the thing as float32 before storing it in the tf.dataset file

#

Didn't work

#

But idk

#

How do I implement that into the code

hasty grail
#

Implement what?

lapis sequoia
#

The cast ds

hasty grail
#

I just did it above

lapis sequoia
#

Like would copying what you did work

#

Ok I'll try ut

hasty grail
#

That was an example

#

You will have to use your own variable names and stuff

summer island
#

Hii, Can anyone help me as I would like to get excel sheet from complex nested JSON file?

hollow gull
#

python has a json library that might be useful. That can help you turn it into a python dict. Then pandas can build a dataframe from a dict, and a pandas dataframe can be saved to a csv or workbook.
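
A rough sketch of that pipeline, assuming the nesting is dicts inside a list of records. The sample data and the names `raw`, `records` and `flat` are invented for illustration, and real files with deeper or irregular nesting may need extra handling:

```python
import json

import pandas as pd

# Made-up nested records standing in for the real file's contents.
raw = (
    '[{"id": 1, "user": {"name": "Ann", "city": "Oslo"}},'
    ' {"id": 2, "user": {"name": "Bo", "city": "Rome"}}]'
)

records = json.loads(raw)          # for a file on disk, json.load(file_object) instead
flat = pd.json_normalize(records)  # nested keys become columns like "user.name"
csv_text = flat.to_csv(index=False)
print(csv_text)

# Writing a workbook needs an Excel engine such as openpyxl to be installed:
# flat.to_excel("out.xlsx", index=False)
```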

marsh chasm
#

Hi! I was wondering if people here are familiar with the validation_curve functionality of sklearn. Basically I was wondering if for the x axis on a validation curve I can plot instead of a hyperparameter a combination of hyperparameters like so:

#

my teacher somehow managed to produce that plot, unfortunately i can't see how given the limitation of the validation_curve function with param_name and param_range

hollow gull
#

The x-axis format reminds me of what matplotlib.pyplot does by default if you have a multi-index dataframe, but I am not sure on that. Even if what I said is correct I am not sure if it will help you. The short answer is I don't have any experience with the validation_curve function.

marsh chasm
#

yeah the thing is i feel like i need the validation_curve in order to plot both the training and validation score (every time i try to look up how to plot them without validation_curve google just tries to show me validation_curve xD)

#

pls ping me if you could help! i'd greatly appreciate it. i asked in a help channel before but my helper and i got stuck xD

#

I can just use the gridsearch features

#

ugh i totally forgot about that

#

oh wait but still that doesnt show me the training vs validation score

#

hm
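
One possible route, sketched under assumptions: `GridSearchCV` accepts `return_train_score=True`, and then `cv_results_` holds both mean training and mean validation scores for every combination. The pipeline, the tiny grid and the built-in copy of the breast cancer data are my own choices for illustration, not the teacher's setup:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# make_pipeline names each step after its class in lowercase, hence the "svc__" prefix.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1.0]}

search = GridSearchCV(pipeline, param_grid, cv=3, return_train_score=True)
search.fit(X, y)

results = search.cv_results_
for params, train, valid in zip(
    results["params"], results["mean_train_score"], results["mean_test_score"]
):
    label = f'{params["svc__kernel"]}, C={params["svc__C"]}'
    print(f"{label:<16} train={train:.3f} validation={valid:.3f}")
```

The `label` strings can serve as x-axis tick labels, with the two score lists plotted as separate lines.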

mortal pendant
graceful glacier
#

any resources for sql projects besides kaggle?

torpid cave
#

Hi guys, anyone here knows how to do math in Python?

#

I mean solving equations by calling out other variables

zealous holly
#

is this a good channel for web scraping

velvet thorn
#

I mean solving equations by calling out other variables
@torpid cave are you talking about symbolic math?

undone flare
lapis sequoia
#

Guess you used matplotlib

undone flare
#

no seaborn

mental timber
#

getting this error not sure how to fix. Note: im still learning python so sorry for my stupidity xd

hasty grail
#

that only works if you're in a Jupyter notebook

#

inside a regular script, it's nonsense

mental timber
#

so it doesnt work in spyder... damn...

undone flare
#

What is better : sns.factorplot(kind='bar') or the sns.barplot()

mental timber
#

whats the difference between them if you dont mind me asking

undone flare
#

There is actually no difference

#

it's just that factorplot has a kind attribute

#

factorplot/catplot

#

so like if you set kind to violin it will act as sns.violinplot()

mental timber
#

I see. Ty

undone flare
#

so what would you prefer?

mental timber
#

I'm making a random forest code to predict something using a dataset i found. So just going around and researching different codes and reading which one would be best and easiest

undone flare
#

I mean would you use factorplot() or the specific kind plots

mental timber
#

hmm, I prolly would since it'll make is easier to understand for me

undone flare
#

so you would use specific kind plots?

mental timber
#

i guess ye

undone flare
#

okay

#

I would switch between those and see what suits me xD

mental timber
#

ah ok xD

lapis sequoia
#

Hey guys I'm still having trouble with my model

#

I'm confused because shouldn't a numpy array that is stored in a tf.dataset be a tensor

hasty grail
#

It should

lapis sequoia
#

It doesn't make sense that the error I'm getting is saying BatchDataset cannot be converted to a tensor

hasty grail
#

you need to distinguish between a Dataset and an element of a Dataset

#

a Dataset is a Dataset

#

an element of a Dataset is a Tensor

#

Datasets are basically a better version of a vanilla Python generator when it comes to iterating over data

#

they are not convertible to ndarrays directly

lapis sequoia
#

But even if I cast it as a float 32 before loading it with from_tensor _slices I still get the same thing

hasty grail
#

Dataset is not a tensor

#

as such, it doesn't have a dtype

#

so you can't cast it

#

you can only cast the elements of the dataset

lapis sequoia
#

Yeah I meant cast the image array

hasty grail
#

in that case you need to map the dataset

#

as I have shown previously

#

the mapping function is applied to each element of the dataset

#

either that, or you cast the array before converting it into a dataset

lapis sequoia
#

Well I tried the latter and it.didnt work I got the same thing

hasty grail
#

how did you do it

lapis sequoia
#

Just saying float32 instead of float64

#

g_x_in = g_x_in.astype('float32')

hasty grail
#

and then?

lapis sequoia
#

Same error traceback

hasty grail
#

what does the error say again?

lapis sequoia
#

Just float32 instead of 64

#

One sec

#

TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'> to Tensor. Contents: <BatchDataset shapes: ((1, 256, 256, 1), types: (tf.float32)>. Consider casting elements to a supported type.

#

That's the error I get when I cast it as float32 before converting

hasty grail
#

can you provide your code again?

#

Seems that you're passing a dataset to tf.cast which doesn't work because, as mentioned above, datasets don't have a dtype

lapis sequoia
hasty grail
#

gen_output = generator(g_dataset, training=True)

#

this doesn't work

#

you need to pass in a tensor

#

not a dataset

lapis sequoia
#

Aight

hasty grail
#

such as the input argument of the function this code lies within

lapis sequoia
#

Yeah so now I'm getting a thing saying content is larger than 2gb but my training data (g_x_in) ,is 300mb

#

Cannot create a tensor proto whose content is larger than 2gb

weary heart
#

Hi, i'm new to data science and i want to create some ML project but i need some datasets, is there any recommendation site for good datasets that have more than 5k data? other than kaggle and UCI

lapis sequoia
#

My batch size is 1 bro lmao

#

I got no idea wtf to do

hasty grail
#

can you print input?

#

your fit function also has a loop that doesn't make sense

#
    # Train
    for n, (g_dataset) in train_ds.enumerate():
      print('.', end='')
      if (n+1) % 100 == 0:
        print()
      train_step(g_dataset, target_dataset, epoch)
    print()
#

train_ds.enumerate() yields the train step, and an element of train_ds

#

so using g_dataset is misleading

#

datasets don't yield datasets
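
To make that concrete, a toy sketch (the array shape and the names `images`, `train_ds` and `seen` are made up): iterating a batched dataset with `enumerate()` yields a step counter and a tensor, never another dataset.

```python
import numpy as np
import tensorflow as tf

images = np.zeros((4, 8, 8, 1), dtype=np.float32)  # toy stand-in for real images
train_ds = tf.data.Dataset.from_tensor_slices(images).batch(2)

seen = []
for step, batch in train_ds.enumerate():
    # step is a scalar integer tensor; batch is a float32 tensor of shape (2, 8, 8, 1)
    seen.append((int(step), isinstance(batch, tf.Tensor), tuple(batch.shape)))
print(seen)
```

So the loop variable is what should be handed to the generator, and a name like `image_batch` describes it better than `g_dataset`.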

heady hatch
#

@weary heart what kind of data are you looking for? Have you considered scraping?

lapis sequoia
#

Ok changing that thanks

#

That's what I get when I print the generator input

weary heart
#

i'm looking for some kind of e-commerce datasets or something like that. i haven't tried scraping atm

hasty grail
#

looks correct

lapis sequoia
#

Idk man I'm so confused as to why it doesn't work

hasty grail
#

look at the call stack and determine the type of each variable that is relevant

#

check that they have the correct type (don't confuse Dataset with Tensor!)

lapis sequoia
#

Well yeah ive changed all calls to tensors

#

Where appropriate at least

#

The problem is the size?

#

But it's totally below 2gb

#

I've looked up the error on Google and I can't find anything thats relevant

#

This stinks

hasty grail
#

can you show the error log?

heady hatch
#

Sorry to nitpick but is this correct?
g_x_in = np.array(g_x_in) - 127.5/127.5

#

It looks like array - 1
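
A quick check of the operator precedence with a single pixel value (plain floats here instead of an array, with `pixel` as a made-up name):

```python
pixel = 255.0

wrong = pixel - 127.5 / 127.5    # division binds tighter, so this is pixel - 1.0
right = (pixel - 127.5) / 127.5  # maps the 0..255 range onto -1..1

print(wrong, right)
```

The same precedence applies element-wise to NumPy arrays, so the brackets are needed there too.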

hasty grail
#

does that make sense to you?

heady hatch
#

Oh lmao I didn't mean that as in I wrote it. I picked it from @lapis sequoia 's code.

hasty grail
#

nvm thought kash said that

#

lol

lapis sequoia
#

Let me put brackets around that

#

I'll grab the error logs one sec dude

#

I'll be right back homies

hasty grail
#

oh

#

you need to zip the two datasets

#

if you're passing in a dataset, the y parameter in tf.keras.Model.fit will be ignored

#

according to the docs:

#
  • A tf.data dataset. Should return a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
undone flare
#
g.map(sns.displot,'total_bill')
What is wrong with this? It draws the displot separately from the grid
#

but when I do

g.map(sns.distplot,'total_bill')
it works fine, but a warning says distplot will be deprecated in a future release
#
g.map(sns.histplot,'total_bill')
works, but I just want to know why displot won't work?
lapis sequoia
#

@hasty grail you know how to zip it?

hasty grail
#

tf.data.Dataset.zip

lapis sequoia
#

So would I do smth like g_dataset = tf.data.dataset.zip(g_dataset)

unique flicker
#
async def masscloneemoji(self, ctx, emoji: discord.PartialEmoji, name=None):

What should I change here if I want to be able to add multiple emojis at once?

hasty grail
#

look at the example in the docs @lapis sequoia

#

zip is in the sense of the vanilla Python zip

lapis sequoia
#

Yeah ok will do

undone flare
#

anyone know why displot doesn't work with grid?

#
g.map(sns.displot,'total_bill')
What is wrong with this? It draws the displot separately from the grid

This is what I am talking about

lapis sequoia
#

@hasty grail same error

hasty grail
#

code?

lapis sequoia
#

Sec

hasty grail
#

g_dataset = tf.data.Dataset.zip(g_dataset) that's not how you zip datasets

#

did you read the docs?

#

you need to zip the sample and label datasets together into a single dataset

#

and pass only that dataset to fit

lapis sequoia
#

I think it wont work because g_x_in is an ndarray

#

I wasn't sure what to do because it said you need to put dataset objects in it

#

G_y_in is a dataset

#

I can comment out the line where it's g_x_in/127.5 - 1

#

That'll change g_x_in to a dataset and the zip will work but then the issue is how do I normalise all the data

hasty grail
#

You can normalize the data, then convert it into a dataset

#

Or use .map on the dataset to apply a mapping function to each element
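
Both options, sketched on toy arrays. The shapes and the names `images`, `labels`, `paired`, `mapped` and `zipped` are assumptions standing in for whatever is read from the HDF5 file. Note that `Dataset.zip` only accepts `tf.data.Dataset` objects, so arrays have to be wrapped first:

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the image and label arrays.
images = np.random.randint(0, 256, size=(6, 8, 8, 1)).astype(np.float64)
labels = np.arange(6, dtype=np.int32)

# Option 1: normalise the array first, then slice images and labels together.
scaled = ((images - 127.5) / 127.5).astype(np.float32)
paired = tf.data.Dataset.from_tensor_slices((scaled, labels))


# Option 2: build the dataset from the raw arrays and normalise per element.
def normalise(image, label):
    return (tf.cast(image, tf.float32) - 127.5) / 127.5, label


mapped = tf.data.Dataset.from_tensor_slices((images, labels)).map(normalise)

# Zipping two separate datasets gives the same (image, label) structure.
zipped = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(scaled),
    tf.data.Dataset.from_tensor_slices(labels),
))

batches = paired.shuffle(6).batch(2, drop_remainder=True)
print(batches.element_spec)
```

Slicing the images and labels in one `from_tensor_slices` call is usually the simpler route, since shuffling then keeps each image with its label.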

lapis sequoia
#
BUFFER_SIZE = 5000

gen_input = h5py.File('/content/gdrive/My Drive/Colab Notebooks/files/training_mnist_raw.h5','r')
g_x_in = gen_input.get('images')
g_x_in = np.array(g_x_in)/127.5 - 1 
g_y_in = gen_input.get('labels')
g_dataset = tf.data.Dataset.from_tensor_slices(g_x_in)
g_dataset = g_dataset.shuffle(BUFFER_SIZE)
g_dataset = g_dataset.batch(BATCH_SIZE,drop_remainder=True)
g_dataset = tf.data.Dataset.zip((g_dataset,g_y_in))

 gives me

```TypeError                                 Traceback (most recent call last)
<ipython-input-19-262507c436a9> in <module>()
      8 g_dataset = tf.data.Dataset.from_tensor_slices(g_x_in)
      9 
---> 10 g_dataset = tf.data.Dataset.zip((g_dataset,g_y_in))
     11 
     12 

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in zip(datasets)
    998       Dataset: A `Dataset`.
    999     """
-> 1000     return ZipDataset(datasets)
   1001 
   1002   def concatenate(self, dataset):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, datasets)
   3481           message = ("The argument to `Dataset.zip()` must be a nested "
   3482                      "structure of `Dataset` objects.")
-> 3483         raise TypeError(message)
   3484     self._datasets = datasets
   3485     self._structure = nest.pack_sequence_as(

TypeError: The argument to `Dataset.zip()` must be a nested structure of `Dataset` objects.```
#

I don't know if it's different because of the fact that its a batch dataset

#

Because g_dataset before being zipped is a batch dataset object

cobalt jetty
#

what is inside your zip?

#

structure wise

lapis sequoia
#

Images and labels

cobalt jetty
#

The point seems to be that the content of your zip isn't properly structured to be accepted by tensorflow.
Since you want to use MNIST, it seems, you should find it easier to do this:

import tensorflow_datasets as tfds
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
def normalize_img(image, label):
  """
  Normalizes images: `uint8` -> `float32`.
  """
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
lapis sequoia
#

@cobalt jetty thanks man, my data is actually like a smaller version of mnist and the images are distorted

#

It's like 5000 images

#

Do you think the same code would work with just my unloaded image and label dataset from my h5py files? The images are 256*256

cobalt jetty
#

The mnist images are formatted in 28x28 natively, where do you get a 256x256 mnist dataset?

lapis sequoia
#

So these are actually photos I took with a camera that captured the mnist images that were projected with a laser using a DMD

#

And then I distorted those images using scattering media

#

Lol

#

Those images were 256

#

This project is kinda like a superresolution project

#

I'm essentially correcting the distortion with a gan

bold olive
#

Do the sk-learn classifiers not work with just a single feature?

cobalt jetty
#

since you want to work on a dataset that is comparable to the MNIST, I would resize your secondary dataset to 28x28.

#

there are some pretty nice pre-processing functions from tf/keras shown there

#

especially tf.keras.preprocessing.image_dataset_from_directory, which might ease your workflow.

lapis sequoia
#

I don't think I would need to do that though, my generator downsamples the image and outputs a 28*28 image

#

I've seen precedent of this working in a paper which was the inspiration for this project

cobalt jetty
#

mhm

#

I've used that page to help preprocess a relatively chonky dataset to train a NSFW detector.

lapis sequoia
cobalt jetty
#

You're trying to create a LeNet-like network?

lapis sequoia
#

I've never heard of that I thought this was based off the original SRGAN

#

Just appropriated to a physics context

#

Basically the generated output is 28*28 anyway

cobalt jetty
#

So what are you trying to achieve? At first glance it seems like you're trying to compress a 128² picture into a 28² one.

lapis sequoia
#

My images (which are 256*256) go through the generator and are processed to eventually look like the target images (mnist) and then go through the discriminator which decides if the image is fake or real

#

It's essentially a network that is making predictions of what the image looked like prior to distortion

cobalt jetty
#

that's actually neat.

lapis sequoia
#

Would be if I could get it to work lol

#

I just can't seem to figure out how to load in data, I've been trying to zip data but I feel like I shouldn't even need to my data isnt over 2gb

#

My training dataset is 313 mb

cobalt jetty
#

what's the issue? You can't load your data in memory?

#

I don't understand why you'd need to zip the data to work with it

lapis sequoia
#

Yeah exactly lol

#

But no the issue is when I run fit

#

I'll paste the traceback

#

This is an error I get when I try to run my code

cobalt jetty
#

can I get the code of the cell where it happens?

lapis sequoia
#

Sure one sec

cobalt jetty
#

also I'm in class right now, I might not answer quickly.

grave frost
#

Anyone know how to load a large checkpoint without eating up all the RAM? (I have about a 20G Pytorch checkpoint) Would prefer a solution that can make it work in about 16G of memory....

lapis sequoia
#

No problem I really appreciate your help all the same

#

Also it's 11:40 pm for me so I might knock soon, if you want you can just DM me

#

Really appreciate everything tho guys thank you all

cobalt jetty
#

you have an issue in how your g_dataset or dataset are constructed.

#

you're trying to pass a file which tensorflow cannot parse as a tensor because it's too big.

#

However, if your dataset is only 313 MB, you might have a preprocessing issue

#

you have to look in your previous cells

#

how you process and get your two variables.

lapis sequoia
#

I don't know I just used sentdex's method for storing data in h5py

#

I mean he stored his in pickle files but I just used hdf5

#

Cos pickle sux

#

Lol

cobalt jetty
#

tbh, your dataset is small enough you can just use keras to load pictures directly in batches of like 16, 32, 64

#

no need to perform some preprocessing like that.

smoky bobcat
#

is logistic regression and naive bayes good for numerical classification dataset?

#

supervised learning classification

cobalt jetty
#

depends on the complexity of your dataset, but I'd say that logistic regression is a good start for binary classification iirc.

lapis sequoia
#

Yeah I was thinking about that roms

smoky bobcat
#

depends on the complexity of your dataset, but I'd say that logistic regression is a good start for binary classification iirc.
@cobalt jetty the target is binary yeah

cobalt jetty
#

then I'd go back to the Keras page I sent you and use the functions there.

lapis sequoia
#

I would use keras preprocessing right

#

Yeah

smoky bobcat
#

@cobalt jetty what about naive bayes? is that good for binary target as well?

cobalt jetty
#

I've never used naive bayes, so I can't tell.

smoky bobcat
#

i got decision tree, logistic regression and naive bayes on my list

#

ook

cobalt jetty
#

but like logistic regression is super easy to implement

smoky bobcat
#

all of them are easy, same code just changes the model function

#

make the model, fit the model and test predictions

#

tell me if i'm skipping something

cobalt jetty
#

like with sci-kit learn

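import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("AAAAAAAAAAAAH.csv", sep=";")
train_set, test_set = train_test_split(dataset, test_size=0.2)
model = LogisticRegression()
# "label" is a placeholder for the target column; every other column is a feature
model.fit(train_set.drop(columns="label"), train_set["label"])
pred = model.predict(test_set.drop(columns="label"))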
smoky bobcat
#

yeah that's what i was saying, are there harder ways to implement the learning algorithms?

cobalt jetty
#

You could implement the logreg from scratch

#

doing something from scratch without using libraries is always the hardest.
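
To give a feel for what "from scratch" involves, here is a bare-bones sketch of logistic regression trained with batch gradient descent, using only the standard library. The toy data (one centred feature), the learning rate and the epoch count are arbitrary choices of mine, and there is no regularisation or convergence check:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, targets, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(rows, targets):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(z) - target  # derivative of the log-loss w.r.t. z
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, row):
    z = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data: one feature, centred on zero, negative values labelled 0.
rows = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
targets = [0, 0, 0, 1, 1, 1]

weights, bias = fit(rows, targets)
predictions = [predict(weights, bias, row) for row in rows]
print(weights, bias, predictions)
```

The library version does the same job with better optimisers and regularisation, which is why it is the practical choice outside of learning exercises.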

grave frost
#

Anyone know how to load a large checkpoint without eating up all the RAM? (I have about a 20G Pytorch checkpoint) Would prefer a solution that can make it work in about 16G of memory....

cobalt jetty
#

if you're talking about all the checkpoints in Pytorch, isn't there a load function where you can specify which checkpoint you want to load.

#

I'd be surprised if a model weighed 20gb

#

I've not used Pytorch much so I can't really help

fallow prism
#

hello everybody, does anyone know how PyCharm compares to other Python editors like VS Code or Sublime? what is the difference between scientific Python development and pure Python development? give me your opinions 🤓 🤓 🤓

austere swift
#

yeah I wouldn't think a checkpoint would be 20gb, usually mine only ever go up to like 200mb

#

but I don't think there's any way to do that anyway; you really do have to load everything into memory, since the model itself has to be stored in memory

#

so even if you could somehow load the checkpoint without an oom error, you'd still have to have the whole model in your memory anyways

grave frost
#

@austere swift ik, but the model can also be read from the SSD, which may hurt performance but wouldn't matter much since I'm only running inference, not training. So it wouldn't really impact the time taken that much

austere swift
#

I've never read the model off disk so I wouldn't know how to do that lol

spark nimbus
grave frost
#

With an online calc, puts it to 6s to load the model which doesn't sound that bad, and I can have it done in a day or so

spark nimbus
#

Does anyone know of things relating to audio you'd like to be explained in a simplified way?

grave frost
#

@austere swift Are you sure that a 200 MB model would take exactly 200 MB of RAM? Perhaps there are some clever tricks done along the way to save memory..?

cobalt jetty
#

tbh, I'm always intrigued at how people can splice voice out of a clip (with music for instance) or vice versa.

#

but not enough to read up on that.

grave frost
#

@cobalt jetty Using Machine Learning

spark nimbus
#

@cobalt jetty either by using a bandpass, machine learning or trying to recreate the music part and subtracting it

cobalt jetty
#

I'm answering Mart, but not just ML

#

splicing voice out is older than ML

spark nimbus
#

Bandpass and a bit of manual sample editing is probably what they did back in the day

cobalt jetty
#

mhm

#

an uneducated guess of mine was that voice and instruments are usually not recorded with the same mics and so voice and instruments would be recorded on different subparts of let's say a magnetic band.

#

so one could only read those parts.

#

but I was wrong, I see.

crude marsh
#

Guys. Anyone up? I need some help

spark nimbus
#

That's usually the case for source files, but due to size constraints on tapes/cds it all had to be put on one channel (two for stereo), and that was usually stored as interleaved data at best

crude marsh
#

Can someone help me? I just need to know how to go about an Idea I have, I will code it myself

#

I just need the framework

spark nimbus
#

@crude marsh what's the issue?

crude marsh
#

I have this code that basically scrapes a share price from the web and then prints out the price

#

I need it to record the prices in an excel file

spark nimbus
#

@crude marsh try openpyxl, or if even something simple works, you could export it as CSV using the built-in csv library

crude marsh
#

So, with CSV, does it have to be in table form?

spark nimbus
#

!docs csv.writer

arctic wedgeBOT
#
```python
csv.writer(csvfile, dialect='excel', **fmtparams)
```
Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. *csvfile* can be any object with a `write()` method. If *csvfile* is a file object, it should be opened with `newline=''` [1](#id3). An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the [`Dialect`](#csv.Dialect "csv.Dialect") class or one of the strings returned by the [`list_dialects()`](#csv.list_dialects "csv.list_dialects") function. The other optional *fmtparams* keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section [Dialects and Formatting Parameters](#csv-fmt-params)... [read more](https://docs.python.org/3/library/csv.html#csv.writer)
crude marsh
#

Whats this?

spark nimbus
#

The documentation for something that writes to a CSV file, there's an example if you click read more
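
For the share-price case, a minimal sketch with the built-in `csv` module might look like this. The helper name `append_price` and the column names are assumptions for illustration; the idea is just to append one row per scrape and write a header the first time.

```python
import csv
from datetime import datetime
from pathlib import Path

def append_price(csv_path, symbol, price):
    """Append one (timestamp, symbol, price) row; write a header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    # newline="" is what the csv docs recommend when opening the file
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "symbol", "price"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"), symbol, price])
```

Calling something like `append_price("prices.csv", "HDFCBANK.NS", price)` after each scrape would then grow the file by one row, and Excel can open the result directly.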

crude marsh
#

I am not able to access the file

spark nimbus
#

Uhh...

crude marsh
#

I can only read !docs csv.writer

spark nimbus
#

🤔

#

do you have embeds disabled somehow?

crude marsh
#

I dont know. How to enable them?

spark nimbus
#

they should be enabled by default

#

can you screenshot what you see?

crude marsh
#

Hmm. Yeah sure.

#

This is what I see.

#

AAh, just a min. found out whats wrong

#

now I can see

#

@spark nimbus U still there?

spark nimbus
#

Yeah

crude marsh
#

Yeah, I can read it now. I will post the code here for your reference

quiet breach
#

does anyone know whether there's a better way to select rows in a dataframe based on a column value than what I'm currently using?

part_df = df[df['path'].str.startswith(directory_path, na=False)]

I'm dealing with 2+ million rows and the command above takes about 11 seconds to complete

crude marsh
#
# Imports

import bs4
import requests

#Custom Function
def get_share_price(share_url):
    res = requests.get(share_url)
    res.raise_for_status()

#Element finder
    soup = bs4.BeautifulSoup(res.text, features="html.parser")
    elems = soup.select(r'#quote-header-info > div.My\(6px\).Pos\(r\).smartphone_Mt\(6px\) > div.D\(ib\).Va\(m\).Maw\(65\%\).Ov\(h\) > div > span.Trsdu\(0\.3s\).Fw\(b\).Fz\(36px\).Mb\(-4px\).D\(ib\)')
    return elems[0].text.strip()

#Get price

price = get_share_price('https://in.finance.yahoo.com/quote/HDFCBANK.NS/history/')

#Call

print('The price of HDFC bank share is ' + price)
#

@quiet breach Sorry mate, I am intermediate

#

Why don't u use copy, paste and ctrlF to do it quickly?

#

@quiet breach

quiet breach
#

how do you mean?

crude marsh
#

Like, you need to type this line several times, right?

#

is that your qn?

quiet breach
#

huh? no

#

I have a dataframe of 2 million rows

#

where I must select a subset containing only the rows where the value in the 'path' column matches a string

#

the initial scope of what I'm working on required this to happen 50 times

#

so I was fine with it taking 10 seconds per run

#

now it needs to run 120k times :)
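
One possible way to make repeated prefix lookups cheap, sketched under the assumption that the same dataframe is queried many times: sort by `path` once, then find each prefix range with two binary searches instead of scanning every row. The helper names and the toy data are made up for illustration.

```python
import bisect
import pandas as pd

def build_prefix_index(df, column="path"):
    """Sort once so that every prefix query becomes two binary searches."""
    df_sorted = df.dropna(subset=[column]).sort_values(column).reset_index(drop=True)
    return df_sorted, df_sorted[column].tolist()

def select_prefix(df_sorted, sorted_paths, prefix):
    """Return the rows whose path starts with `prefix`."""
    lo = bisect.bisect_left(sorted_paths, prefix)
    # Appending the largest code point gives an upper bound for all strings
    # starting with `prefix` (assumes no path contains U+10FFFF itself).
    hi = bisect.bisect_left(sorted_paths, prefix + "\U0010ffff")
    return df_sorted.iloc[lo:hi]

# Toy data (hypothetical)
df = pd.DataFrame({
    "path": ["/a/x.txt", "/b/y.txt", None, "/a/sub/z.txt", "/ab/w.txt"],
    "size": [1, 2, 3, 4, 5],
})
df_sorted, sorted_paths = build_prefix_index(df)
part_df = select_prefix(df_sorted, sorted_paths, "/a/")
print(part_df)
```

The sort is paid once; after that each query is two binary searches plus a slice, which should matter when the lookup runs 120k times. Rows come back in sorted order rather than the original order.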

crude marsh
#

Ahh, I see. Since I am an intermediate, I can't really help a lot, but from what I know, you can use Python to return only the values that meet specific criteria

spark nimbus
#

@crude marsh doesn't yahoo finance have an API so you don't need to scrape?

crude marsh
#

Yeah, but I am doing it to have some basic experience with web scraping

spark nimbus
#

ah

crude marsh
#

then I can move on to some complex projects with confidence

#

Like scraping wikipedia

#

Aight, Imma go see if CSV works

spark nimbus
#

wikimedia api exists

cobalt jetty
#

Yahoo doesn't support an API anymore IIRC. Last year their API was removed -- maybe it's changed since.

#

That caused me issues when I tried to recreate my own VIX index calculator.

crude marsh
#

Yeah, but I want it to create a chart connecting different articles with each other

#

You know what I mean

#

?

cobalt jetty
#

Not really. What do you mean by articles?

crude marsh
#

Wait a min. I will show you

#

I can t find the image

#

It basically shows how one article leads to another

#

like there are links to other articles right?

#

those

cobalt jetty
#

what do you mean by article here?

crude marsh
#

any wikipedia page

#

a mindmap/ flowchart

cobalt jetty
#

aren't you working with Yahoo Finance, tho?

crude marsh
#

Yeah, this is my future project

cobalt jetty
#

based on your snippet above.

crude marsh
#

I plan on building it

#

Not yet though

#

How can I make my code transfer all the data to csv file?

#

Any idea. You see my code above

#

What should I edit so that it transfers it to a CSV file?

cobalt jetty
#

transfer the stock data you scraped into a pandas DataFrame, then just use the method .to_csv('file.csv')

crude marsh
#

Ahh. I see

#

I just need to convert it to a table and then use .to_csv('file.csv')

#

Right?

#

I just need to convert it to a table and then use .to_csv('file.csv')
Using pandas

quiet breach
#

table?

#

dataframe

#

then indeed, df.to_csv(path, options)
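
A minimal sketch of that flow, assuming the scraped prices have been collected into a list of records first. The column names and the numeric values are made-up placeholders, not real quotes.

```python
import pandas as pd

# Placeholder records; in the scraper these would come from get_share_price()
records = [
    {"symbol": "HDFCBANK.NS", "price": 100.0},
    {"symbol": "HDFCBANK.NS", "price": 101.5},
]

df = pd.DataFrame(records)

# With no path, to_csv returns the CSV text; pass a filename to write to disk instead
csv_text = df.to_csv(index=False)
print(csv_text)
```

`index=False` leaves out the row-number column, which is usually what you want when the file is meant to be opened in Excel.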

crude marsh
#

Okay

#

Arigato(Thanks)

marsh chasm
#

@remote valley i ended up figuring it out; i didn't realize gridsearch had the ability to return test scores and training scores; this way i don't have to use the validation_curve function and can just directly plot it using matplotlib
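
For anyone following along, a minimal sketch of that idea: `GridSearchCV` accepts `return_train_score=True`, after which `cv_results_` holds mean training and validation scores for each parameter setting. The synthetic dataset, the estimator, and the parameter grid here are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, only to make the sketch runnable
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    return_train_score=True,  # also record scores on the training folds
)
search.fit(X, y)

train_scores = search.cv_results_["mean_train_score"]
val_scores = search.cv_results_["mean_test_score"]
for params, tr, va in zip(search.cv_results_["params"], train_scores, val_scores):
    print(f"C={params['C']}: train={tr:.3f}, validation={va:.3f}")
```

The two score arrays can be handed straight to matplotlib to draw a validation-curve style plot against the parameter values.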

remote valley
#

@marsh chasm nice. thanks for telling me. gridsearch does the validation curve stuff for the whole set of parameters and plots with correct axis labels for the parameter set? sounds way easier.

ivory panther
#

Does anybody have experience using MultiIndex in pandas?

mortal pendant
grave path
#

Hello guys, what does it mean when my cross validation score is less than my model accuracy score?

heady hatch
#

@grave path could you give more information on what you mean by cv score vs accuracy score? Is your cv using a different metric?

grave path
#

hello nine, so I split my data first and scaled it, and my accuracy was 84%. then I tried applying cross validation to the scaled model and the accuracy was 79%

#
```python
print("{:.3f}".format(scores.mean()))
```
#

@heady hatch

heady hatch
#

@grave path So you're saying the scores for the scaled model was lower than the score of the model unscaled?

#

Could you clarify what you mean by your accuracy was 84%?

grave path
#

Model score = 74%
Model Scaled = 84%
Cross Validation for model scaled = 79%

#

because I have training and testing data

heady hatch
#

Right so what's model score and what's model scaled?

#

Model themselves don't have score unless you're talking about oob.

grave path
#

Model = LogisticRegression()

heady hatch
#

Okay.

#

Is this score on the training data?

grave path
#

So my question is that how come my cross validation accuracy was lower

#

Yes Nine

heady hatch
#

So

#

Just to clarify.

#

You're asking why your training score was higher than your cv score?

grave path
#

isn't cross validation supposed to split the dataset and try to find the best accuracy?

#

Yes

heady hatch
#

Okay, I guess here, give me this information.

#
  • cv score of model not scaled
  • cv score of the model scaled.
#

cv isn't trying to find the best accuracy.

#

cv is trying to see how your model will generalize on a validation set.

grave path
#

cv score of model not scaled:71%
cv score of the model scaled:79%

heady hatch
#

Okay so what I'm seeing here is your model is generalizing better on the validation set.

#

Meaning that it's capturing more signal when the data is scaled.

#

It doesn't really make sense to compare one model's score on training data to another model's cv score.

#

Often times in classical ml, if training score is higher than validation score, that could mean your model is overfitting.

grave path
#

what do you mean when you say cv is trying to see how your model will generalize on a validation set

heady hatch
#

So do you know how cross validation works, especially in default where it's kfold?

grave path
#

kfold is the number of splits right?

heady hatch
#

Mhm!

grave path
#

might have misunderstood what it does then

grave path
#

but is 79% considered good when cross validating ?

heady hatch
#

Metric evaluation is another topic. hahaha

#

It depends on the problem.

grave path
#

yes so we split the dataset and each time we change the training and testing split and then the cv will be the mean of these right?

heady hatch
#

Right.

grave path
#

oh I think you cleared something for me then

heady hatch
#

It splits the data into k sections; with 5 folds, that's 5 sections.

#

It treats one of them as the validation set, trains your model on the rest, and tests it on the validation set. Then it repeats with each of the other sections as the validation set.
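
Spelled out by hand, that loop might look like the sketch below. It assumes synthetic data and a scaled logistic regression purely for illustration; the scaler sits inside the pipeline so it is refit on the training folds only. Note that `cross_val_score` with a classifier uses stratified folds by default, so its numbers can differ slightly from this plain `KFold` version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, only to make the sketch runnable
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X):
    # A fresh model per fold: it never sees the held-out section while training
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))

cv_score = np.mean(fold_scores)  # the "cv score" is the mean over the folds
print("per-fold accuracy:", np.round(fold_scores, 3))
print(f"mean cv accuracy: {cv_score:.3f}")
```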

grave path
#

So this is an overall accuracy since it tests more possible outcomes and the scaled has nothing to do with it

#

I should compare it on the split I did manually right

heady hatch
#

Hm could you clarify on what you meant by scaled has nothing to do with it?

#

It seems like scaling does better, since your data respond well to scaling.

#

Isn't that what your cv showed? 71 vs 79.

grave path
#

Yes but that doesn't have to be interpreted in a bad way

#

or does it?

heady hatch
#

I'm not sure. What do you mean by interpreted in a bad way?

grave path
#

I was thinking that cv must be higher than Scaled data or something is wrong

heady hatch
#

You can think of scaled vs not scaled model as two different models.

#

Because cv is higher in model that scales the data, that's a sign that your data provides more signal when it's scaled.

#

Or that your model isn't able to capture the signal properly when it's not scaled.

#

cv is just a scoring method to tell you how your model generalizes on data it wasn't trained on.

#

Because you don't want to just train it on the training set and test it on the training set.

grave path
#

Yeah I see your point: since cv eventually trains and tests on everything, it will give you the generalized score

#

since testing it wouldn't make sense since its not new data

slender eagle
#

Does anyone here also know Abstract graph transformation, and deriving binary logic from it?

grave path
#

Thanks a lot for the help Nine

heady hatch
#

Yea no problem, glad to be of help.

molten hamlet
#

uh

#

I have to run cell, nvm

#

thanks guys, you are awesome 😄

plush zenith
#

Hi can i ask here something about Matrix operations in an iteration?

heady hatch
#

Try it, and if people don't answer, try somewhere else.

plush zenith
#

okey

#

thanks

#

im having this error

#

No loop matching the specified signature and casting was found for ufunc inv

#
    J=sp.lambdify([x, y],[dp1,dp2], "numpy")
    f=sp.lambdify([x, y],[dp1,dp2], "numpy")
    v = v0
    print(v)
    for i in range(20):
        Jr=np.array(J(v[0], v[1]))
        fi=np.array(f(v[0],v[1]))
        J_inv=np.linalg.inv(Jr)
        #print(J_inv)
        print("")
        v = v - J_inv @ fi
        print("v")
        print(v)
        print("")
    return 
#

basically this is what i want to iterate

#

J and f are two matrices (J comes from derivatives), but the thing is I don't know how to make the iteration work without errors

heady hatch
#

What kind of errors are you getting?

plush zenith
#

TypeError: No loop matching the specified signature and casting was found for ufunc inv
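
For reference, this particular `TypeError` commonly shows up when the array handed to `np.linalg.inv` has `dtype=object`, for example because SymPy objects are still inside it. Also note that in the snippet above `J` and `f` are built from the same expressions. A minimal sketch of a Newton iteration that sidesteps both issues is below; the system `F` is a hypothetical example (not the original `dp1`, `dp2`), and `np.linalg.solve` is swapped in for the explicit inverse.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical system F(x, y) = 0, chosen only for illustration
F = sp.Matrix([x**2 + y**2 - 4, x - y])
J_sym = F.jacobian([x, y])  # 2x2 matrix of partial derivatives

f = sp.lambdify([x, y], F, "numpy")
J = sp.lambdify([x, y], J_sym, "numpy")

v = np.array([1.0, 2.0])  # starting guess as plain floats, not SymPy numbers
for _ in range(20):
    # Force float dtype so the linalg routines never see an object array
    fi = np.asarray(f(*v), dtype=float).ravel()
    Jr = np.asarray(J(*v), dtype=float)
    # Solve Jr @ step = fi instead of forming the inverse explicitly
    v = v - np.linalg.solve(Jr, fi)

print(v)
```

If the conversion with `dtype=float` itself fails, that is a hint that a symbol was never substituted and the lambdified function is still returning symbolic values.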