#data-science-and-ml | Python | Page 343

prisma mulch Sep 24, 2021, 12:49 PM

#

(scikit)

velvet thorn Sep 24, 2021, 12:51 PM

#

a random forest is a bunch of decision trees, each of which is fit to a subset of your training data (basically)

#

n_estimators = number of trees

prisma mulch Sep 24, 2021, 12:52 PM

#

thank you

velvet thorn Sep 24, 2021, 12:53 PM

#

yw

prisma mulch Sep 24, 2021, 12:53 PM

#

so if i set the value of n_estimators too high, will overfitting like problems reappear?

velvet thorn Sep 24, 2021, 12:53 PM

#

prisma mulch so if i set the value of n_estimators too high, will overfitting like problems r...

excellent question

#

so

#

why do we fit multiple trees?

#

what are the characteristics of a single tree? <- start with this

prisma mulch Sep 24, 2021, 12:54 PM

#

to get better results?

prisma mulch Sep 24, 2021, 12:55 PM

#

velvet thorn what are the characteristics of a single tree? <- start with this

it is a binary tree

#

that is all i got

velvet thorn Sep 24, 2021, 12:56 PM

#

prisma mulch it is a binary tree

heh, no, sorry

#

what I mean is more like

#

are you familiar with bias and variance

#

?

prisma mulch Sep 24, 2021, 12:56 PM

#

velvet thorn are you familiar with bias and variance

somewhat, go ahead

velvet thorn Sep 24, 2021, 12:57 PM

#

prisma mulch somewhat, go ahead

so

#

if you fit a decision tree on your data

#

without any constraints

#

if possible

#

it will overfit madly

#

because

prisma mulch Sep 24, 2021, 12:57 PM

#

it will identify wrong patterns

velvet thorn Sep 24, 2021, 12:57 PM

#

it can memorise your entire training set

#

agree?

prisma mulch Sep 24, 2021, 12:57 PM

#

yeah

velvet thorn Sep 24, 2021, 12:57 PM

#

the trees in a random forest are constrained in 2 ways

#

they don't see the whole dataset

#

they can be limited in depth (in scikit-learn that's max_depth, which defaults to no limit)

#

this serves to limit overfitting
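A minimal sketch of the "memorise your entire training set" point, assuming scikit-learn is installed. The synthetic dataset and the `max_depth=3` cap are illustrative choices, not something from the conversation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y), purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No depth limit: the tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth capped at 3: at most 8 leaves, so it cannot memorise every sample.
small_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

full_train_acc = full_tree.score(X_train, y_train)
full_test_acc = full_tree.score(X_test, y_test)
small_train_acc = small_tree.score(X_train, y_train)
small_test_acc = small_tree.score(X_test, y_test)

print("unconstrained: train", full_train_acc, "test", full_test_acc)
print("max_depth=3:   train", small_train_acc, "test", small_test_acc)
```

The gap between train and test accuracy for the unconstrained tree is the overfitting being described; how large it is depends on the data.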

prisma mulch Sep 24, 2021, 12:57 PM

#

yeah

#

Oh nice

prisma mulch Sep 24, 2021, 12:58 PM

#

velvet thorn this serves to limit overfitting

that is awesome!

#

but does it mean that they can be vulnerable to underfitting then if the value is too low?

velvet thorn Sep 24, 2021, 12:58 PM

#

prisma mulch but does it mean that they can be vulnerable to underfitting then if the value i...

what do you think? 🙂

prisma mulch Sep 24, 2021, 12:59 PM

#

velvet thorn what do you think? 🙂

yes?

#

does it dynamically split the trees?

velvet thorn Sep 24, 2021, 12:59 PM

#

prisma mulch does it dynamically split the trees?

what do you mean

velvet thorn Sep 24, 2021, 12:59 PM

#

prisma mulch yes?

elaborate

prisma mulch Sep 24, 2021, 1:00 PM

#

velvet thorn elaborate

if the trees it is making are going to be fixed length with a fixed dataset whether n_estimators is 2 or 15, then yes

velvet thorn Sep 24, 2021, 1:00 PM

#

prisma mulch if the trees it is making is going to be fixed length with fixed dataset whether...

the trees all get the same depth limit, IIRC

#

unless you change the setting

#

but each tree sees a slightly different subset

#

it's called bagging (bootstrap aggregation)
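A rough sketch of what one bootstrap sample looks like, using only NumPy. It assumes the common setup where each tree draws as many rows as the training set has, with replacement; the sample size of 10,000 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# One bootstrap sample: draw n_samples row indices *with replacement*.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Fraction of distinct original rows that this one tree would see.
unique_fraction = np.unique(bootstrap_idx).size / n_samples
print(round(unique_fraction, 3))
```

Under that assumption each tree sees roughly 63% of the distinct rows (about 1 - 1/e), with some rows repeated, which is why every tree ends up with a slightly different view of the data.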

prisma mulch Sep 24, 2021, 1:02 PM

#

so, I guess it should be vulnerable to underfitting?

velvet thorn Sep 24, 2021, 1:02 PM

#

prisma mulch so, I guess it should be vulnerable to underfitting?

yup

#

well, depends on the settings

#

in general, you have p tiny trees for random forests, so yes

prisma mulch Sep 24, 2021, 1:03 PM

#

so, why don't you set n_estimators to the highest value?

#

performance?

velvet thorn Sep 24, 2021, 1:03 PM

#

prisma mulch performance?

this, and @ some point you don't get much out of it

prisma mulch Sep 24, 2021, 1:03 PM

#

velvet thorn this, and @ some point you don't get much out of it

diminishing returns?

velvet thorn Sep 24, 2021, 1:03 PM

#

prisma mulch diminishing returns?

yes

prisma mulch Sep 24, 2021, 1:03 PM

#

ahh

velvet thorn Sep 24, 2021, 1:03 PM

#

think about it this way

#

each tree sees a random subset of the data

#

but the fitting itself is deterministic-ish

#

the more trees you have

#

the higher the probability that two trees will see the same data
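To see the diminishing returns in practice, one option is to sweep `n_estimators` and watch the held-out score. This is only a sketch on synthetic data (assumed here for illustration); real datasets will give different numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

test_scores = {}
for n_trees in (1, 5, 25, 100, 200):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    # Accuracy on data the forest has never seen.
    test_scores[n_trees] = forest.score(X_test, y_test)

for n_trees, score in test_scores.items():
    print(n_trees, round(score, 3))
```

Typically the test score climbs quickly and then flattens rather than dropping, so adding trees mostly costs fitting time; it is not guaranteed to improve at every step.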

lapis sequoia Sep 24, 2021, 1:04 PM

#

Mathematics is problem solving. Just watching videos does not teach, you have to do tasks. Watching videos and reading is good, but solving problems develops the most.

prisma mulch Sep 24, 2021, 1:04 PM

#

thanks for your help @velvet thorn

velvet thorn Sep 24, 2021, 1:04 PM

#

prisma mulch thanks for your help @velvet thorn

yw 👋 hope you understand better now!

serene scaffold Sep 24, 2021, 1:04 PM

#

lapis sequoia Mathematics is problem solving. Just watching videos does not teach, you have to...

I mention that in my earlier comment.

velvet thorn Sep 24, 2021, 1:05 PM

#

lapis sequoia Mathematics is problem solving. Just watching videos does not teach, you have to...

different people learn differently, too.

#

I personally don't do videos @ all

serene scaffold Sep 24, 2021, 1:05 PM

#

see here @lapis sequoia

lapis sequoia Sep 24, 2021, 1:05 PM

#

serene scaffold I mention that in my earlier comment.

👍

lapis sequoia Sep 24, 2021, 1:05 PM

#

serene scaffold see here @lapis sequoia

Very important point!

serene scaffold Sep 24, 2021, 1:07 PM

#

On a more general note, I'm of the opinion that all learning is self-learning.

lapis sequoia Sep 24, 2021, 1:09 PM

#

Yes. I see too much here in university that some students think they get everything on a tray, even though the intention is to develop in increasingly challenging problem-solving tasks.

desert bear Sep 24, 2021, 2:56 PM

#

I managed to find the answer. I asked about it without showing an example, because I saw this in many ML models and thought it was a general thing.
Here's the answer https://stats.stackexchange.com/questions/153823/what-is-verbose-in-scikit-learn-package-of-python


surreal jetty Sep 24, 2021, 3:13 PM

#

hello! pandas' resample somehow moves my table columns around. Any idea how to revert it?
original:

---------------------
| time | name | val |
---------------------

after resample it looks like this:

---------------------
|        name | val |
| time |
---------------------

Any idea how to revert it back to the original, or prevent it from happening?

velvet thorn Sep 24, 2021, 3:14 PM

#

surreal jetty hello! pandas' `resample` somehow moves my table columns around. Any idea how to...

that's because

#

that column

#

becomes the index

#

you can turn it back into a column with reset_index()

#

but

#

why?

surreal jetty Sep 24, 2021, 3:15 PM

#

because accessing the data is a bit more tricky

#

cant do df['time'] anymore

velvet thorn Sep 24, 2021, 3:15 PM

#

surreal jetty cant do df['time'] anymore

df.index

#

not a big thing though

#

shrugs
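A small sketch of the behaviour described above, assuming pandas is installed. The toy frame and its column names (`time`, `name`, `val`) mirror the question; the values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.date_range("2021-09-23 13:27", periods=6, freq="1min"),
    "name": ["a"] * 6,
    "val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# resample() groups on the column passed via on=, and that column
# becomes the index of the result.
resampled = df.resample("3min", on="time")["val"].mean()
print(resampled.index.name)  # the timestamps now live in the index

# reset_index() moves the index back into an ordinary column.
flat = resampled.reset_index()
print(flat.columns.tolist())
```

After `reset_index()` the usual `flat["time"]` access works again; without it, the same timestamps are available as `resampled.index`.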

surreal jetty Sep 24, 2021, 3:27 PM

#

seems like the other values are a bit tricky to access as well

velvet thorn Sep 24, 2021, 3:31 PM

#

surreal jetty seems like the other values are a bit tricky to access as well

can you elaborate

surreal jetty Sep 24, 2021, 3:32 PM

#

but i guess that's not really that relevant. What I'm really trying to do is take the resampled series

    time                   val
0   2021-09-23 13:27:00    1092.307692
1   2021-09-23 13:30:00    1091.789474
2   2021-09-23 13:33:00    1089.692308
3   2021-09-23 13:36:00    1089.000000
4   2021-09-23 13:39:00    1089.200000
5   2021-09-23 13:42:00    1089.400000
6   2021-09-23 13:45:00    1089.333333
7   2021-09-23 13:48:00    1089.666667
8   2021-09-23 13:51:00    1089.000000
9   2021-09-23 13:54:00    1089.000000
10  2021-09-23 13:57:00    1089.666667

and turn it into a "change per hour" using least squares. sklearn's reg.fit expects some training values (which, in this case, I'm not sure is the right approach)

velvet thorn Sep 24, 2021, 3:33 PM

#

surreal jetty but i guess thats not really that relevant. What im really trying to do is given...

I don't understand

#

and turn it into a "change per hour" using least square.

#

this part

surreal jetty Sep 24, 2021, 3:35 PM

#

looking at the data, it seems like val is reduced by 3 in 30 minutes. So that's 6/hr, which is what I'm trying to fit using least squares

velvet thorn Sep 24, 2021, 3:35 PM

#

surreal jetty looking at the data, it seems like val is reduced by 3 in 30 minutes. So thats 6...

so you want

#

the difference

#

between successive values?

surreal jetty Sep 24, 2021, 3:35 PM

#

the real world data is a lot more noisy so a plain diff doesnt work that well

#

i think ordinary least squares is what it's called

#

i got a c implementation somewhere, but it's quite a lot of code so i'd rather use a library if possible

velvet thorn Sep 24, 2021, 3:38 PM

#

surreal jetty i _think_ ordinary least squares is whats its called

like...regression?

#

how does that relate to taking a diff

surreal jetty Sep 24, 2021, 3:43 PM

#

yeah i guess. my english math terms are a bit rusty

#

well i mean you could model the change using a simple diff

#

or you could use regression

civic wadi Sep 24, 2021, 4:13 PM

#

Hey, I am currently doing a research internship in NLP. It's basically handling homonyms and contextual words in sentiment analysis. Does anyone know any nice papers related to this topic?

bronze skiff Sep 24, 2021, 4:48 PM

#

velvet thorn how does that relate to taking a diff

he's probably taking a diff OF the linear regressed values

bronze skiff Sep 24, 2021, 5:01 PM

#

surreal jetty yeah i guess. my english math terms are a bit rusty

no it's fine-- ordinary least squares is a type of linear regression

#

it's kinda faux pas to just say that all regression is ordinary least squares

#

your english terms aren't rusty

pure gull Sep 24, 2021, 5:04 PM

#

surreal jetty well i mean you could model the change using a simple diff

I would run linear regression with a linear model, a*x+b, of value vs time. Then the rate is the slope of the regression line, the parameter a.

#

(At least as a first step)
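A sketch of that suggestion applied to the resampled series posted earlier, using `np.polyfit` for the least-squares line. Expressing time in hours since the first sample is an assumption made here so that the slope comes out directly as "change per hour".

```python
import numpy as np

# Minutes since the first sample (13:27), one reading every 3 minutes.
minutes = np.arange(0, 33, 3)
vals = np.array([
    1092.307692, 1091.789474, 1089.692308, 1089.000000, 1089.200000,
    1089.400000, 1089.333333, 1089.666667, 1089.000000, 1089.000000,
    1089.666667,
])

hours = minutes / 60.0

# Degree-1 least-squares fit: vals ~ slope * hours + intercept.
slope_per_hour, intercept = np.polyfit(hours, vals, 1)
print("change per hour:", round(slope_per_hour, 2))
```

With noisier real-world data the same fit can be run over a rolling window to get a rate that changes over time.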

lapis sequoia Sep 24, 2021, 5:06 PM

#

Hi guys, is there someone with knowledge about recommendation engines? I'm writing my thesis on this and would like to talk to an expert

grave frost Sep 24, 2021, 5:23 PM

#

Every problem is 🥳
change my mind

serene scaffold Sep 24, 2021, 5:24 PM

#

brb gonna do linear regression to figure out what 5 + x is for any x

grave frost Sep 24, 2021, 5:25 PM

#

every problem is a result of several other problems created by intelligent apes known as humans. In the end, intelligence is just a biophysical process.

create AGI, create intelligence.

solve everything

#

pretty good startup pitch, eh? 😏

coral kindle Sep 24, 2021, 6:21 PM

#

Just saw scikit-learn upgraded to 1.0

#

Welp, RIP compatibility

#

Though I think some of us will stick with 0.24 for a while

violet walrus Sep 24, 2021, 7:34 PM

#

Hi all, hope all is well. Is this the best channel to discuss MLE and Data Scientist interview questions? I'm looking for a channel/resource to do specifically that

bronze skiff Sep 24, 2021, 7:42 PM

#

coral kindle Welp, RIP compatibility

what incompatibilities do you have there?

#

most things are literally either stable, or things that should be deprecated are deprecated

coral kindle Sep 24, 2021, 8:05 PM

#

I've been using the pipeline and GridSearchCV APIs

#

But i think it shouldn't be a problem

outer sorrel Sep 24, 2021, 8:51 PM

#

Anyone wanting to join me on an open source project to create a 2d self driving car simulation using NEAT and pygame? ( i have completed most of the code, i just need a few teammates to help with more features and bugs, i can send you the git hub link)

tacit raft Sep 24, 2021, 11:02 PM

#

Hello. I would like to try to learn about Reinforcement Learning. Most of the material I find either yada yadas over creating an environment etc., or is super technical. I am willing to do a deep dive into the technical but would like a happy medium to start with. Anyone have any good resources? Thanks in advance.

odd meteor Sep 25, 2021, 12:51 AM

#

outer sorrel Anyone wanting to join me on an open source project to create a 2d self driving ...

I'd love to contribute but I don't know about Reinforcement Learning yet. I'm still learning ML & Deep Learning.

outer sorrel Sep 25, 2021, 12:57 AM

#

odd meteor I'd love to contribute but I don't know about Reinforcement Learning yet. I'm st...

This is actually not reinforcement learning, it's pretty simple actually, just a genetic algorithm, comparable to selective breeding.

#

NeuroEvolution of Augmenting Topologies

#

NEAT

#

http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf

#

thats a basic rundown of it^

#

or actually, i think that's the longest one, there are shorter ones on the website.

odd meteor Sep 25, 2021, 1:00 AM

#

outer sorrel This is actually not reinforcement learning, its pretty simple actually, just a ...

I just recently started learning deep learning so I don't have that much experience yet. I'll check out the attached link as well 😊

lapis sequoia Sep 25, 2021, 4:24 AM

#

hello

worthy flower Sep 25, 2021, 5:46 AM

#

i was scraping instagram post links using selenium and my script is working fine, but i am able to scrape only 2k links out of 300,000 posts and i don't know why the browser stops loading content

lapis sequoia Sep 25, 2021, 10:36 AM

#

can someone help me?

#

hi, i spent the whole day fixing many errors, but im stuck on this one, can someone help me? im working on OpenCV for a project, making the harry potter invisibility cloak, if someone knows anything about this error then please help

#

hi, i spent the whole day fixing many errors, but im stuck on one, can someone help me? im working on OpenCV for a project, making the harry potter invisibility cloak, if someone knows how to use it then dm me please

edgy hearth Sep 25, 2021, 11:01 AM

#

hey can anyone help me out

#

can you tell me where do i start in ml or data science

#

??

ripe forge Sep 25, 2021, 11:04 AM

#

edgy hearth can you tell me where do i start in ml or data science

take a look at some pins on this channel

edgy hearth Sep 25, 2021, 11:04 AM

#

oke

#

thanks

edgy hearth Sep 25, 2021, 11:05 AM

#

ripe forge take a look at some pins on this channel

like is there some kind of video that can help me out

#

to learn AI

#

i found a book in the pinned stuff

odd meteor Sep 25, 2021, 12:05 PM

#

edgy hearth like is there some kind of video that can help me out

University of Udemy
University of Youtube
University of Coursera

If you wanna get a Masters apply for graduate school. I started from buying a ML course on Udemy

edgy hearth Sep 25, 2021, 12:06 PM

#

odd meteor University of Udemy University of Youtube University of Coursera If you wanna g...

oke lemme try udemy

edgy hearth Sep 25, 2021, 12:06 PM

#

odd meteor University of Udemy University of Youtube University of Coursera If you wanna g...

what university of youtube ?

odd meteor Sep 25, 2021, 12:06 PM

#

edgy hearth what university of youtube ?

😀 YouTube

edgy hearth Sep 25, 2021, 12:07 PM

#

yes ik jk

odd meteor Sep 25, 2021, 12:08 PM

#

I was taught R in school but I later decided to learn python because that's the programming language used in most of the courses I'm using to learn.

So yeah you can start with Python.

edgy hearth Sep 25, 2021, 12:08 PM

#

odd meteor I was taught R in school but I later decided to learn python because that's the ...

i started with python

#

ik everything about python

#

almost

#

what do you want to give info

#

yess plz

#

lol which hand

#

yeah even i want an example

#

hehe

#

yes it is

#

i dont know ai

#

so im asking what to do

#

ye

zinc rock Sep 25, 2021, 12:52 PM

#

random question but is pytorch supposed to download painfully slow

#

the wheel for older python version doesnt work so i have to use conda install

velvet thorn Sep 25, 2021, 1:04 PM

#

zinc rock random question but is pytorch supposed to download painfully slow

I remember the package repos being on the slow side

zinc rock Sep 25, 2021, 1:05 PM

#

tensorflow installs like

#

instantly

#

is there a way to speed things up im not sure why the pip installs dont work

austere swift Sep 25, 2021, 1:37 PM

#

yeah their repos are pretty slow

#

you can't do anything about it

tough bolt Sep 25, 2021, 1:44 PM

#

Has anyone here worked with pytorch-geometric?
(https://pytorch-geometric.readthedocs.io/en/latest/index.html)

tough bolt Sep 25, 2021, 1:45 PM

#

zinc rock

installed it today, I'd usually have 100 mbit down yet it took me what felt like 30 minutes

tough bolt Sep 25, 2021, 1:49 PM

#

tough bolt Has anyone here worked with pytorch-geometric? (https://pytorch-geometric.readt...

I'd have some specific question regarding GNNs as I'm not 100% sure if they will work in my usecase

drowsy wraith Sep 25, 2021, 2:02 PM

#

Does someone know an easy way to install opencv with cuda?

serene scaffold Sep 25, 2021, 2:22 PM

#

drowsy wraith Does someone know an easy way to install opencv with cuda?

on what os

drowsy wraith Sep 25, 2021, 2:25 PM

#

ubuntu

tender hearth Sep 25, 2021, 2:43 PM

#

tough bolt installed it today, I'd usually have 100 mbit down yet it took me what felt like...

This is why the first thing I do with a new OS install is install my big Python packages so that I can use that sweet pip cache 😆

#

... Linux breaks sometimes, don't judge

lapis sequoia Sep 25, 2021, 2:49 PM

#

hi guys, can you help with an assignment that I can't solve?

tough bolt Sep 25, 2021, 2:54 PM

#

tender hearth This is why the first thing I do with a new OS install is install my big Python ...

Yeah ... I thought I still had anaconda and pytorch installed. Turns out that was on my old machine.

It's always a fun weekend activity spending hours on getting the environment set up...

solid raptor Sep 25, 2021, 2:54 PM

#

can anyone here guide me for a time series based dataset
i'm having trouble processing it
basically not knowing where to start

#

Its for an assignment but any guidance would be appreciable

tough bolt Sep 25, 2021, 2:57 PM

#

solid raptor can anyone here guide me for a time series based dataset i'm having trouble proc...

having trouble processing it

what exactly is your problem?

What does your dataset look like?

What do you need to do with it?

Hard to give any suggestions without more info

solid raptor Sep 25, 2021, 2:59 PM

#

Thank u for replying @tough bolt
sorry i was collecting images

#

Basically this is the data
it has 3M rows and 3 columns:
patient id, date of the incident, and incident

serene scaffold Sep 25, 2021, 3:01 PM

#

lapis sequoia hi guys, can you help with an assignment that I can't solve?

In general, no one will volunteer to help unless you explain what the question is.

solid raptor Sep 25, 2021, 3:02 PM

#

its a dataset of 27k unique patients
having different incidents on diff dates
and our motive is to predict who can survive a new "Target drug" based on their historical incidents

#

Some of the patients present in the test file are eligible for the drug prescription within a month and some of them are not, using each patient’s historical data predict if he/she is eligible for the “Target Drug”

#

the problem i'm facing is that i don't know which factors should decide whether someone is eligible for the drug or not

#

i can pm u the assignment if i couldn't explain the problem properly

feral patrol Sep 25, 2021, 5:17 PM

#

Hi,
I am having problem figuring which version of python is on the cluster.
If I type spark-submit --version
2.2.0.cloudera2 Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_131
If I type python --version I get
Python 2.7.6
Reason is that I am trying to use a version of OneHotEncoder, but it changed in 2.4

#

The error I get using OneHotEncoder is TypeError: __init__() got an unexpected keyword argument 'outputCols'
I cannot import OneHotEncoderEstimator

zealous hinge Sep 25, 2021, 5:25 PM

#

this probably doesn't help you any, but are you sure you want to use python2? It's quite obsolete

#

unsupported, etc

#

also IME it's not super-obvious which version of python that spark will run, if you're submitting a python script

#

the one time I did that, I made sure to put my preferred python right at the front of PATH

feral patrol Sep 25, 2021, 5:27 PM

#

yeah, it's not up to me. Otherwise I would have switched

zealous hinge Sep 25, 2021, 5:28 PM

#

can you submit a simple script that looks like
```py
import sys
print("Hello world! I am python", sys.version)
```

#

That would tell you which version is running on the cluster

feral patrol Sep 25, 2021, 5:30 PM

#

('Hello world! I am python', '2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')

#

With that then I will just work with the idea that I am in 2.7 and find a work around.

zealous hinge Sep 25, 2021, 5:34 PM

#

👍

proven quail Sep 25, 2021, 6:47 PM

#

Hi, i have some issue with a code, someone have deep understanding in matplotlib that i can pm please ?

feral patrol Sep 25, 2021, 7:21 PM

#

Is there an IDE to recommend for someone starting? Something that would help me get to documentation faster, import helps, autofill after a tab, and stuff like that.

zealous hinge Sep 25, 2021, 7:22 PM

#

VSC is popular

#

I kinda dig it, kinda

#

it's not at all specific to data science, that I know of, although there might be some handy plugins

#

https://github.com/datagy/pandas-lil-helper e.g.


feral patrol Sep 25, 2021, 7:24 PM

#

Is it hard to set up to be able to code in PySpark? I created a new file, and you can only get Python.
Sadly, I am not allowed to use Pandas atm. I used to know why, but I forgot :S Maybe because of the distributed environment.

zealous hinge Sep 25, 2021, 7:25 PM

#

I'd say "no" since I've done it, and I'm dumb 🙂

#

dunno what you mean by "you can only get Python"

feral patrol Sep 25, 2021, 7:32 PM

#

I couldnt load any spark libraries, so it was basically Python only. I am trying to set the environment right now.

zealous hinge Sep 25, 2021, 7:35 PM

#

how were you trying to "load spark libraries"?

feral patrol Sep 25, 2021, 7:37 PM

#

from pyspark import SparkContext

zealous hinge Sep 25, 2021, 7:38 PM

#

what happened?

#

when I do that, iirc, it pauses for like 30 seconds, but then returns successfully

#

oh, no, it's when I do wat = SparkContext() that takes forever. (It's spinning up a giant Java process in the background)

#

https://paste.pythondiscord.com/ejisedoyon.yaml e.g.

feral patrol Sep 25, 2021, 7:43 PM

#

imports cannot be resolved by Pylance
Sadly I see that I would also need to create a whole new environment using Python 2.7 to be able to work on it.
I am trying to use OneHotEncoder, which I managed to get to work in Docker with Python 3. Sadly the cluster is on 2.7
So it doesn't accept multiple columns, so I went around it and did one column at a time.
and now it tells me that
"'OneHotEncoder' object has no attribute 'fit'"
Which I don't know how to work around. I need to learn more about how to find documentation (but for 2.7)

zealous hinge Sep 25, 2021, 7:44 PM

#

pylance might only work with python3, for all I know

#

I don't know what OneHotEncoder is, but it too might only work with python3

#

that's the world you're in, you'll have to get used to it

feral patrol Sep 25, 2021, 7:46 PM

#

its been on python since 2.4 under that name, and it does work on the 2 previous lines. I just need to figure whats the translation from "fit" to python 2.7
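The two errors quoted above are consistent with the Spark version rather than the Python version being the limiting factor: on Spark 2.2 (the version `spark-submit --version` reported), `OneHotEncoder` is a plain Transformer with a single `inputCol`/`outputCol` and no `fit()`. A hedged sketch of the one-column-at-a-time approach follows; the helper name and the `_idx`/`_vec` suffixes are made up for illustration, and the code has not been run against that cluster.

```python
def one_hot_encode_spark22(df, column):
    """Index and one-hot encode one string column on Spark < 2.3 (sketch)."""
    # Imported lazily so this file can be loaded where pyspark is missing.
    from pyspark.ml.feature import OneHotEncoder, StringIndexer

    index_col = column + "_idx"
    vector_col = column + "_vec"

    # StringIndexer is an Estimator: fit() learns the label -> index mapping.
    indexer = StringIndexer(inputCol=column, outputCol=index_col)
    indexed = indexer.fit(df).transform(df)

    # On Spark 2.2 OneHotEncoder is a Transformer, so it is applied with
    # transform() directly; there is no fit() step and no outputCols.
    encoder = OneHotEncoder(inputCol=index_col, outputCol=vector_col)
    return encoder.transform(indexed)
```

Calling it once per categorical column (for example in a loop that reassigns `df`) mirrors the workaround already described. On Spark 3.x the class with this name is an Estimator again, so this sketch is specific to the older API.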

tacit basin Sep 25, 2021, 8:26 PM

#

What is the clustering method where cluster centers are points from the input data? For example, kmeans will output cluster centers that are not, in general, points from the input data. Is there such an algorithm in scikit-learn, for example?

serene scaffold Sep 25, 2021, 8:38 PM

#

tacit basin What is clustering method where cluster centers are points from input data. For ...

so you basically want kmeans except that for whatever clusters it comes up with, the centroids have to be one of the points?

#

is there any reason that you can't pick the points closest to the centroid determined by kmeans?

tacit basin Sep 25, 2021, 8:43 PM

#

serene scaffold so you basically want kmeans except that for whatever clusters it comes up with,...

yes that's correct

tacit basin Sep 25, 2021, 8:44 PM

#

serene scaffold is there any reason that you can't pick the points closest to the centroid deter...

i was thinking that way too. then was reading different scikit learn algos, but don't think any of them chooses centers from points. unless i missed something. that's why my question.

#

i was thinking that calculating kmeans and then finding the closest point would be like duplicating calculations. i am thinking about tweaking kmeans to come up with centers that are actual points.
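For reference, the "k-means, then pick the nearest points" idea suggested above can be sketched as below, assuming scikit-learn and NumPy. The three synthetic blobs are made up for illustration. The extra step is a single distance pass between the k centroids and the data, so it adds little on top of the k-means fit itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs of 50 points each.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in (0.0, 5.0, 10.0)])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# For each centroid: index of, and distance to, the nearest input point.
nearest_idx, nearest_dist = pairwise_distances_argmin_min(
    kmeans.cluster_centers_, X)

# These rows of X stand in for the centroids.
representatives = X[nearest_idx]
print(representatives)
```

Note that this only snaps the final centroids to data points; the clustering itself is still optimised with means, which is not the same as optimising over data points directly.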

celest light Sep 25, 2021, 9:31 PM

#

tacit basin What is clustering method where cluster centers are points from input data. For ...

That is the K-Medoids algorithm. Much like K-Means, but the cluster centers (medoids) are actual data points.

celest light Sep 25, 2021, 9:32 PM

#

celest light That is the K Medoids algorithm. Exactly like K Means but uses cluster centroids...

It is not in the sklearn library but is present in the scikit-learn-extra library (imported as sklearn_extra)

tacit basin Sep 25, 2021, 9:39 PM

#

celest light That is the K Medoids algorithm. Exactly like K Means but uses cluster centroids...

wow. fantastic. didn't know about sklearn extra. thank you mayur7garg!

tender stag Sep 26, 2021, 12:02 AM

#

hey can someone help me with evaluating certain pts in a numpy array without looping

errant parcel Sep 26, 2021, 1:03 AM

#

Whats a good way of cheaply ingesting massive amounts of table data into some cloud service (so i can move it around and download specific parts more easily)

#

would be terabytes as json so i want a proper format ideally but also something that i can easily append to and won't become corrupted if a write messes up etc

#

(would be time series and frequently written to)

zealous hinge Sep 26, 2021, 1:06 AM

#

I imagine AWS, Google, Azure, etc have that sort of thing -- Azure's is called "Databricks" iirc

errant parcel Sep 26, 2021, 1:10 AM

#

data lakes is a new word to me, looks like i should do some more reading, thanks

#

i was hoping to avoid ingesting it into an actual database service just cause that makes it harder to download chunks to work with offline

#

or i assumed it would at least

umbral skiff Sep 26, 2021, 1:16 AM

#

I have HTML code and I want to get a value with a regular expression, but I'm not getting it.

```
<td>Vínculo</td>
                                <td>CARGO COMISSIONADO</td>

vinculo = re.findall("""<td>Vínculo</td>
                                <td>([A-Z]+)</td>""", html_detalhes)
```

errant parcel Sep 26, 2021, 1:16 AM

#

what

#

it's a different word

#

Vínculo vs Matrícula

#

??

royal crest Sep 26, 2021, 1:16 AM

#

i presume they want everything within <td> and </td>

errant parcel Sep 26, 2021, 1:17 AM

#

oh

umbral skiff Sep 26, 2021, 1:17 AM

#

I want to get the value "CARGO COMISSIONADO"

royal crest Sep 26, 2021, 1:18 AM

#

i just think your regex is flawed
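One likely reason the pattern above finds nothing: `[A-Z]+` cannot match the space inside "CARGO COMISSIONADO", and the literal newline and indentation in the pattern have to match the HTML character for character. A sketch of a more tolerant pattern, using the snippet from the question as test input:

```python
import re

html_detalhes = """<td>Vínculo</td>
                                <td>CARGO COMISSIONADO</td>"""

# \s* absorbs any whitespace between the two cells;
# [^<]+ captures everything up to the next tag, spaces included.
pattern = r"<td>Vínculo</td>\s*<td>([^<]+)</td>"
vinculo = re.findall(pattern, html_detalhes)
print(vinculo)  # ['CARGO COMISSIONADO']
```

For anything beyond a one-off extraction, an HTML parser is usually more robust than a regular expression.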

umbral skiff Sep 26, 2021, 1:19 AM

#

code correct!

vinculo = re.findall("""<td>Vínculo</td>
                                <td>([A-Z]+)</td>""", html_detalhes)

tender hearth Sep 26, 2021, 1:19 AM

#

er, this is off-topic for this channel

royal crest Sep 26, 2021, 1:20 AM

#

regex needs a channel of its own 😜

royal crest Sep 26, 2021, 3:34 AM

#

whooooosh

lime current Sep 26, 2021, 4:32 AM

#

Hi, I am new to Machine learning .

#

I need someone's guidance on a project which I picked up from "ineuron open data science project".
Hoping for a start to end mentorship.
(Some of the work like data scraping, data preprocessing , pipeline, HLD and LLD documentation).

hollow palm Sep 26, 2021, 4:47 AM

#

Hello, can anyone help me understand what this means:

Screen_Shot_2021-09-25_at_10.36.47_PM.png

#

I am not sure what the phi symbol is at all

#

But the equation is something to do with Gaussian Models

velvet thorn Sep 26, 2021, 5:14 AM

#

hollow palm Hello, can anyone help me understand what this means:

cumulative distribution function?

tacit basin Sep 26, 2021, 5:49 AM

#

celest light That is the K Medoids algorithm. Exactly like K Means but uses cluster centroids...

That worked great. Thanks for sharing this lib. Now, is there a way to calculate inertia in different way? Kmedoid uses sum of distances from cluster medoid to each point in cluster. How to change it to max distance in cluster?

prisma mulch Sep 26, 2021, 6:21 AM

#

can someone eli5 dropout to me

#

and why it is so great

desert oar Sep 26, 2021, 7:43 AM

#

hollow palm But the equation is something to do with Gaussian Models

upper-case Φ is often used to represent the gaussian cdf (cumulative distribution function). lower-case φ is often used to represent the gaussian pdf (probability density function).

#

this looks like a bayesian mixture model. this line states that the likelihood of x, given the full set of parameters Θ, is a weighted sum of two different gaussian likelihoods
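Since the screenshot is not reproduced here, the following is only a generic sketch of the objects being described: the Gaussian pdf (φ), the Gaussian cdf (Φ), and a two-component mixture density. The function and parameter names (`weight`, `mu1`, `sigma1`, and so on) are assumptions, not the symbols from the original equation.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """phi: probability density function of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Phi: cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_pdf(x, weight, mu1, sigma1, mu2, sigma2):
    """Weighted sum of two Gaussian densities; weight lies in [0, 1]."""
    return (weight * gaussian_pdf(x, mu1, sigma1)
            + (1.0 - weight) * gaussian_pdf(x, mu2, sigma2))


print(gaussian_cdf(0.0))                          # 0.5
print(mixture_pdf(1.0, 0.3, 0.0, 1.0, 4.0, 2.0))
```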

desert oar Sep 26, 2021, 7:48 AM

#

prisma mulch can someone eli5 dropout to me

having fewer brain cells makes you less clever. models should be clever enough to make generalizations. but if your model gets too clever, it starts to find patterns in the data that don't exist. this is bad. so we make our models less clever in order to make them smarter in the long run.

in other words, dropout helps prevent overfitting.

#

https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
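In code, the "fewer brain cells" idea amounts to zeroing a random subset of units on each training pass. A minimal NumPy sketch of the common "inverted dropout" variant follows; the function name and signature are made up for illustration, and deep learning frameworks ship their own layers for this.

```python
import numpy as np


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero a random subset of units, rescale the rest."""
    if not training or drop_prob == 0.0:
        # At inference time every unit is used as-is.
        return activations
    keep_prob = 1.0 - drop_prob
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
hidden = np.ones((4, 1000))
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(dropped.mean())  # close to 1.0 on average
```

Because a different subset is dropped on every pass, no single unit can be relied on to memorise a pattern, which is where the regularising effect comes from.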

prisma mulch Sep 26, 2021, 10:45 AM

#

desert oar having fewer brain cells makes you less clever. models should be clever enough t...

nice

#

thanks @desert oar

velvet thorn Sep 26, 2021, 11:38 AM

#

desert oar having fewer brain cells makes you less clever. models should be clever enough t...

this is pretty good actually

#

I like it a lot

celest light Sep 26, 2021, 11:57 AM

#

tacit basin That worked great. Thanks for sharing this lib. Now, is there a way to calculate...

Haven't actually used K Medoids so I don't know. Maybe you can look into the documentation to see if there is a parameter like that.
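One way to get a "max distance in cluster" figure without touching the library is to compute it after fitting, from the labels and the chosen centers. This is a sketch with NumPy only; the helper name and the tiny hand-made dataset are assumptions for illustration, and it measures the result rather than changing what the algorithm optimises.

```python
import numpy as np


def max_distance_per_cluster(X, labels, centers):
    """Largest Euclidean distance from each cluster's center to its members."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    # Distance from every point to the center of its own cluster.
    dist_to_own_center = np.linalg.norm(X - centers[labels], axis=1)
    return np.array([dist_to_own_center[labels == k].max()
                     for k in range(len(centers))])


X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [13.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centers = X[[0, 2]]  # pretend the first point of each cluster is its medoid
radii = max_distance_per_cluster(X, labels, centers)
print(radii)
```

Comparing `radii.max()` across different numbers of clusters gives a worst-case counterpart to the usual sum-based inertia.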

outer girder Sep 26, 2021, 12:38 PM

#

i'm struggling to create an array, is it possible to create one between 2 aranges?

#

its not creating the list as intended

#

is it because they are different sizes?

serene scaffold Sep 26, 2021, 12:58 PM

#

@outer girder if A and B are different shapes then that won't work, I don't think.

#

Arrays have to be "rectangular" for whatever number of dimensions they have.
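A short sketch of the "rectangular" rule with two `arange` calls, since the original code is not shown. The specific ranges are made up for illustration.

```python
import numpy as np

a = np.arange(0, 10)      # 10 elements
b = np.arange(0, 50, 5)   # also 10 elements

# Equal lengths: the two 1-D arrays become the columns of a (10, 2) array.
stacked = np.column_stack((a, b))
print(stacked.shape)

c = np.arange(0, 7)       # only 7 elements
try:
    np.column_stack((a, c))
    mismatch_error = None
except ValueError as exc:
    # Lengths 10 and 7 cannot form a rectangular array.
    mismatch_error = exc
print(mismatch_error)
```

If the two ranges are meant to line up row by row, building the second one from the first (for example `b = a * 5`) guarantees matching lengths.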

outer girder Sep 26, 2021, 12:59 PM

#

its my first week in coding

serene scaffold Sep 26, 2021, 12:59 PM

#

That's okay! We can help you

#

What are you trying to do exactly?

outer girder Sep 26, 2021, 12:59 PM

#

the program im trying to write is something like this

#

Write a program that prints a table converting Fahrenheit degrees to degrees
Celsius. The values must be calculated from 5th to 5th and the maximum and minimum limits must
be chosen by the user.

serene scaffold Sep 26, 2021, 1:00 PM

#

Why are you using numpy? Is this for a data science class?

outer girder Sep 26, 2021, 1:00 PM

#

we have been using numpy since week one, but im pretty sure im not "forced" to use it

#

i just dont know anything else besides it xD

serene scaffold Sep 26, 2021, 1:01 PM

#

Numpy encourages you to think about your data differently than general python usage

#

If this isn't for a data science class then I wouldn't use it

outer girder Sep 26, 2021, 1:02 PM

#

the class is called "Computation for Geologists" (which is my field) xD

#

so i have no clue if its considered data science or not

serene scaffold Sep 26, 2021, 1:02 PM

#

Then numpy would probably help you

outer girder Sep 26, 2021, 1:03 PM

#

so do you have any clue where i went wrong with the code?

#

and where i could change the array

ripe forge Sep 26, 2021, 1:03 PM

#

sidenote, (havent seen the core question yet) i think you should learn both the builtin datatypes, and numpy, and learn when to use which.

serene scaffold Sep 26, 2021, 1:04 PM

#

I don't know what you mean by "from 5th to 5th"

ripe forge Sep 26, 2021, 1:04 PM

#

that will help you out in the long run

outer girder Sep 26, 2021, 1:04 PM

#

its like 0ºc to 5ºc to 10ºc

outer girder Sep 26, 2021, 1:04 PM

#

ripe forge sidenote, (havent seen the core question yet) i think you should learn both the ...

i'll look into it!

serene scaffold Sep 26, 2021, 1:05 PM

#

If you're making a table then I would use pandas

#

I'll be back soon, hopefully.

ripe forge Sep 26, 2021, 1:05 PM

#

oh nah nah, i think all that is overkill

velvet thorn Sep 26, 2021, 1:05 PM

#

serene scaffold If you're making a *table* then I would use pandas

I would prefer a hand saw, myself 😔

ripe forge Sep 26, 2021, 1:06 PM

#

to me it sounds like a simple task trying to teach you loops

ripe forge Sep 26, 2021, 1:06 PM

#

velvet thorn I would prefer a hand saw, myself 😔

no! oh gosh

#

lol

serene scaffold Sep 26, 2021, 1:06 PM

#

ripe forge to me it sounds like a simple task trying to teach you loops

I don't think we can infer the instructor's intentions based on what we know

ripe forge Sep 26, 2021, 1:06 PM

#

to me, sounds like this assignment is trying to teach range

#

the instructions are like this: print stuff from blah1 to blah2 in increments of 5
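
A minimal sketch of that reading of the assignment, using plain range and no numpy. The function names are made up for illustration, and the limits are hard-coded where the assignment would read them with input():

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def conversion_table(minimum, maximum, step=5):
    """Return (fahrenheit, celsius) pairs from minimum to maximum inclusive."""
    # range() excludes its stop value, so add 1 to include the maximum.
    return [(f, fahrenheit_to_celsius(f)) for f in range(minimum, maximum + 1, step)]


# In the assignment the limits would come from int(input(...)); they are
# fixed here so the sketch runs unattended.
for f, c in conversion_table(0, 20):
    print(f"{f:>5} °F = {c:6.1f} °C")
```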

serene scaffold Sep 26, 2021, 1:06 PM

#

They introduced numpy in week one 🤷‍♂️

ripe forge Sep 26, 2021, 1:07 PM

#

i.. uh... touche.

outer girder Sep 26, 2021, 1:07 PM

#

ripe forge the instructions are like this: print stuff from blah1 to blah2 in increments of...

i think its this yeah

#

something like that at least

#

i'm gonna try to learn range and come back with the results xD

feral patrol Sep 26, 2021, 2:48 PM

#

I am using StringIndexer (python 2.7) and I am trying to understand whether all the information is being put on the driver. I am looking at the spark.apache documentation for my version, but I cannot work out where the data is taken from.
This is not the first time I have had this doubt; is there an easier way/place to check this?

tacit basin Sep 26, 2021, 2:59 PM

#

celest light Haven't actually used K Medoids so I don't know. Maybe you can look into the doc...

Thank you. I have read the docs but it's not clear to me. Possibly not possible without hacking the library.

old grove Sep 26, 2021, 3:20 PM

#

Hyperparameter tuning is decreasing accuracy... any idea what could be the problem?

tough bolt Sep 26, 2021, 3:22 PM

#

Is anyone here familiar with NetworkX?

#

I'm not sure where or how to correctly set the relationship between nodes

#

e.g.
the default graph

#

where would I define the distance between 3 and 2

or 2 and 1?

median fulcrum Sep 26, 2021, 5:02 PM

#

anyone that is familiar with spacy?

#

I need some help, tried various help channels and servers but it seems they don't have a lot of people with knowledge of this lib

serene scaffold Sep 26, 2021, 5:03 PM

#

median fulcrum I need some help, tried various help channels and servers but it seems they don't h...

I've contributed to spacy, but I have to know your specific question before I know if I can answer it.

median fulcrum Sep 26, 2021, 5:06 PM

#

serene scaffold I've contributed to spacy, but I have to know your specific question before I kn...

with spacy 3.0 i'm very confused by some code that worked fine in the past and is now hard to understand. I searched for the error on Stack Overflow but the cases were different from mine.

model = spacy.blank("en")

categories = model.add_pipe("textcat")

categories.add_label("Happy")
categories.add_label("Scared")

model.add_pipe(categories)

historic = []

I need to transform this code

#

I think this part is the problem

serene scaffold Sep 26, 2021, 5:06 PM

#

median fulcrum with spacy 3.0 i'm very confused by some code that worked fine in the past and is...

can you copy and paste the error message that you got as text?

median fulcrum Sep 26, 2021, 5:06 PM

#

serene scaffold can you copy and paste the error message that you got as text?

sure

#

ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.textcat.TextCategorizer object at 0x7fc953d7def0> (name: 'None').

serene scaffold Sep 26, 2021, 5:07 PM

#

alright, let me see.

median fulcrum Sep 26, 2021, 5:07 PM

#

this is very common while trying to apply code that works in older versions of spacy, but the cases are different

serene scaffold Sep 26, 2021, 5:10 PM

#

median fulcrum this is very common while trying to apply code that works in older versions of s...

model.add_pipe("textcat") returns the component, and my impression is that add_label mutates the component in-place. so I suspect that your second call to add_pipe is unnecessary.

median fulcrum Sep 26, 2021, 5:11 PM

#

serene scaffold `model.add_pipe("textcat")` returns the component, and my impression is that `ad...

but if I don't make the second call, how are Happy and Scared going to stay in the model?

serene scaffold Sep 26, 2021, 5:11 PM

#

https://spacy.io/api/pipe#add_label

TrainablePipe

TrainablePipe · spaCy API Documentation

Base class for trainable pipeline components

serene scaffold Sep 26, 2021, 5:12 PM

#

median fulcrum but if I don't make the second call, how are Happy and Scared going to stay in the m...

add_label puts them in the model, so to speak, yes?

median fulcrum Sep 26, 2021, 5:12 PM

#

serene scaffold `add_label` puts them in the model, so to speak, yes?

add_label isn't putting in categories?

#

oh

#

sorry

serene scaffold Sep 26, 2021, 5:13 PM

#

just try deleting model.add_pipe(categories) and see if it works.

median fulcrum Sep 26, 2021, 5:13 PM

#

serene scaffold just try deleting `model.add_pipe(categories)` and see if it works.

ok

#

works

serene scaffold Sep 26, 2021, 5:13 PM

#

🔥 happypeepo

median fulcrum Sep 26, 2021, 5:14 PM

#

serene scaffold 🔥 happypeepo

I think I messed up because I put categories = model.create_pipe("textcat") in the first time

#

probably model.add_pipe(categories) was necessary in this case

median fulcrum Sep 26, 2021, 5:14 PM

#

median fulcrum I think I messed up because I put `categories = model.create_pipe("textcat")` in...

but this doesn't work in spacy 3.0 now

#

:/

#

oh no

#

ValueError: [E989] nlp.update() was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards. The 'update' function should now be called with a batch of Example objects, instead of (text, annotation) tuples.

#

spacy 3.0 please don't

#

😫

#

@serene scaffold what does 'with a batch of Example objects' mean?

serene scaffold Sep 26, 2021, 5:23 PM

#

median fulcrum @serene scaffold what does 'with a batch of Example objects' mean?

I'm not really sure

#

some spaCy contributor I am, I know

median fulcrum Sep 26, 2021, 5:24 PM

#

serene scaffold some spaCy contributor I am, I know

sad

median fulcrum Sep 26, 2021, 6:03 PM

#

from spacy.training.example import Example

model.begin_training()


for epoch in range(1000):
  random.shuffle(final_data_base)
  losses = {}
  for batch in spacy.util.minibatch(final_data_base, size=30):
    texts = [model(text) for text, entities in batch]
    annotations = [{'cats': entities} for text, entities in batch]
    example = Example.from_dict(texts, annotations)
    model.update([example], losses=losses)
  if epoch % 100 == 0:
    print(losses)
    historic.append(losses)

#

my code is that

#

any idea @serene scaffold

#

?

serene scaffold Sep 26, 2021, 6:37 PM

#

median fulcrum any idea @serene scaffold

any idea about what? I would fix the indentation, in either case, so that it's four spaces.

median fulcrum Sep 26, 2021, 6:38 PM

#

serene scaffold any idea about what? I would fix the indentation, in either case, so that it's f...

about the error

serene scaffold Sep 26, 2021, 6:38 PM

#

median fulcrum about the error

if you get an error, always copy/paste the text of the whole error.

median fulcrum Sep 26, 2021, 6:39 PM

#

median fulcrum ValueError: [E989] `nlp.update()` was called with two positional arguments. This...

@serene scaffold

median fulcrum Sep 26, 2021, 6:40 PM

#

median fulcrum ```Python from spacy.training.example import Example model.begin_training() f...

after that I do this code

serene scaffold Sep 26, 2021, 6:41 PM

#

median fulcrum @serene scaffold

!traceback

arctic wedgeBOT Sep 26, 2021, 6:41 PM

#

Please provide a full traceback to your exception in order for us to identify your issue.

A full traceback could look like:

Traceback (most recent call last):
    File "tiny", line 3, in
        do_something()
    File "tiny", line 2, in do_something
        a = 6 / 0
ZeroDivisionError: integer division or modulo by zero

The best way to read your traceback is bottom to top.

• Identify the exception raised (e.g. ZeroDivisionError)
• Make note of the line number, and navigate there in your program.
• Try to understand why the error occurred.

To read more about exceptions and errors, please refer to the PyDis Wiki or the official Python tutorial.
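
For the E989 error quoted earlier, a minimal sketch of what "a batch of Example objects" can look like in spaCy 3. Example.from_dict takes a single Doc and a single annotation dict, so each (text, annotation) pair becomes its own Example, and nlp.update receives a list of them. The training sentences, batch size, and epoch count below are made up for illustration and are not from the original code:

```python
import random

import spacy
from spacy.training import Example

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # returns the component; no second add_pipe
textcat.add_label("Happy")
textcat.add_label("Scared")

# Hypothetical training data in the old (text, cats) tuple format.
train_data = [
    ("What a wonderful day", {"Happy": 1.0, "Scared": 0.0}),
    ("I love this so much", {"Happy": 1.0, "Scared": 0.0}),
    ("That noise terrified me", {"Happy": 0.0, "Scared": 1.0}),
    ("I am afraid of the dark", {"Happy": 0.0, "Scared": 1.0}),
]

# One Example per pair: an unannotated Doc plus a dict of gold annotations.
examples = [
    Example.from_dict(nlp.make_doc(text), {"cats": cats})
    for text, cats in train_data
]

# initialize() replaces begin_training() in spaCy 3 and returns the optimizer.
optimizer = nlp.initialize(lambda: examples)

for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    for batch in spacy.util.minibatch(examples, size=2):
        # batch is a list of Example objects, which is what update() expects.
        nlp.update(batch, sgd=optimizer, losses=losses)

print(losses)
```

Building the Example objects once, outside the epoch loop, avoids re-tokenizing every text on every pass.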

solid raptor Sep 26, 2021, 7:37 PM

#

Ok so i was trying to create dummy variables of large categorical data
and this is what i got
train["Patient-Uid"] = pd.get_dummies(train["Patient-Uid"],drop_first=True)
Error:
MemoryError: Unable to allocate 81.1 GiB for an array with shape (3220868, 27033) and data type uint8

#

🙂

lofty plover Sep 26, 2021, 7:40 PM

#

Hey, I've been asked to calculate a weighted average for binned data (basically a histogram) using the method mentioned here:
https://stats.stackexchange.com/questions/531794/how-to-calculate-the-mean-from-bin-endpoints-and-frequencies
However, the final bin in my data doesn't have an endpoint for me to calculate a midpoint from.

Cross Validated

How to calculate the mean from bin endpoints and frequencies?

Sometimes data extracted from reports do not have individual values, like 4, 23, 43, but grouped together like this:
income level
people in this group
10k to 20k
44
20k to 40k
240
40k to 80k
40...
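
A minimal sketch of the midpoint method with an open-ended final bin. The bins and counts are hypothetical, and the upper edge assumed for the open bin is a judgment call rather than something the data or the linked answer provides, so it can help to try more than one value and see how much the estimate moves:

```python
def binned_mean(bins, open_upper):
    """Estimate a mean from (lower, upper, count) bins using bin midpoints.

    An upper edge of None marks an open-ended bin; open_upper is the value
    assumed for it.
    """
    total = 0.0
    n = 0
    for lower, upper, count in bins:
        if upper is None:
            upper = open_upper
        total += (lower + upper) / 2 * count
        n += count
    return total / n


# Hypothetical bins; the last one stands for "80 and above".
bins = [(10, 20, 40), (20, 40, 100), (40, 80, 50), (80, None, 10)]

# Compare two assumed upper edges to gauge the sensitivity of the estimate.
for assumed in (120, 160):
    print(assumed, binned_mean(bins, assumed))
```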

solid raptor Sep 26, 2021, 7:42 PM

#

solid raptor Ok so i was trying to create dummy variables of large categorical data and this ...

how should i tackle this

minor geyser Sep 26, 2021, 7:56 PM

#

What's the best way to learn data science (maybe from the standpoint of a Python beginner)?

azure marsh Sep 26, 2021, 8:38 PM

#

solid raptor Ok so i was trying to create dummy variables of large categorical data and this ...

Sounds like you need a more compressed representation of your data (e.g. embedding) or shard / split up your dataset
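
The 81.1 GiB in the error follows directly from the shape: one uint8 byte per cell of a dense rows x categories matrix. A quick back-of-the-envelope check, alongside the size of one integer code per row (the 2-byte code width is an assumption, chosen because 27,033 distinct values fit in a 16-bit integer):

```python
rows, columns = 3_220_868, 27_033  # shape from the MemoryError above
bytes_per_cell = 1                 # uint8

dense_gib = rows * columns * bytes_per_cell / 2**30
print(f"dense one-hot matrix: {dense_gib:.1f} GiB")

# One integer code per row instead of one column per category.
bytes_per_code = 2                 # assumption: a 16-bit integer code
codes_mib = rows * bytes_per_code / 2**20
print(f"integer codes: {codes_mib:.1f} MiB")
```

Integer codes are compact but not a drop-in replacement for one-hot columns, since they impose an ordering on the categories; whether an ID column such as Patient-Uid should be a feature at all is a separate question.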

wary phoenix Sep 26, 2021, 8:39 PM

#

from random import randint

class Character:
    def __init__(self):
        self.name = ""
        self.health = 1
        self.health_max = 1

    def do_damage(self, enemy):
        damage = min(
            max(randint(0, self.health) - randint(0, enemy.health), 0),
            enemy.health)
        enemy.health = enemy.health - damage
        if damage == 0:
            print("%s evades %s's attack." % (enemy.name, self.name))
        else:
            print("%s hurts %s!" % (self.name, enemy.name))
        return enemy.health <= 0

class Enemy(Character):
    def __init__(self, player):
        Character.__init__(self)
        self.name = 'a goblin'
        self.health = randint(1, player.health)

class Player(Character):
    def __init__(self):
        Character.__init__(self)
        self.state = 'normal'
        self.health = 10
        self.health_max = 10
def dfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

azure marsh Sep 26, 2021, 8:42 PM

#

Is that a very long datafffframe?

zealous hinge Sep 26, 2021, 8:45 PM

#

it's a long sigh of exasperation.

serene scaffold Sep 26, 2021, 9:33 PM

#

wary phoenix from random import randint class Character: def __init__(self): self.nam...

I'm not sure I understand the relevance of this

lapis sequoia Sep 26, 2021, 10:10 PM

#

#

Does anyone know how I can create this?

#

I made an example of it using this

#

ggplot(data = mpg, aes(y = hwy, c = drv)) +
geom_boxplot(fill = 'darkgreen')

#

but they want me to create a ggplot box plot that shows the distribution of Calories in the following Starbucks beverages:

Classic Espresso Drinks
Frappuccino® Blended Coffee
Shaken Iced Beverages

#

if anyone knows how to do that, it would help me a lot.

serene scaffold Sep 26, 2021, 10:28 PM

#

I was fired from Starbucks today.
So triggered rn.

serene scaffold Sep 26, 2021, 10:28 PM

#

lapis sequoia ```my_col <- c('darkgreen', 'darkred', 'red') ggplot(data = mpg, aes(y = hwy, c ...

Is this R?

zealous hinge Sep 26, 2021, 10:29 PM

#

😦

serene scaffold Sep 26, 2021, 10:29 PM

#

zealous hinge 😦

I haven't shown up to work since January so I'm surprised it took this long.

lapis sequoia Sep 26, 2021, 10:34 PM

#

serene scaffold Is this R?

Yep

serene scaffold Sep 26, 2021, 10:34 PM

#

lapis sequoia Yep

Are you trying to do it in python

lapis sequoia Sep 26, 2021, 10:35 PM

#

serene scaffold Are you trying to do it in python

I have to do both ways

#

R and python

serene scaffold Sep 26, 2021, 10:35 PM

#

lapis sequoia I have to do both ways

why?

lapis sequoia Sep 26, 2021, 10:35 PM

#

I don’t know😹

#

Teacher wants it so

serene scaffold Sep 26, 2021, 10:36 PM

#

Uh okay

#

Well I guess make the dataframe in python first

lapis sequoia Sep 26, 2021, 10:36 PM

#

Yea idk how to do that for just the three drinks

serene scaffold Sep 26, 2021, 10:36 PM

#

Which three?

lapis sequoia Sep 26, 2021, 10:37 PM

#

Classic Espresso Drinks
Frappuccino® Blended Coffee
Shaken Iced Beverages

#

I have a table but idk how to do it

serene scaffold Sep 26, 2021, 10:37 PM

#

Those are categories of drinks

#

Trust me, I worked there from January 2016 until today, coincidentally.

#

Anyway, you can use loc

#

And this

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.boxplot.html

#

I can't be much more helpful at the moment as I'm on my phone. I might remember to check on you later @lapis sequoia
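
A minimal sketch of the loc suggestion. The small frame below is a hypothetical stand-in for pd.read_csv("Starbucks.csv"); the column names Beverage_category and Calories and every value in it are assumptions, so the real file may differ:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("Starbucks.csv").
df = pd.DataFrame({
    "Beverage_category": [
        "Classic Espresso Drinks", "Classic Espresso Drinks",
        "Frappuccino® Blended Coffee", "Shaken Iced Beverages",
        "Smoothies", "Coffee",
    ],
    "Calories": [70, 100, 180, 60, 280, 5],
})

wanted = [
    "Classic Espresso Drinks",
    "Frappuccino® Blended Coffee",
    "Shaken Iced Beverages",
]

# A boolean mask inside .loc keeps only the rows in the three categories.
mask = df["Beverage_category"].isin(wanted)
subset = df.loc[mask, ["Beverage_category", "Calories"]]
print(subset)

# With matplotlib installed, this draws one box per category:
# subset.boxplot(column="Calories", by="Beverage_category")
```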

lapis sequoia Sep 26, 2021, 10:40 PM

#

serene scaffold I can't be much more helpful at the moment as I'm on my phone. I might remember ...

Like I have a table but is long

serene scaffold Sep 26, 2021, 10:41 PM

#

That's fine

lapis sequoia Sep 26, 2021, 10:41 PM

#

And Idk how to make it as a code

serene scaffold Sep 26, 2021, 10:41 PM

#

lapis sequoia And Idk how to make it as a code

What is the data in? A csv?

lapis sequoia Sep 26, 2021, 10:42 PM

#

serene scaffold What is the data in? A csv?

Yes

#

Starbucks.csv

serene scaffold Sep 26, 2021, 10:42 PM

#

import pandas as pd

#

Start with that

#

And then use pd.read_csv

lapis sequoia Sep 26, 2021, 10:42 PM

#

This is what I have

serene scaffold Sep 26, 2021, 10:42 PM

#

Yay

lapis sequoia Sep 26, 2021, 10:43 PM

#

#

That’s what she gave me

serene scaffold Sep 26, 2021, 10:43 PM

#

This is R. I can't help with that.

lapis sequoia Sep 26, 2021, 10:43 PM

#

Damn all good thx tho

#

I did python, I’m having trouble with R

#

Do you know any discord server that can help me with it?

serene scaffold Sep 26, 2021, 10:44 PM

#

Let me see

#

@lapis sequoia if you join this server be sure to check their rules about asking questions https://discord.gg/PD8YMNKB

lapis sequoia Sep 26, 2021, 10:46 PM

#

Thanks

main fox Sep 26, 2021, 11:19 PM

#

serene scaffold I was fired from Starbucks today. So triggered rn.

What happened?

azure marsh Sep 26, 2021, 11:40 PM

#

serene scaffold I haven't shown up to work since January so I'm surprised it took this long.

Then why are you triggered?

serene scaffold Sep 26, 2021, 11:46 PM

#

main fox What happened?

I told them I had to move "because of covid" and that I would come back "soon" and then ghosted them.

serene scaffold Sep 26, 2021, 11:46 PM

#

azure marsh Then why are you triggered?

I was getting free stuff just for being an employee-on-paper.

main fox Sep 26, 2021, 11:47 PM

#

Lol I hope it doesn't affect any future job search you may have

serene scaffold Sep 27, 2021, 12:32 AM

#

main fox Lol I hope it doesn't affect any future job search you may have

my current job doesn't care and no subsequent jobs are going to care about jobs I had before this one.

umbral skiff Sep 27, 2021, 12:47 AM

#

I have a code that extracts data from html pages and makes some filters until generating this list of dictionaries. I want the "header" information to be the header of a CSV file, but I don't know how to do this correctly. Does anyone have a tip?

I want the file with these columns:

"Matrícula","Referência","Vínculo","Servidor","Cargo","CPF","Lotação","Remuneração","Abono","Eventuais","Desconto","Salário Líquido"

def filtrador():
  informacoes = []
  for item in raspador():
    header = item[1].strip("</td>")
    info = item[2].split("</td>")
    informacoes.append({header: info[0]})
  return informacoes
    
filtrador()


[output]

[{'Matrícula': '00101105'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'DOUGLAS HENRIQUE SANTOS'},
 {'Cargo': 'SECRETARIO PARLAMENTAR'},
 {'CPF': '***42475'},
 {'Lotação': 'COMISSIONADO - GABINETE'},
 {'Remuneração': 'R$ 4.800,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 849,42'},
 {'Salário Líquido': 'R$ 3.950,58'},
 {'Matrícula': '00092175'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'DULCEANA PALMEIRA DE SA'},
 {'Cargo': 'CHEFE DE GABINETE 2oSECRETARIO'},
 {'CPF': '***31400'},
 {'Lotação': 'MESA'},
 {'Remuneração': 'R$ 9.100,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 2.178,33'},
 {'Salário Líquido': 'R$ 6.921,67'},
 {'Matrícula': '00092182'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'EDIJANE ALVES SANTOS SILVA'},
 {'Cargo': 'CARGOS DE NATUREZA ESPECIAL'},
 {'CPF': '***14404'},
 {'Lotação': 'MESA'},
 {'Remuneração': 'R$ 2.250,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 199,30'},
 {'Salário Líquido': 'R$ 2.050,70'}]

serene scaffold Sep 27, 2021, 1:23 AM

#

umbral skiff I have a code that extracts data from html pages and makes some filters until ge...

why is your output a list of dicts with one key-value pair each? this seems like a bad data model.
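
One way to get from that shape to a CSV, sketched with the standard library only: merge the single-pair dicts into one dict per record, then hand the records to csv.DictWriter with the wanted header. The helper name group_records is made up, and the rule "a new record starts whenever Matrícula appears" is an assumption based on the output shown above; the sample pairs are shortened from that output:

```python
import csv
import io

COLUMNS = ["Matrícula", "Referência", "Vínculo", "Servidor", "Cargo", "CPF",
           "Lotação", "Remuneração", "Abono", "Eventuais", "Desconto",
           "Salário Líquido"]


def group_records(pairs, first_column="Matrícula"):
    """Merge single-pair dicts into one dict per record.

    Assumes a new record starts every time first_column shows up.
    """
    records = []
    for pair in pairs:
        if first_column in pair or not records:
            records.append({})
        records[-1].update(pair)
    return records


# Shortened sample in the same shape as the output of filtrador().
pairs = [
    {"Matrícula": "00101105"}, {"Cargo": "SECRETARIO PARLAMENTAR"},
    {"Salário Líquido": "R$ 3.950,58"},
    {"Matrícula": "00092175"}, {"Cargo": "CHEFE DE GABINETE 2oSECRETARIO"},
    {"Salário Líquido": "R$ 6.921,67"},
]

# StringIO keeps the sketch self-contained; columns missing from a record
# are written as empty cells.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(group_records(pairs))
print(buffer.getvalue())
```

To write a real file, the StringIO can be swapped for open(path, "w", newline="", encoding="utf-8").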

umbral skiff Sep 27, 2021, 1:53 AM

#

serene scaffold why is your output a list of dicts with one key-value pair each? this seems like...

That really could be it. Thanks!

boreal loom Sep 27, 2021, 1:59 AM

#

Any ideas on how to optimize this piece of code?

#

dataframe_tokenized_speech_Only = all_data_tokenized_FreqDist_df[["Speech"]]
dataframe_tokenized_speech_Only

for country_year_vector, Speech_dictionary in tqdm(dataframe_tokenized_speech_Only.iterrows()):
    country =(country_year_vector[0])
    year = country_year_vector[1]
    
    for key,value in Speech_dictionary["Speech"].items():
        if key in dataframe_tokenized_speech_Only.columns:
            dataframe_tokenized_speech_Only.loc[country, year][key] = value
        else:
            dataframe_tokenized_speech_Only[key] =0
            dataframe_tokenized_speech_Only.loc[country, year][key] = value

serene scaffold Sep 27, 2021, 2:02 AM

#

boreal loom ```py dataframe_tokenized_speech_Only = all_data_tokenized_FreqDist_df[["Speech"...

!code

arctic wedgeBOT Sep 27, 2021, 2:02 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

boreal loom Sep 27, 2021, 2:02 AM

#

serene scaffold !code

Thanks for the heads up

serene scaffold Sep 27, 2021, 2:02 AM

#

Thanks

boreal loom Sep 27, 2021, 2:02 AM

#

I am looking into numba, is that a thing?

serene scaffold Sep 27, 2021, 2:03 AM

#

But what are you trying to do?

#

You want to avoid for loops as much as possible.

boreal loom Sep 27, 2021, 2:03 AM

#

Yeah

#

Unfortunately the dataframe has a nested dictionary

serene scaffold Sep 27, 2021, 2:03 AM

#

Why

boreal loom Sep 27, 2021, 2:03 AM

#

Unlucky

#

What can i say

#

Someone from the preprocessing gave it like this

serene scaffold Sep 27, 2021, 2:04 AM

#

I would straighten that out before trying to work around it.

serene scaffold Sep 27, 2021, 2:04 AM

#

boreal loom Someone from the preprocessing gave it like this

Fire them.

austere swift Sep 27, 2021, 2:04 AM

#

boreal loom I am looking into numba, is that a thing?

numba works best on numpy operations, it doesn't work well with dataframes

boreal loom Sep 27, 2021, 2:04 AM

#

Numba seems to like for loops

austere swift Sep 27, 2021, 2:05 AM

#

boreal loom Numba seems to like for loops

yes, on numpy operations

serene scaffold Sep 27, 2021, 2:05 AM

#

Regardless, if one column of a dataframe contains dicts, make a separate data frame with the same index and expand the dicts into columns.
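
A minimal sketch of that expansion, assuming a (country, year) index and a Speech column of token-count dicts as in the loop above. The frame and its values are hypothetical. Building a DataFrame from the list of dicts creates one column per key without any Python-level loop, including when the rows have different key sets:

```python
import pandas as pd

# Hypothetical frame: a (country, year) index and one column of dicts.
index = pd.MultiIndex.from_tuples(
    [("AAA", 2000), ("BBB", 2001)], names=["country", "year"]
)
df = pd.DataFrame(
    {"Speech": [{"peace": 3, "war": 1}, {"peace": 2, "trade": 5}]},
    index=index,
)

# One row per dict; keys missing from a row become NaN and are then set to 0.
counts = (
    pd.DataFrame(df["Speech"].tolist(), index=df.index)
    .fillna(0)
    .astype(int)
)
print(counts)
```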

boreal loom Sep 27, 2021, 2:06 AM

#

Yeah that would make sense, but the dictionary is different in every row

#

Tried that approach too

serene scaffold Sep 27, 2021, 2:06 AM

#

Different sets of keys?

boreal loom Sep 27, 2021, 2:06 AM

#

Yeppity Yippity

serene scaffold Sep 27, 2021, 2:06 AM

#

Whatever data they contain, think of how you would structure it in a database

boreal loom Sep 27, 2021, 2:08 AM

#

I will try to talk them into not giving me that monstrosity

serene scaffold Sep 27, 2021, 2:11 AM

#

Also your variable names are quite long

#

But I'll let you do the cost benefit analysis on that.

azure marsh Sep 27, 2021, 2:17 AM

#

You might be able to do some kind of join on the two instead of your double for loops. The intersecting columns will be the same for each row

#

Agree on the variable names, it's generally recommended to not include the type in the variable name anymore (aka Systems hungarian notation) as modern tools make it easy to ascertain the type. EDIT: correction on specific type of hungarian

#

and it just takes up space, making it harder to understand what's going on. It's certainly better than too short names, though.

#

A good read is the chapter "Meaningful Names" in the book Clean Code

velvet thorn Sep 27, 2021, 2:23 AM

#

azure marsh Agree on the variable names, it's generally recommended to not include the type ...

okay I gotta say this

#

that's not what Hungarian notation was originally meant to be

#

"type" was meant in the business case sense, not the formal type sense

azure marsh Sep 27, 2021, 2:24 AM

#

I know that is not what it was originally meant to be

velvet thorn Sep 27, 2021, 2:24 AM

#

azure marsh A good read is the chapter "Meaningful Names" in the book Clean Code

which dovetails with this, essentially

azure marsh Sep 27, 2021, 2:24 AM

#

but the end result in the code is similar

velvet thorn Sep 27, 2021, 2:24 AM

#

azure marsh but the end result in the code is similar

how is it similar?

azure marsh Sep 27, 2021, 2:25 AM

#

The type of the structure is in the variable name

velvet thorn Sep 27, 2021, 2:25 AM

#

azure marsh The type of the structure is in the variable name

what I am saying is

#

if you do it as it was meant to be, it isn't

#

(unless you have, like, refinement types, but most languages don't)

azure marsh Sep 27, 2021, 2:26 AM

#

Not really clear on what you're getting at in relation to their code, in general it adds unnecessary verbiage

#

Do you disagree that they should remove "dataframe_" as a prefix?

velvet thorn Sep 27, 2021, 2:27 AM

#

azure marsh Do you disagree that they should remove "dataframe_" as a prefix?

no, I don't

azure marsh Sep 27, 2021, 2:27 AM

#

That's my only point.

velvet thorn Sep 27, 2021, 2:28 AM

#

azure marsh That's my only point.

great, and mine is that doing so is one variant of what is called Hungarian notation.

azure marsh Sep 27, 2021, 2:29 AM

#

Ok, no disagreements there.

lilac hull Sep 27, 2021, 5:24 AM

#

so.. idk if this is the right place to ask, but i recently made an object detection project using opencv. it works fine, except for the fact that it detects things like chairs as toilets, and spectacles as scissors. is there a way to make the model better?

onyx drum Sep 27, 2021, 5:40 AM

#

Just used np.savetxt() to store a huge python array into a text file on disk. However, this spiked up my RAM a lot and I didn't store this np.savetxt() into a specific variable to be able to delete the array

Where is this RAM held and how do I release it? (I can't reset the Jupyter notebook because it takes a looong time to re-simulate my stuff)

lapis sequoia Sep 27, 2021, 7:18 AM

#

onyx drum Just used np.savetxt() to store a huge python array into a text file on disk. Ho...

we can directly tell gc to flush things out. it helped me once (i had a similar issue with np)

lapis sequoia Sep 27, 2021, 7:19 AM

#

onyx drum Just used np.savetxt() to store a huge python array into a text file on disk. Ho...

so i just deleted the array (del arr) and then called gc function

#

gc.collect()
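
The two steps together, as a minimal sketch. The list here is only a stand-in for the large array, and the variable names are made up:

```python
import gc

# Stand-in for the large array; any big object behaves the same way here.
data = [float(i) for i in range(1_000_000)]

del data                    # drop the reference held by this name
unreachable = gc.collect()  # run a full collection of reference cycles
print(f"collector found {unreachable} unreachable objects")
```

This only helps if no other name still refers to the object; in a notebook, earlier cell outputs kept in Out can hold such references. Whether the freed memory is returned to the operating system right away depends on the allocator, so the process size may not shrink immediately.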

lone drum Sep 27, 2021, 9:14 AM

#

Hello
I am using the resample function of pandas.
I have tick-by-tick data.
I want that data minute-wise.

#

Can anyone look into this?

royal crest Sep 27, 2021, 9:20 AM

#

see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html for the description on what level means and what it does, and see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#aggregation for example use

lone drum Sep 27, 2021, 9:21 AM

#

My code

for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python',  chunksize=250000 , iterator=True,  names = ['Msgtype', 'Activity Type', 'Transaction Time', 'script_name', 'expiry', 'strike_price', 'call/put', 'Exchange', 'Token', 'Buy/Sell', 'Buy Order number', 'Sell order number', 'Price', 'qty', 'price_in_rupees', 'Lot'])) :
    chunk['Transaction Time'] = pd.to_datetime(chunk['Transaction Time'], errors='coerce')    
    
    chunk = pd.DataFrame(chunk).set_index('Transaction Time')
    print('chunk1...')
    print(chunk)
    print()
        
    chunk2 = chunk.resample('T')['price_in_rupees'].agg(['first', 'max', 'min', 'last']).set_axis(['Open', 'High', 'Low', 'Close'],axis=1)
    chunk2 = chunk2[chunk2.Close > 0]
    
    print('chunk2...')
    print(chunk2)
    print()
    
    chunk2.to_csv(f'{new_path}{output_file_name}{extension}',  mode= 'a', header=None)

royal crest Sep 27, 2021, 9:22 AM

#

use the level arg

#

please check out the links i've attached.

lone drum Sep 27, 2021, 9:22 AM

#

royal crest use the `level` arg

U mean level=m

royal crest Sep 27, 2021, 9:23 AM

#

If that works, then yes

#

though it should be a str or an int as per the documentation

#

just be wary

#

it also must be datetime-like.

lone drum Sep 27, 2021, 9:24 AM

#

See my data this way

royal crest Sep 27, 2021, 9:25 AM

#

Please don't ping me either directly or indirectly, I am right here.

lone drum Sep 27, 2021, 9:26 AM

#

I am getting

Traceback (most recent call last):

  File "E:\python files\resample_practice.py", line 26, in <module>
    chunk2 = chunk.resample('T', level = 'm')['price_in_rupees'].agg(['first', 'max', 'min', 'last']).set_axis(['Open', 'High', 'Low', 'Close'],axis=1)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\generic.py", line 8369, in resample
    return get_resampler(

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\resample.py", line 1311, in get_resampler
    return tg._get_resampler(obj, kind=kind)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\resample.py", line 1466, in _get_resampler
    self._set_grouper(obj)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\groupby\grouper.py", line 381, in _set_grouper
    raise ValueError(f"The level {level} is not valid")

ValueError: The level m is not valid

#

Above error

#

Ping me when replying

royal crest Sep 27, 2021, 9:28 AM

#

Have you checked out the links I have attached?

#

One of them is a comprehensive user guide.

lone drum Sep 27, 2021, 9:28 AM

#

Which link?

royal crest Sep 27, 2021, 9:29 AM

#

royal crest see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html ...

Here

lone drum Sep 27, 2021, 9:31 AM

#

royal crest see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html ...

I have gone through this first link but i do not get my expected output

royal crest Sep 27, 2021, 9:32 AM

#

royal crest Please don't ping me either directly or indirectly, I am right here.

Please respect my request.

#

What have you tried?

lone drum Sep 27, 2021, 9:33 AM

#

I tried the minutewise example given in the doc

#

But I did not get the expected output

royal crest Sep 27, 2021, 9:33 AM

#

Could you share the part of the code where you've made changes?

#

What is the expected output and what is the output you are getting?

lone drum Sep 27, 2021, 9:36 AM

#

I tried chunk.resample('1T')
But it did not work; as u can see in the ss above, i am getting the same output

#

I am expecting
Open, high, low , close columns for each time

#

09:15
09:16
09:17
...
15:30
this way

#

Do u get my point?

royal crest Sep 27, 2021, 9:41 AM

#

I think it's tricky for me to explain without the data in hand. Would you mind sharing the csv of the data?

lone drum Sep 27, 2021, 9:43 AM

#

royal crest I think it's tricky for me to explain without the data in hand. Would you mind s...

Can I provide u ss of data i have
Because CSV file is too big
So u can make dummy CSV file in your system by seeing data in ss?

tender hearth Sep 27, 2021, 9:43 AM

#

What do you guys think about PyTorch Lightning

royal crest Sep 27, 2021, 9:44 AM

#

maybe chop it to the first 1000 rows?

lone drum Sep 27, 2021, 9:44 AM

#

royal crest maybe chop it to the first 1000 rows?

Can I dm u ?

royal crest Sep 27, 2021, 9:45 AM

#

DMs are reserved for discord friends, sorry.

lone drum Sep 27, 2021, 9:46 AM

#

Can I provide u ss of data
Can u please make dummy CSV from it

royal crest Sep 27, 2021, 9:48 AM

#

What's SS?

#

blob_sweat

#

If it's a link to the data, sure

lone drum Sep 27, 2021, 9:53 AM

#

Screenshot of data

lilac hull Sep 27, 2021, 10:05 AM

#

lilac hull so.. idk if this is the right place to ask, but i recently made an object detect...

????

#

pls help

#

im using the coco algorithm

rigid zodiac Sep 27, 2021, 1:15 PM

#

Hi Everyone, I have a quick question. I keep getting this error, how can i fix it

random nest Sep 27, 2021, 1:51 PM

#

lilac hull so.. idk if this is the right place to ask, but i recently made an object detect...

Sounds like some issue with the training data

lilac hull Sep 27, 2021, 1:51 PM

#

random nest Sounds like some issue with the training data

hmm i'll try replacing that and check

random nest Sep 27, 2021, 1:52 PM

#

It’s a funny issue tbh

lilac hull Sep 27, 2021, 1:52 PM

#

yeah lol

serene scaffold Sep 27, 2021, 2:01 PM

#

rigid zodiac Hi Everyone, I have a quick question. I keep getting this error, how can i fix i...

look at the data types of df after you do the rename.

rigid zodiac Sep 27, 2021, 2:01 PM

#

serene scaffold look at the data types of `df` after you do the rename.

no bad suggestion. i will see it

dull turtle Sep 27, 2021, 2:08 PM

#

hello my data this way
Transaction Time               Activity Type  script_name  ...  price_in_rupees  Lot
2011-08-28 09:15:02.006138097  N              BANKNIFTY    ...  4734.30          47.0
2011-08-28 09:15:02.707897899  N              BANKNIFTY    ...  2555.95          47.0
2011-08-28 09:15:03.373856246  N              BANKNIFTY    ...  2556.00          20.0
2011-08-28 09:15:04.159525439  N              BANKNIFTY    ...  6071.85          47.0
2011-08-28 09:15:05.213452151  M              BANKNIFTY    ...  2556.05          47.0
...                            ...            ...          ...  ...              ...
2011-08-28 09:15:20.175758062  N              BANKNIFTY    ...  125.00           1.0
2011-08-28 09:15:20.175804372  M              BANKNIFTY    ...  149.10           1.0
2011-08-28 09:15:20.176193109  M              BANKNIFTY    ...  148.60           1.0
2011-08-28 09:15:20.176239215  M              BANKNIFTY    ...  150.70           8.0
2011-08-28 09:15:20.176248648  M              BANKNIFTY    ...  150.90           8.0
this way

#

i want the above tick-by-tick data converted into open, high, low, close columns

serene scaffold Sep 27, 2021, 2:09 PM

#

dull turtle hello my data this way ```python Activity Type scr...

thanks for sharing the data. using print(df.to_string()) would make sure that no columns are left out. there's no way for us to know how many columns the ... represents.

dull turtle Sep 27, 2021, 2:10 PM

#

my code https://paste.pythondiscord.com/nuvukilore.py here

serene scaffold Sep 27, 2021, 2:13 PM

#

I'm still not clear on what you are trying to do.

dull turtle Sep 27, 2021, 2:14 PM

#

serene scaffold thanks for sharing the data. using `print(df.to_string())` would make sure that ...

Index(['Activity Type', 'script_name', 'expiry', 'strike_price', 'call/put', 'Exchange', 'Token', 'Buy/Sell', 'Buy Order number', 'Sell order number', 'Price', 'qty', 'price_in_rupees', 'Lot'], dtype='object')

serene scaffold Sep 27, 2021, 2:15 PM

#

dull turtle `Index(['Activity Type', 'script_name', 'expiry', 'strike_price', 'call/put', ...

this does not help, unfortunately. why don't you do print(df.head().to_csv())?

dull turtle Sep 27, 2021, 2:15 PM

#

i have stock market tick by tick data. I want to convert that data in open, high, low, close columns

serene scaffold Sep 27, 2021, 2:16 PM

#

alright. let me know when you've provided enough data for me to solve this.

dull turtle Sep 27, 2021, 2:16 PM

#

serene scaffold this does not help, unfortunately. why don't you do `print(df.head().to_csv())`?

what this will do ?

serene scaffold Sep 27, 2021, 2:16 PM

#

dull turtle what this will do ?

print out the data without any missing columns.

#

while we're at it, try print(df.head(30).to_csv()) and put it in the paste bin

#

!paste

arctic wedgeBOT Sep 27, 2021, 2:17 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

dull turtle Sep 27, 2021, 2:17 PM

#

serene scaffold this does not help, unfortunately. why don't you do `print(df.head().to_csv())`?

```python
Transaction Time,Activity Type,script_name,expiry,strike_price,call/put,Exchange,Token,Buy/Sell,Buy Order number,Sell order number,Price,qty,price_in_rupees,Lot
2011-08-28 09:15:02.006138097,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000013113,473430,1175,4734.3,47.0
2011-08-28 09:15:02.707897899,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255595,1175,2555.95,47.0
2011-08-28 09:15:03.373856246,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000027793,0,255600,500,2556.0,20.0
2011-08-28 09:15:04.159525439,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000034501,607185,1175,6071.85,47.0
2011-08-28 09:15:05.213452151,M,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255605,1175,2556.05,47.0
```

#

https://paste.pythondiscord.com/oxuzecahak.apache

serene scaffold Sep 27, 2021, 2:18 PM

#

dull turtle ```python Transaction Time,Activity Type,script_name,expiry,strike_price,call/pu...

I need enough rows to cover two days.

#

please put that in the paste bin when you have it.

dull turtle Sep 27, 2021, 2:18 PM

#

serene scaffold I need enough rows to cover two days.

okay can i share u csv ?

serene scaffold Sep 27, 2021, 2:19 PM

#

dull turtle okay can i share u csv ?

if you can put the whole csv in the paste bin that's fine

dull turtle Sep 27, 2021, 2:19 PM

#

let me try

serene scaffold Sep 27, 2021, 2:19 PM

#

but I just need enough rows to cover two days

#

please ping me with the URL to the paste bin when you have done this.

dull turtle Sep 27, 2021, 2:20 PM

#

okay

dull turtle Sep 27, 2021, 2:24 PM

#

serene scaffold please ping me with the URL to the paste bin when you have done this.

can u try with this https://paste.pythondiscord.com/banowanequ.apache

serene scaffold Sep 27, 2021, 2:24 PM

#

dull turtle can u try with this https://paste.pythondiscord.com/banowanequ.apache

I asked for rows covering at least two days. These are all from one day. I won't be able to help.

eager heath Sep 27, 2021, 2:25 PM

#

@dull turtle could you give us your full csv file please?

dull turtle Sep 27, 2021, 2:27 PM

#

serene scaffold I asked for rows covering at least two days. These are all from one day. I won't...

see my file is 6.5 gb

#

i am not able to open that csv file

#

so i am giving u data read by python

#

https://paste.pythondiscord.com/ijemerimal.apache plz check here

serene scaffold Sep 27, 2021, 2:29 PM

#

The index of each row is a timestamp and I asked for enough rows that cover two calendar days worth of timestamps. You can even just include something like five rows for two calendar days (for a total of ten rows).

dull turtle Sep 27, 2021, 2:30 PM

#

but csv file too big that i am not able to open it directly

#

can u please add some dummy data to it so the rows get increased

#

can u help me how i can get data for per minute

#

for e.g. i have data in seconds and in microseconds so i want to combine all values to get single one minute data

#

```python
2011-08-28 09:15:02.006138097,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000013113,473430,1175,4734.3,47.0
2011-08-28 09:15:02.707897899,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255595,1175,2555.95,47.0
2011-08-28 09:15:04.159525439,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000034501,607185,1175,6071.85,47.0
2011-08-28 09:15:05.213452151,M,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255605,1175,2556.05,47.0
2011-08-28 09:15:04.404004891,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26885,25,268.85,1.0
2011-08-28 09:15:04.405367275,N,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036502,0,26875,25,268.75,1.0
2011-08-28 09:15:04.405433392,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26915,25,269.15,1.0
2011-08-28 09:15:04.405443075,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000032054,0,26660,75,266.6,3.0
2011-08-28 09:15:04.405504048,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036395,0,26920,25,269.2,1.0
2011-08-28 09:15:12.178591633,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090331,30555,25,305.55,1.0
2011-08-28 09:15:12.178672232,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090552,30550,50,305.5,2.0
2011-08-28 09:15:12.178735441,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,BUY,1400000000002103,0,21515,25,215.15,1.0
2011-08-28 09:15:12.17874251
```
my data is this way, so i want only a single value per minute
```python
date                  open    high    low    close
28-08-2011 09:15:00    val1   val2    val3    val4
28-08-2011 09:16:00    val1   val2    val3    val4
```
this way and so on
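
The one-minute conversion asked for here can be sketched with pandas. This is only a sketch under assumptions: `Transaction Time` parses as a datetime, `price_in_rupees` is the price to aggregate, and each `Token` (instrument) should get its own bars. The rows are copied from the sample above, and the header comes from the earlier paste.

```python
import io

import pandas as pd

# Rows copied from the sample above (two tokens, all inside 09:15).
csv_text = """Transaction Time,Activity Type,script_name,expiry,strike_price,call/put,Exchange,Token,Buy/Sell,Buy Order number,Sell order number,Price,qty,price_in_rupees,Lot
2011-08-28 09:15:04.404004891,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26885,25,268.85,1.0
2011-08-28 09:15:04.405367275,N,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036502,0,26875,25,268.75,1.0
2011-08-28 09:15:04.405433392,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26915,25,269.15,1.0
2011-08-28 09:15:04.405443075,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000032054,0,26660,75,266.6,3.0
2011-08-28 09:15:04.405504048,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036395,0,26920,25,269.2,1.0
2011-08-28 09:15:12.178591633,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090331,30555,25,305.55,1.0
2011-08-28 09:15:12.178672232,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090552,30550,50,305.5,2.0
2011-08-28 09:15:12.178735441,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,BUY,1400000000002103,0,21515,25,215.15,1.0
"""

ticks = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Transaction Time"],
    index_col="Transaction Time",
).sort_index()  # "first"/"last" below depend on time order

# One bar per (Token, minute): first tick = open, max = high, min = low, last tick = close.
ohlc = ticks.groupby(["Token", pd.Grouper(freq="1min")])["price_in_rupees"].agg(
    open="first", high="max", low="min", close="last"
)
print(ohlc)
```

If bars should mix all tokens together, dropping `"Token"` from the `groupby` list would do that, though for options data that is probably not what is wanted.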

#

@eager heath do u get my point what i am trying to do ?

eager heath Sep 27, 2021, 2:36 PM

#

I don't know, I am not a datascience person :D

dull turtle Sep 27, 2021, 2:37 PM

#

dull turtle ```python 2011-08-28 09:15:02.006138097,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,...

@serene scaffold see this way i am trying to do , can u ple look into it ?

eager heath Sep 27, 2021, 2:41 PM

#

I believe Steele had to get back to work, but I'm sure someone will come and help you. If not, feel free to ask in a help channel!

arctic wedgeBOT Sep 27, 2021, 2:44 PM

#

Hey @plush leaf!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

dull turtle Sep 27, 2021, 2:45 PM

#

just ping me when someone reply

dusty cloud Sep 27, 2021, 2:58 PM

#

Does anyone know how to extract labels from sklearn's Pipeline object?

#

I tried after fitting the pipeline, which gives error

pipe_kmeans = Pipeline([('clustering', KMeans())])
pipe_kmeans.fit(X)
pipe_kmeans.named_steps['clustering'].labels_   #err

lapis sequoia Sep 27, 2021, 2:59 PM

#

dull turtle but csv file too big that i am not able to open it directly

assuming the rows are sorted by time, you can read the file line by line with the csv module. for a pandas solution I'd suggest read_csv with the chunksize parameter. some people here do know how to use it, i have personally not used it yet.
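
A rough sketch of the chunked approach suggested above, assuming pandas and a file whose rows are already sorted by time. The function name `ohlc_in_chunks` and the tiny demo file are made up for illustration, and only two columns are read; a real version would probably also group by instrument.

```python
import os
import tempfile

import pandas as pd


def ohlc_in_chunks(path, chunksize=1_000_000):
    """Build 1-minute OHLC bars from a time-sorted CSV, one chunk at a time."""
    partial_bars = []
    reader = pd.read_csv(
        path,
        usecols=["Transaction Time", "price_in_rupees"],
        parse_dates=["Transaction Time"],
        index_col="Transaction Time",
        chunksize=chunksize,  # makes read_csv return an iterator of DataFrames
    )
    for chunk in reader:
        bars = chunk["price_in_rupees"].resample("1min").agg(["first", "max", "min", "last"])
        partial_bars.append(bars.dropna())  # drop minutes that had no ticks
    merged = pd.concat(partial_bars)
    # A minute can straddle two chunks, so combine the partial bars per minute.
    result = merged.groupby(level=0).agg(
        {"first": "first", "max": "max", "min": "min", "last": "last"}
    )
    result.columns = ["open", "high", "low", "close"]
    return result


# Synthetic demo data (not from the real file), written to a temporary CSV.
demo_csv = (
    "Transaction Time,price_in_rupees\n"
    "2011-08-28 09:15:02,10.0\n"
    "2011-08-28 09:15:30,12.0\n"
    "2011-08-28 09:15:59,9.0\n"
    "2011-08-28 09:16:05,11.0\n"
    "2011-08-28 09:16:40,13.0\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write(demo_csv)
bars = ohlc_in_chunks(handle.name, chunksize=2)  # tiny chunks to exercise the merge
os.remove(handle.name)
print(bars)
```

The second aggregation is the design point: each chunk only sees part of a minute, so opens, highs, lows and closes have to be merged again after concatenation.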

dull turtle Sep 27, 2021, 3:09 PM

#

dull turtle my code https://paste.pythondiscord.com/nuvukilore.py here

@lapis sequoia can u check my code here

lapis sequoia Sep 27, 2021, 3:10 PM

#

no I'm busy today. I'm sorry.

#

just gave you reference which may help.

misty flint Sep 27, 2021, 5:07 PM

#

np.where() is op Praise

fiery sedge Sep 27, 2021, 5:19 PM

#

Hi! I have a question: can I build a neural network whose job is to watch a person's actions on their phone and collect that information? Is it possible?

quasi parcel Sep 27, 2021, 5:20 PM

#

@fiery sedge can you elaborate on why you need a neural network

#

you mean through app

#

you can do some thing called clickstream

#

which will listen users information on phone app like what a user is doing and then

#

send it to a s3

#

its pretty easy to setup the backend

#

but if you have an app in android

#

they need to pass this triggered event to the api which handles the clickstream

#

if you need i have a snippet of code which can handle this clickstream

#

so basically you will be creating a data lake which will have all users triggered events

fiery sedge Sep 27, 2021, 5:41 PM

#

Ok I understand, but I mean in general, not only in an app. I mean knowing the actions on his cell phone, for example which apps he uses the most or whether the contacts are accessed. Is it possible to have this functionality with only a neural network?

quasi parcel Sep 27, 2021, 5:44 PM

#

with only neural networks

#

means i think you can use LSTM

#

long-short term memory

#

i think i read about this

#

one moment

misty flint Sep 27, 2021, 5:53 PM

#

i dont think you need a neural net for that. just sounds like ~~sketchy~~ data collection and then regular analysis

quasi parcel Sep 27, 2021, 5:53 PM

#

that is what i suggested

#

we can do a data lake for all event triggers by users

#

@misty flint

#

generally we can track user behaviour with lstm

#

so one moment

#

https://aws.amazon.com/blogs/big-data/create-real-time-clickstream-sessions-and-run-analytics-with-amazon-kinesis-data-analytics-aws-glue-and-amazon-athena/

Amazon Web Services

Create real-time clickstream sessions and run analytics with Amazon...

Clickstream events are small pieces of data that are generated continuously with high speed and volume. Often, clickstream events are generated by user actions, and it is useful to analyze them. For example, you can detect user behavior in a website or application by analyzing the sequence of clicks a user makes, the amount of […]

#

@fiery sedge

fiery sedge Sep 27, 2021, 6:06 PM

#

Ok I'm gonna read it! 😀

#

👍

quasi parcel Sep 27, 2021, 6:07 PM

#

this is one of the methods

#

to get the users data to database or storage unit

rigid zodiac Sep 27, 2021, 6:08 PM

#

can someone please look at my code? I'm trying to feed multiple csv files in, do some conversion, then save each of them

#

also where can I paste my code so you can check it

#

!code

quasi parcel Sep 27, 2021, 6:10 PM

#

https://paste.pythondiscord.com

#

please use this

#

@rigid zodiac

rigid zodiac Sep 27, 2021, 6:10 PM

#

thank you, here is my code

#

https://paste.pythondiscord.com/aricoliwip.py

#

I successfully feed them in, but dont know how to save it in separate file

quasi parcel Sep 27, 2021, 6:13 PM

#

i think it is fname

#

where did you define filename and plotnumbers

#

?

rigid zodiac Sep 27, 2021, 6:14 PM

#

fname is just any file in side the folder

#

so what i did was feed each file in separately and then run it through those

rigid zodiac Sep 27, 2021, 6:16 PM

#

quasi parcel i think it is fname

This is what I get when I run it

#

i'm trying to break each one of those into a csv file

quasi parcel Sep 27, 2021, 6:19 PM

#

okay

#

so in each csv file

#

you have this columns

rigid zodiac Sep 27, 2021, 6:20 PM

#

Yep and with that many row

#

so far I add the df.to_csv at the end but it keep written on top of itself
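
The overwriting described here usually means every loop iteration writes to the same path. A minimal sketch, assuming pandas and made-up folder and file names, that derives each output name from the input file's stem so nothing is overwritten (using the stem also avoids a doubled `.csv.csv` suffix):

```python
import tempfile
from pathlib import Path

import pandas as pd


def convert_folder(in_dir, out_dir):
    """Read every CSV in in_dir and write one converted CSV per input file."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.csv")):
        df = pd.read_csv(src)
        # ... the per-file conversion would go here ...
        dst = out_dir / f"{src.stem}_converted.csv"  # unique name per input file
        df.to_csv(dst, index=False)
        written.append(dst)
    return written


# Demo with three tiny made-up input files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_in = Path(tmp) / "in"
    demo_in.mkdir()
    for i in range(3):
        pd.DataFrame({"a": [i]}).to_csv(demo_in / f"plot{i}.csv", index=False)
    outputs = convert_folder(demo_in, Path(tmp) / "out")
    output_names = [p.name for p in outputs]
print(output_names)
```

The key point is that `to_csv` sits inside the loop and its path changes on every iteration.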

quasi parcel Sep 27, 2021, 6:21 PM

#

what is the error you are getting

#

is there any error

rigid zodiac Sep 27, 2021, 6:21 PM

#

nothing, it just give me back 1 csv file instead of 1000 csv file

quasi parcel Sep 27, 2021, 6:21 PM

#

which name?

#

df.to_csv('/content/drive/MyDrive/Huy_2/nonfall_2ft_groupby/'+filename+str(plot_numbers) + '.csv', index=False)

#

cause in this you have mentioned filename

#

where is this filename is getting updated

#

?

rigid zodiac Sep 27, 2021, 6:23 PM

#

I dont want it to update, i just need it to save down as a csv file...

quasi parcel Sep 27, 2021, 6:23 PM

#

yes dude get it can you show me in the code where you are assigning filename?

rigid zodiac Sep 27, 2021, 6:24 PM

#

like the code from before, where I break that massive csv file into like 1000 csv files ??

rigid zodiac Sep 27, 2021, 6:26 PM

#

quasi parcel yes dude get it can you show me in the code where you are assigning filename?

I'm not following with your question, tbh. Cause the one I gave you before just taking those csv file in the folder and feed it

quasi parcel Sep 27, 2021, 6:26 PM

#

df.to_csv('/content/drive/MyDrive/Huy_2/nonfall_2ft_groupby/'+filename+str(plot_numbers) + '.csv', index=False) in this line can u tell me where is filename is getting set

#

?

rigid zodiac Sep 27, 2021, 6:27 PM

#

agh... that... that's from previous... i see, let me try to remove it

#

still same issue

#

only 1 file coming out after I remove it

quasi parcel Sep 27, 2021, 6:29 PM

#

can you share the code after that

rigid zodiac Sep 27, 2021, 6:30 PM

#

https://paste.pythondiscord.com/hivukimoxi.py

quasi parcel Sep 27, 2021, 6:35 PM

#

https://paste.pythondiscord.com/iviwedegus.py

#

try this

rigid zodiac Sep 27, 2021, 6:36 PM

#

quasi parcel https://paste.pythondiscord.com/iviwedegus.py

it has this error code

quasi parcel Sep 27, 2021, 6:37 PM

#

create a folder called nonfall_2ft_groupby

rigid zodiac Sep 27, 2021, 6:38 PM

#

I do have that

#

seems like it has .csv.csv

quasi parcel Sep 27, 2021, 6:39 PM

#

https://paste.pythondiscord.com/ibisiladom.py

#

try this

rigid zodiac Sep 27, 2021, 6:39 PM

#

same error code

quasi parcel Sep 27, 2021, 6:39 PM

#

show me the error

#

?

#

please

rigid zodiac Sep 27, 2021, 6:40 PM

#

quasi parcel Sep 27, 2021, 6:44 PM

#

https://paste.pythondiscord.com/ruziziqovi.py

#

try this

rigid zodiac Sep 27, 2021, 6:47 PM

#

it still have the error

#

quasi parcel Sep 27, 2021, 6:48 PM

#

https://paste.pythondiscord.com/zeroqivova.py

rigid zodiac Sep 27, 2021, 6:49 PM

#

same file... I may just delete that file then

#

nope still not work

#

😦

quasi parcel Sep 27, 2021, 6:50 PM

#

same error?

rigid zodiac Sep 27, 2021, 6:51 PM

#

quasi parcel Sep 27, 2021, 6:52 PM

#

https://paste.pythondiscord.com/luguranopi.py

rigid zodiac Sep 27, 2021, 6:53 PM

#

let me reset it again

rigid zodiac Sep 27, 2021, 6:56 PM

#

quasi parcel https://paste.pythondiscord.com/luguranopi.py

#

idk what is going on, like I delete it and it still have the same error

#

keep it still the same

quasi parcel Sep 27, 2021, 6:59 PM

#

can you share the datasets?

#

if you are okay

#

?

rigid zodiac Sep 27, 2021, 6:59 PM

#

all of it?

#

I can give you the original 1, then the code to break it. it will be easier

quasi parcel Sep 27, 2021, 7:00 PM

#

sure

rigid zodiac Sep 27, 2021, 7:01 PM

#

https://paste.pythondiscord.com/gexuvefeta.properties

#

for dataset... how can I send it to you

quasi parcel Sep 27, 2021, 7:02 PM

#

one moment

#

i think i found it

#

https://paste.pythondiscord.com/uwiqufanik.py

rigid zodiac Sep 27, 2021, 7:07 PM

#

it is running hold on

rigid zodiac Sep 27, 2021, 7:10 PM

#

quasi parcel https://paste.pythondiscord.com/uwiqufanik.py

still only 1 coming out

quasi parcel Sep 27, 2021, 7:20 PM

#

can you dm

#

?

foggy shuttle Sep 27, 2021, 7:28 PM

#

Hi guys, Can anyone point me to the right resources to start with computer vision video detection problems with transformers. I know NLP but am new to Computer Vision.

quasi parcel Sep 27, 2021, 8:04 PM

#

Hi

#

https://www.youtube.com/watch?v=oXlwWbU8l2o

YouTube

freeCodeCamp.org

OpenCV Course - Full Tutorial with Python

Learn everything you need to know about OpenCV in this full course for beginners. You will learn the very basics (reading images and videos, image transformations) to more advanced concepts (color spaces, edge detection). Towards the end, you'll have hands-on experience building a Deep Computer Vision model to classify between the characters in ...

#

i think this should be valid source @foggy shuttle

foggy shuttle Sep 27, 2021, 8:06 PM

#

quasi parcel i think this should be valid source @foggy shuttle

Thank you... Will check it out 👍

vale zephyr Sep 27, 2021, 8:21 PM

#

Hi! Does anyone know how to do the equivalent of cv2.inRange for HSV color thresholding in PyTorch?

red pecan Sep 27, 2021, 8:47 PM

#

I wanted to know why is everything related to ai super popular with python in comparison to other languages like c#?

#

Do please @ me if you mind?

grave frost Sep 27, 2021, 9:34 PM

#

red pecan I wanted to know why is everything related to ai super popular with python in co...

because c# is ancient languages for nerds 🦾

#

all the rich kids use python

velvet thorn Sep 27, 2021, 10:39 PM

#

it depends on which part of the stack you're talking about

#

the low-level networking code, everything that handles transactions etc., yes

#

in those contexts, Python is good for backtesting/experiments, minimally

#

and perhaps ML

velvet thorn Sep 27, 2021, 10:41 PM

#

red pecan I wanted to know why is everything related to ai super popular with python in co...

hm I would say it's because a lot of people who work with AI aren't software engineers first

#

but mathematicians

#

and CPython, being dynamically typed + interpreted, is generally easier to work with

#

more or less. the general pattern is: Python bindings for user-friendliness, C/C++/Fortran backend for speed.

#

a really good example is numpy

#

in general, debugging numpy issues is simple

#

compared to going through the underlying BLAS/LAPACK

#

yeah. C is at least reasonably readable by someone who doesn't know it

#

given proficiency in other languages

#

but C++ is a lot more complicated

#

sometimes I check out CPython source to understand how something works

#

if it was C++Python I would probably be like 🥴 and then 😔

#

agreed

#

isn't it weird

#

that VB and Python

#

are basically the same age?

#

even if you look @ VB after it started being on .NET and Python 2

#

say, pre-2.7

#

I would probably use Rust

#

it's a lot nicer to work with

#

apart from the immaturity of tooling

#

yeah, I intend to go take a Master's next year, and then maybe get back into ML

#

it's a pretty cool language! but unless you are a relatively hardcore engineer it'll probably be irrelevant

#

law

#

shrugs

#

I was a data scientist (nominally, though more like ML engineer)

velvet thorn Sep 27, 2021, 10:52 PM

#

velvet thorn I was a data scientist (nominally, though more like ML engineer)

it was my first job

#

why not? I didn't know what I wanted to do @ that time

#

nope

#

so might as well take a professional degree that is reasonably prestigious

#

went for a bootcamp, then got approached

#

it was a really great first job tbh

#

can you elaborate on that

#

hm I'm not sure about that

#

this is probably true, but it was quite intellectually stimulating, and you have better networking opportunities

#

Singapore

#

oh, I was near there once

#

I went to Saudi Arabia to teach data science

#

pretty interesting experience

#

way too dry for me though

#

yeah it wasn't bad! is part of the reason I went overseas to work

#

you mean in SA?

#

like, in Saudi Arabia or Singapore?

#

since you said this

#

oh

#

yeah.

#

but we're small

#

I'm actually working with a bank right now

#

shrugs

#

bootcamps are just a starting point I think

#

it's more about the marketing than anything else

red pecan Sep 27, 2021, 11:52 PM

#

Thank you two very much, I hope you have a great day. @velvet thorn @olive jackal

bold timber Sep 28, 2021, 12:07 AM

#

hi, I have a question about handling outliers: is it possible to handle outliers by applying a Yeo-Johnson transform?

bold timber Sep 28, 2021, 12:27 AM

#

is it possible to use both (scaling and a transform) in a pipeline?

azure marsh Sep 28, 2021, 1:20 AM

#

Sure, they are just numerical operations

rigid zodiac Sep 28, 2021, 1:24 AM

#

Quick question, has anyone done a train/test split over all of the csv files in a folder before?

bold timber Sep 28, 2021, 1:38 AM

#

azure marsh Sure, they are just numerical operations

what's the effect if i put an outlier into the model?

green phoenix Sep 28, 2021, 2:10 AM

#

can someone tell me why its accuracy is so low?

#

im AI noob

#

this is the dataset https://archive.ics.uci.edu/ml/datasets/Chess+(King-Rook+vs.+King)

royal crest Sep 28, 2021, 2:16 AM

#

green phoenix can someone tell me why its accuracy is so low?

the quick answer is that your params are not tuned enough to get an acceptable accuracy

green phoenix Sep 28, 2021, 2:17 AM

#

royal crest the quick answer is that your params are not tuned enough to get an acceptable a...

ok ill keep tweaking them thanks for the input

tender hearth Sep 28, 2021, 2:20 AM

#

Hey guys, for producing sequences with continuous values, what's the norm for determining the length of the output sequence?

#

with decoder networks used in NLP it's easy because you can have a dedicated embedding for EOS tokens

#

but you can't do that with continuous values

green phoenix Sep 28, 2021, 2:35 AM

#

green phoenix can someone tell me why its accuracy is so low?

i got it up to 70% 🙂

royal crest Sep 28, 2021, 2:36 AM

#

green phoenix i got it up to 70% 🙂

well done! what was the change?

green phoenix Sep 28, 2021, 2:37 AM

#

royal crest well done! what was the change?

i used SVM instead of logistic regression

royal crest Sep 28, 2021, 2:37 AM

#

👏

drowsy wraith Sep 28, 2021, 2:40 AM

#

green phoenix ok ill keep tweaking them thanks for the input

I know a little chess, so maybe you can include features like the color (black or white) of the square each piece is on, whether the kings are next to each other, whether there is a fork, and other things like that. It takes some time to understand which features matter.

azure marsh Sep 28, 2021, 3:48 AM

#

rigid zodiac Quick question, have anyone do train test split all of the csv file in the folde...

Yes, there are plenty of reasons to, e.g. splitting on metadata that isn't in the csv to ensure no leaking of test into train
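
One way to read this: split the list of files rather than the rows, so that all rows from one file land on the same side. A minimal sketch using only the standard library; the file names and the 80/20 ratio are assumptions for illustration.

```python
import random


def split_files(paths, test_fraction=0.2, seed=0):
    """Shuffle file paths reproducibly and split them into train and test lists."""
    paths = sorted(paths)  # fixed starting order so the seed gives the same split
    rng = random.Random(seed)
    rng.shuffle(paths)
    n_test = max(1, round(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]


# Hypothetical file names; in practice these could come from Path(folder).glob("*.csv").
files = [f"plot{i}.csv" for i in range(10)]
train_files, test_files = split_files(files)
print(len(train_files), len(test_files))
```

Each list can then be loaded and concatenated separately, which keeps rows from a test file from leaking into training.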

azure marsh Sep 28, 2021, 3:49 AM

#

bold timber what's effect if i putting an outlier into model?

Depends on the model and how it's affected by outliers.

bold timber Sep 28, 2021, 3:56 AM

#

azure marsh Depends on the model and how it's affected by outliers.

can you elaborate of this?

azure marsh Sep 28, 2021, 4:09 AM

#

Some models and loss functions are affected differently, e.g. L1 vs L2. I'd recommend taking a course on ML, e.g. https://www.coursera.org/learn/machine-learning
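
To make the L1 vs L2 point concrete, here is a small numpy sketch on made-up numbers. It fits the simplest possible "model", a single constant prediction, under each loss: squared error (L2) ends up at the mean, which the outlier drags upward, while absolute error (L1) ends up at the median, which barely moves.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up data; 100 is the outlier
candidates = np.arange(0.0, 101.0)  # candidate constant predictions 0, 1, ..., 100

# Residual of every data point against every candidate prediction.
residuals = values[:, None] - candidates[None, :]
l2_loss = (residuals ** 2).sum(axis=0)   # sum of squared errors per candidate
l1_loss = np.abs(residuals).sum(axis=0)  # sum of absolute errors per candidate

best_l2 = candidates[l2_loss.argmin()]
best_l1 = candidates[l1_loss.argmin()]
print("best constant under L2:", best_l2, "(mean is", values.mean(), ")")
print("best constant under L1:", best_l1, "(median is", np.median(values), ")")
```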

lapis sequoia Sep 28, 2021, 4:11 AM

#

how to preprocess a categorical column where categories are ranges, like 'x<100', '100 <= x < 500', 'x >= 500'

dusk depot Sep 28, 2021, 4:13 AM

#

lapis sequoia how to preprocess a categorical column where categories are ranges, like 'x<100'...

what is a categorical column? are you talking about pandas?

lapis sequoia Sep 28, 2021, 4:14 AM

#

categorical column is a column which contains categories

#

I'm talking about ML stuff here.. pandas has nothing to do with it

#

do you know imputation/standardization/preprocessing ?

dusk depot Sep 28, 2021, 4:18 AM

#

no

#

ur saying 'column' like it's in some software or file

azure marsh Sep 28, 2021, 4:20 AM

#

To clarify, you have a numerical column that you would like to convert into a categorical column?

#

Just apply a if/elif/else or lambda function that checks its range

#

!e

```python
func = lambda x: 1 if x < 100 else (2 if x < 500 else 3)
print([func(i) for i in [50, 150, 550]])
```

arctic wedgeBOT Sep 28, 2021, 4:21 AM

#

@azure marsh :white_check_mark: Your eval job has completed with return code 0.

[1, 2, 3]

azure marsh Sep 28, 2021, 4:23 AM

#

You can also look into pandas cut

#

!d pandas.cut

arctic wedgeBOT Sep 28, 2021, 4:24 AM

#

pandas.cut


```
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
```
Bin values into discrete intervals.

Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut could convert ages to groups of age ranges. Supports binning into an equal number of bins, or a pre-specified array of bins.
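
A short sketch of how `pandas.cut` could produce the three ranges from the question. The sample values are made up; the label strings and the choice of `right=False` (so each bin includes its lower edge) are assumptions that match 'x<100', '100 <= x < 500', 'x >= 500'.

```python
import numpy as np
import pandas as pd

values = pd.Series([50, 100, 150, 499, 500, 550])  # made-up numbers

bins = [-np.inf, 100, 500, np.inf]
labels = ["x<100", "100<=x<500", "x>=500"]

# right=False makes the intervals [-inf, 100), [100, 500), [500, inf).
binned = pd.cut(values, bins=bins, right=False, labels=labels)
print(binned.tolist())
```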

azure marsh Sep 28, 2021, 4:25 AM

#

Looks like there's a numpy command as well

#

!d numpy.searchsorted

arctic wedgeBOT Sep 28, 2021, 4:25 AM

#

numpy.searchsorted


```
numpy.searchsorted(a, v, side='left', sorter=None)
```
Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array *a* such that, if the corresponding elements in *v* were inserted before the indices, the order of *a* would be preserved.

Assuming that *a* is sorted...
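
The same binning can be sketched with `numpy.searchsorted`, treating the bin edges as the sorted array. With `side="right"`, a value equal to an edge falls into the higher bin, which matches the `x < 100` / `x < 500` checks in the lambda above (shifted to start at 0). The sample values are made up.

```python
import numpy as np

edges = np.array([100, 500])  # sorted bin edges
values = np.array([50, 100, 150, 499, 500, 550])  # made-up numbers

# For each value, count how many edges are <= that value.
codes = np.searchsorted(edges, values, side="right")
print(codes)
```

Adding 1 to `codes` would reproduce the 1/2/3 labels from the lambda version.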

lapis sequoia Sep 28, 2021, 4:27 AM

#

Thanks guys for your help but I'm actually not asking help with Python or any specific library

#

I was only asking a conceptual question about data preprocessing

#

Thanks for your help in any case. You guys are awesome

azure marsh Sep 28, 2021, 4:29 AM

#

You can look up these encodings for categorical data: ordinal, one-hot, dummy variable, embedding.
You could take ordinal further and even convert it back into a lossy numerical column (e.g. taking the midpoints of the bins)
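
A small pandas sketch of three of these options for a column of range labels. The sample column is made up. The midpoint values for the two open-ended bins (50 for 'x<100', which assumes x starts at 0, and 750 for 'x >= 500') are arbitrary assumptions, since an open-ended bin has no true midpoint.

```python
import pandas as pd

col = pd.Series(["x<100", "x >= 500", "100 <= x < 500", "x<100"])  # made-up column

# Ordinal encoding: the categories have a natural order, so map them to 0, 1, 2.
order = ["x<100", "100 <= x < 500", "x >= 500"]
ordinal = pd.Series(pd.Categorical(col, categories=order, ordered=True).codes)

# One-hot encoding: one indicator column per category, no order implied.
one_hot = pd.get_dummies(col)

# Lossy numeric encoding: replace each range with a representative value.
midpoints = {"x<100": 50.0, "100 <= x < 500": 300.0, "x >= 500": 750.0}
numeric = col.map(midpoints)

print(ordinal.tolist())
print(one_hot.shape)
print(numeric.tolist())
```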

lilac hull Sep 28, 2021, 4:32 AM

#

so im using the yolo algorithm, but when i run the code, i always get this error:

```
cv2.error: OpenCV(4.5.3) C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-sn_xpupm\opencv\modules\dnn\src\darknet\darknet_io.cpp:659: error: (-215:Assertion failed) separator_index < line.size() in function 'cv::dnn::darknet::ReadDarknetFromCfgStream'
```
how can i fix this?

lavish tundra Sep 28, 2021, 4:46 AM

#

someone know how i can convert that dataframe(image) to a dataframe where each element be part of a column?

lapis sequoia Sep 28, 2021, 4:53 AM

#

lavish tundra someone know how i can convert that dataframe(image) to a dataframe where each e...

df = pd.read_csv('location of dataset', sep=',', parse_dates=[column name which contains dates]) or something like that, you basically want to use sep and parse_dates parameters. look up on google

lavish tundra Sep 28, 2021, 4:55 AM

#

but its not from a file . _.

lilac hull Sep 28, 2021, 4:56 AM

#

lilac hull so im using the yolo algorithm, but when i run the code, i always get this error...

??

azure marsh Sep 28, 2021, 5:04 AM

#

Have you tried searching for that obscure error?

tender hearth Sep 28, 2021, 5:05 AM

#

Hey folks, I'll bump my question from earlier

For producing sequences with continuous values, what's the norm for determining the length of the output sequence?
with decoder networks used in NLP it's easy because you can have a dedicated embedding for EOS tokens
but you can't do that with continuous values

#

a concrete example of this would be generating audio waveforms

royal crest Sep 28, 2021, 5:07 AM

#

lavish tundra someone know how i can convert that dataframe(image) to a dataframe where each e...

split by comma and allocate them into their own respective lists then make a new dataframe out of them

#

though i see one problem being that comma is used as the 1_000 separator

#

surely you can do this from the source file

lilac hull Sep 28, 2021, 5:37 AM

#

azure marsh Have you tried searching for that obscure error?

i have, but didnt find a fix for it

azure marsh Sep 28, 2021, 5:39 AM

#

You've tried using someone else's config file?

#

You've ensured there's no comments without whitespace after the '#' ?

lilac hull Sep 28, 2021, 5:53 AM

#

so i tried verifying if both files were in the specified path, and they werent, as there was a typo. now i fixed that, but i get this:

parse NetParameter file: models/MobileNetSSD_deploy.prototxt in function 'cv::dnn::ReadNetParamsFromTextFileOrDie'```

lilac hull Sep 28, 2021, 5:54 AM

#

azure marsh You've ensured there's no comments without whitespace after the '#' ?

no, there arent any comments in the code

royal crest Sep 28, 2021, 6:01 AM

#

ReadNetParamsFromTextFileOrDie

#

sweatDuck

lilac hull Sep 28, 2021, 6:05 AM

#

lol

rigid zodiac Sep 28, 2021, 8:02 AM

#

azure marsh Yes, there are plenty of reasons to, e.g. splitting on metadata that isn't in th...

Yeah but how can i do it

wide citrus Sep 28, 2021, 9:17 AM

#

Any know about web scraping?

lapis sequoia Sep 28, 2021, 9:39 AM

#

wide citrus Any know about web scraping?

yeah some people do, but this is not the right place, this is #data-science-and-ml

viral juniper Sep 28, 2021, 10:34 AM

#

enjoy some nightmare fuel from early stages of my gan

#

this one was even earlier in training

pastel valley Sep 28, 2021, 11:50 AM

#

yo what should i learn if i want to create a machine learning model to classify different types of fish from images? is a convolutional neural network appropriate for that?
how do i know if i am using a good algorithm?

lunar violet Sep 28, 2021, 12:05 PM

#

Hello friends, I am currently stuck with my BE project, a helmet detection system using YOLOv3 on Google Colab with Darknet. I have the training code but I'm not sure it's error-free, and I want proper testing code. I have a custom dataset which is labelled and ready. However, even if I get ready-made testing code for YOLOv3, I don't know what exactly to add or edit in it since I don't know Python. Can someone please help me with the Python part? I have to present this project to my external and internal faculty. Thank you

#

Please Feel Free to DM me with help

gaunt marsh Sep 28, 2021, 1:52 PM

#

I have an Array which looks like this:

[[0.7651453003611763, 0.764035690858367, 0.7355304233745091], [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], [0.8455282724710679, 0.84655663488637, 0.8337125981891232]]

How can I multiply the values by 255? (These are converted RGB values)

serene scaffold Sep 28, 2021, 2:02 PM

#

gaunt marsh I have an Array which looks like this: ```[[0.7651453003611763, 0.7640356908583...

!e

import numpy as np

arr = np.array([[0.7651453003611763, 0.764035690858367, 0.7355304233745091], 
                [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], 
                [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], 
                [0.8455282724710679, 0.84655663488637, 0.8337125981891232]])

arr2 = arr * 255
print(arr2)

arctic wedgeBOT Sep 28, 2021, 2:02 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[[195.11205159 194.82910117 187.56025796]
 [177.18386167  38.8778098   38.36599424]
 [177.18386167  38.8778098   38.36599424]
 [215.60970948 215.8719419  212.59671254]]

serene scaffold Sep 28, 2021, 2:02 PM

#

@gaunt marsh multiplying an array by a numeric type will multiply each element by that value.
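
If the scaled values are meant to be used as 8-bit RGB channels, they usually need rounding and an integer dtype as well. A minimal sketch, assuming that is the goal (the question itself only asked for the multiplication), with the array shortened to two rows:

```python
import numpy as np

# Normalised RGB values in [0, 1], shortened from the array in the question.
arr = np.array([[0.7651453003611763, 0.764035690858367, 0.7355304233745091],
                [0.6948386732214498, 0.15246199920890194, 0.1504548793580838]])

# Scale to [0, 255], round to the nearest integer, then store as 8-bit unsigned ints.
rgb = np.rint(arr * 255).astype(np.uint8)
print(rgb)
```

Casting with `astype(np.uint8)` alone would truncate rather than round, which is why `np.rint` comes first.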

celest light Sep 28, 2021, 2:11 PM

#

pastel valley yo what should i learn if i want to create a machine learning model to classify ...

Yes CNNs are pretty good for Image classification.
To know whether the model is good, you can use metrics such as f1, accuracy, recall and precision to measure the performance of any classification model. You should also use a holdout validation dataset to compare the performance of the model on the data you trained it on vs the data you didn't train it on.
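
To make those metrics concrete, here is a small sketch that computes them by hand on a held-out set. The labels, the predictions, and the helper name `classification_metrics` are made up for illustration; in practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score` for the same purpose.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical validation labels (1 = "salmon", 0 = "not salmon") and model predictions.
y_val_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_val_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_val_true, y_val_pred)
print(metrics)
```

Running the same function on the training labels and comparing the two sets of numbers gives a rough picture of how much the model overfits.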

celest light Sep 28, 2021, 2:16 PM

#

viral juniper enjoy some nightmare fuel from early stages of my gan

Been there. 😂

uncut barn Sep 28, 2021, 2:34 PM

#

Is batch normalization still needed even if we normalize the data beforehand i.e. dividing every pixel by 255.0?

celest light Sep 28, 2021, 2:35 PM

#

uncut barn Is batch normalization still needed even if we normalize the data beforehand i.e...

Batch Normalisation is needed to keep the weights and gradients in check, not the inputs. So I would say it is good to have them.

pastel valley Sep 28, 2021, 2:43 PM

#

celest light Yes CNNs are pretty good for Image classification. To know whether the model is ...

good sir, do you recommend any YouTube videos for a beginner in machine learning? my end goal is image classification with machine learning

arctic wedgeBOT Sep 28, 2021, 2:45 PM

#

Hey @plush leaf!

It looks like you tried to attach file type(s) that we do not allow (.zip). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

viral juniper Sep 28, 2021, 2:50 PM

#

how do you guys load your datasets? I wanted to load new ones from the drive when needed, but keeping already loaded ones in a list, so that accessing them again is faster
turns out that wasn't a great idea because 70 thousand numpy arrays of shape 64x64x3 are not very fit for my 16 gb of ram KEKW

#

I worry that if I make it strictly just pull from hdd every time, I will wear out my drive

celest light Sep 28, 2021, 2:51 PM

#

pastel valley good sir do you recommend any youtube videos for beginner in machine learning ma...

Check out tutorials on Tensorflow's website

celest light Sep 28, 2021, 2:52 PM

#

viral juniper how do you guys load your datasets? I wanted to load new ones from the drive whe...

If you are using deep learning via TensorFlow or PyTorch, they have mechanisms to create an input data pipeline that brings in data as and when needed

viral juniper Sep 28, 2021, 2:53 PM

#

ohh interesting, ye im using keras, I'll look into that

#

thank you

celest light Sep 28, 2021, 2:57 PM

#

viral juniper thank you

Yeah. Look into tf.data.Dataset
Or if you are using images for image classification, you can also try ImageDataGenerator in Keras.
Another way is to subclass the Keras Sequence class for full control

viral juniper Sep 28, 2021, 3:06 PM

#

I'm training a gan so yea i pull real image samples for the discriminator

lapis sequoia Sep 28, 2021, 4:12 PM

#

hey, I've got N dfs and I wish to merge them.
but I don't know how to merge them by comparing a certain column.

#

example:
df1

a b
1 a
2 b
3 c

df2

a c
1 x
2 y
4 z

the df i want:
a b c
1 a x
2 b y
3 c -
4 - z

ripe forge Sep 28, 2021, 4:18 PM

#

thats just an outer join

lapis sequoia Sep 28, 2021, 4:18 PM

#

yeah hold on i did find something

#

i mean thats an outer join tbh.
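
A sketch of that outer join for an arbitrary number of frames, assuming they all share the key column `a` as in the example above. The `frames` list and the use of `functools.reduce` are illustrative choices, not something from the conversation:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
df2 = pd.DataFrame({"a": [1, 2, 4], "c": ["x", "y", "z"]})
frames = [df1, df2]  # any number of frames that share the key column "a"

# Fold the list pairwise: how="outer" keeps keys that appear in either frame.
merged = reduce(lambda left, right: pd.merge(left, right, on="a", how="outer"), frames)
merged = merged.sort_values("a").reset_index(drop=True)
print(merged)
```

Cells with no match come out as `NaN` rather than `-`; `merged.fillna("-")` would reproduce the table in the question.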

odd meteor Sep 28, 2021, 5:00 PM

#

A Quick Question....

I'm planning to start learning DL so I'd like ask

PyTorch or TensorFlow or Keras

Which framework is advisable for a complete beginner in Deep Learning to learn 1st.

Please could you give a reason for your suggestion in Q1.

serene scaffold Sep 28, 2021, 5:08 PM

#

odd meteor A Quick Question.... I'm planning to start learning DL so I'd like ask 1) PyTo...

Are you aware that Keras is part of TensorFlow? In either case, I think it depends on your experience with machine learning in general. Deep learning isn't the be-all-end-all of AI.

#

If you're not familiar with other approaches to AI, I think you'd find your learning experience more satisfying if you start elsewhere.

odd meteor Sep 28, 2021, 5:11 PM

#

serene scaffold Are you aware that Keras is part of TensorFlow? In either case, I think it depen...

Oh, I had no idea Keras is part of TF. I've seen quite a bit of code written with TF where Keras was mentioned

serene scaffold Sep 28, 2021, 5:12 PM

#

odd meteor Oh I had no idea Keras is part of TF. I've seen quite a few code written with TF...

yes, Keras is a part of TF that wraps around other parts of TF.

grave frost Sep 28, 2021, 5:15 PM

#

odd meteor A Quick Question.... I'm planning to start learning DL so I'd like ask 1) PyTo...

for beginners, pytorch is great too - even though the preference is TF since you dont really need to understand anything for TF

odd meteor Sep 28, 2021, 5:16 PM

#

serene scaffold If you're not familiar with other approaches to AI, I think you'd find your lear...

Well, I'm almost done with the online ML course I'm using to learn. There's an introductory segment to TF, Keras and PyTorch however that's just that about it. There's no deep material on deep learning

grave frost Sep 28, 2021, 5:16 PM

#

but I would still recommend using Pytorch and JAX, when you get some more experience

odd meteor Sep 28, 2021, 5:18 PM

#

grave frost for beginners, pytorch is great too - even though the preference is TF since you...

Thanks. Does that mean TF is more customer-friendly? 😀

Well, I'd like to be able to understand what's going on in each line of code I'm writing.

grave frost Sep 28, 2021, 5:18 PM

#

wdym by customers?

lapis sequoia Sep 28, 2021, 5:19 PM

#

how do i get into ai

grave frost Sep 28, 2021, 5:19 PM

#

lapis sequoia how do i get into ai

see pinned comments

odd meteor Sep 28, 2021, 5:20 PM

#

grave frost but I would still recommend using Pytorch and JAX, when you get some more experi...

This is my first time hearing of JAX. None of the JDs I've seen so far mention this framework, though. Perhaps it's somewhat new

grave frost Sep 28, 2021, 5:20 PM

#

odd meteor This is my first time of hearing JAX. All the JD I've seen so far seem not to me...

it is, yeah - it's really for power users, mostly cutting-edge research stuff

#

but it provides an extreme level of flexibility

odd meteor Sep 28, 2021, 5:24 PM

#

grave frost wdym by customers?

'customer-friendly' in that context means that it's perhaps perceived to have a simpler syntax or to be easier to use

silent pendant Sep 28, 2021, 5:28 PM

#

if I want to drop a column from a pandas df with

schedule.drop(list(schedule.filter(regex='DAY')), axis=1, inplace = True)

is there a way to select more than one filter?
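
One possible approach, sketched under the assumption that "more than one filter" means several name patterns: `DataFrame.filter(regex=...)` accepts a regular expression, so alternatives can be joined with `|`. The column names below are hypothetical.

```python
import pandas as pd

# Hypothetical schedule; the column names are made up for illustration.
schedule = pd.DataFrame({
    "NAME": ["Ann", "Bob"],
    "DAY_1": ["Mon", "Tue"],
    "DAY_2": ["Wed", "Thu"],
    "SHIFT_START": ["09:00", "10:00"],
    "ROOM": ["A", "B"],
})

# "|" is regex alternation: select columns whose name contains DAY or SHIFT.
to_drop = schedule.filter(regex="DAY|SHIFT").columns
schedule = schedule.drop(columns=to_drop)
print(schedule.columns.tolist())
```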

grave frost Sep 28, 2021, 5:28 PM

#

odd meteor 'customer-friendly' in that context means that it's perhaps perceived to have a ...

syntaxwise, TF requires you to write less code automating what happens underneath - whereas Pytorch requires more code but gives you greater control, better docs and lovely syntax

silent pendant Sep 28, 2021, 5:33 PM

#

@olive jackal because I am brand new to pandas 🙂 And dont know all the trix yet

#

Ultimately I dont need the columns at all, and the file will be written back out to excel

#

Defeats the purpose as a Pandas exercise 🙂

young harness Sep 28, 2021, 5:52 PM

#

Hey! I'm currently making a sudoku solver from an image with OpenCV. I've got the initial processing and splitting the image into cells done, but I'm having trouble detecting whether a specific cell contains a digit (not classifying the digit). Does anyone know how I can go about solving it?

wide sequoia Sep 28, 2021, 6:20 PM

#

how to tune hyperparameter for gensim doc2vec even though gensim doc2vec doesnt give any accuracy/loss for training?

rigid zodiac Sep 28, 2021, 7:34 PM

#

Hey guys, I keep having this issue. But when I run that file separately, it works just fine

ebon lynx Sep 28, 2021, 8:04 PM

#

@rigid zodiac what file

#

frame looks like the index.

#

I think if you want to groupby the index, the parameter for groupby is level=0

#

otherwise the problem might be that your datatypes are interpreted as strings
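
Both suggestions can be sketched together. The screenshot of the original data is not available, so the frame below is hypothetical: it assumes `frame` is the index and that the values were read in as strings.

```python
import pandas as pd

# Hypothetical data: "frame" is the index, and the values arrived as strings.
df = pd.DataFrame(
    {"value": ["1.5", "2.5", "4.0"]},
    index=pd.Index([1, 1, 2], name="frame"),
)

# Convert the string column to numbers before aggregating.
df["value"] = pd.to_numeric(df["value"])

# level=0 groups by the (first level of the) index instead of by a column.
means = df.groupby(level=0)["value"].mean()
print(means)
```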

rigid zodiac Sep 28, 2021, 8:07 PM

#

ebon lynx @rigid zodiac what file

The file fall 549

ebon lynx Sep 28, 2021, 8:07 PM

#

what is that

rigid zodiac Sep 28, 2021, 8:07 PM

#

Like i put all of the fall into 1 big folder

#

And loop it through here

ebon lynx Sep 28, 2021, 8:08 PM

#

I have literally no idea what you're talking about

earnest wadi Sep 28, 2021, 8:08 PM

#

Hello, I could really use some help, ive just tried to whip up my first neural network completely from scratch, and I think its almost working, would any expert be so kind to take a few minutes to go through it with me?

ebon lynx Sep 28, 2021, 8:08 PM

#

@earnest wadi post code, and post errors if there are any. that's the only way to get help.

earnest wadi Sep 28, 2021, 8:09 PM

#

hmm, okay

#

https://pastebin.com/kh68Zu6c <- main
https://pastebin.com/qhTy7XWJ <- functions file

#

My back propagation is off i believe

#

 [2.]
 [2.]]
a:\Python\Neural Net testing Stuff\functions.py:8: RuntimeWarning: divide by zero encountered in true_divide
  return 1 / (1-x)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[inf inf inf]
 [inf inf inf]
 [inf inf inf]]
[0. 0.]
[[-0.00109576 -0.00050056]
 [-0.00109576 -0.00050056]
 [-0.00109576 -0.00050056]]
Traceback (most recent call last):
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 76, in <module>
    nn.fit(training_data, batch_size=8, epochs=250)
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 27, in fit
    self.backwards_propegate(training_outs[a], output)
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 46, in backwards_propegate
    layers[i].adjustment = np.dot(layers[i].inputs.T, layers[i].delta)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (2,) and (3,2) not aligned: 2 (dim 0) != 3 (dim 0)
PS A:\Python\Neural Net testing Stuff>

#

I get some division by zero then it all spirals into madness
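
The pasted code is not reproduced here, so the following is only a guess at the two problems visible in the output. It assumes line 8 of `functions.py` was meant to be the sigmoid derivative, and that the `(2,)` array in the traceback is a single input sample that lost its batch axis. All sizes and names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative_from_output(s):
    # s is sigmoid(x), not x; the derivative is s * (1 - s), which is always finite,
    # unlike 1 / (1 - s), which divides by zero when s == 1.
    return s * (1.0 - s)

# One sample with 2 input features feeding a layer with 3 units (hypothetical sizes).
inputs = np.array([[0.5, -1.0]])        # shape (1, 2): keep the batch axis
weights = np.zeros((2, 3))              # shape (n_in, n_out)
outputs = sigmoid(inputs @ weights)     # shape (1, 3)
targets = np.array([[1.0, 0.0, 1.0]])   # shape (1, 3)

delta = (outputs - targets) * sigmoid_derivative_from_output(outputs)  # shape (1, 3)
adjustment = inputs.T @ delta           # (2, 1) @ (1, 3) -> (2, 3), same shape as weights
print(adjustment.shape)
```

Transposing a 1-D array is a no-op in NumPy, which is why `inputs.T` with shape `(2,)` cannot line up with a 2-D `delta`; keeping every array 2-D avoids that class of error.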

ebon lynx Sep 28, 2021, 8:14 PM

#

ok I'm not touching this one.

earnest wadi Sep 28, 2021, 8:15 PM

#

ebon lynx ok I'm not touching this one.

lol, its my first time ever having a go from scratch :c

prime hearth Sep 28, 2021, 8:22 PM

#

hello, suppose i want to find P(A|B,C)

#

for Bayes' theorem

#

would I treat P(A|B) as like X and then do P(X|C)?

#

just confused on how to do this. with 2 conditionals
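
Conditioning on two events works the same way as conditioning on one, with the extra event carried along in every term. Note that $P(A \mid B)$ is a number rather than an event, so it cannot itself be conditioned on $C$. A sketch of the identity, assuming $P(B, C) > 0$:

```latex
P(A \mid B, C) = \frac{P(A, B, C)}{P(B, C)} = \frac{P(B \mid A, C)\, P(A \mid C)}{P(B \mid C)}
```

The second form is ordinary Bayes' theorem for $A$ and $B$, applied with everything additionally conditioned on $C$.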

thorn bobcat Sep 28, 2021, 8:43 PM

#

anyone here good with hardware, need someone to help me pick between 2 hardware choices

#

for AI

rigid zodiac Sep 28, 2021, 8:52 PM

#

Sorry, gotta do some work

#

@ebon lynx which part confused you?

worldly lake Sep 28, 2021, 9:23 PM

#

Guys, does anyone know how I can limit processing to 100 items from the list at a time?

#

the first 100 elements are processed, then the next 100 elements in the list, and so on, so that there is less load on the system overall
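
A minimal sketch of that batching, assuming the items are in an ordinary list. The helper name `chunks` and the stand-in data are illustrative:

```python
def chunks(items, size=100):
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

items = list(range(250))  # stand-in for the real list
batch_sizes = []
for batch in chunks(items, size=100):
    # The real per-batch work would go here; this sketch only records the batch size.
    batch_sizes.append(len(batch))
print(batch_sizes)
```

Because `chunks` is a generator, only one batch is materialised at a time, and the last batch simply holds whatever is left over.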

serene scaffold Sep 28, 2021, 9:24 PM

#

thorn bobcat anyone here good with hardware, need someone to help me pick between 2 hardware ...

what hardware are you trying to pick between?

errant parcel Sep 28, 2021, 9:30 PM

#

Any recommendations for intro ML books which have good content on time series/LSTM?

thorn bobcat Sep 28, 2021, 11:04 PM

#

serene scaffold what hardware are you trying to pick between?

Nvidia Quadro 5000 and Radeon P560

#

but apparently Quadro Sux..

fluid sparrow Sep 29, 2021, 12:20 AM

#

Question: why is ggplot not showing a plot and instead showing a list

serene scaffold Sep 29, 2021, 12:53 AM

#

fluid sparrow Question: why is ggplot not showing a plot and instead showing a list

Can you provide more context? We have no idea what code you've executed or what its inputs were.

drowsy wraith Sep 29, 2021, 1:22 AM

#

thorn bobcat Nvidia Quadro 5000 and Radeon P560

Does Radeon work with ML?

serene scaffold Sep 29, 2021, 1:31 AM

#

drowsy wraith Does Radeon work with ML?

I assume you mean "cuda enabled"?

drowsy wraith Sep 29, 2021, 1:32 AM

#

with the GPU enabled I think

#

but, yeah, in the documentation i only saw CUDA

tender hearth Sep 29, 2021, 1:41 AM

#

there is not very good support for non-Nvidia GPUs

ocean swallow Sep 29, 2021, 1:42 AM

#

Hello. I am in need of some help :((( Do you know those chain supermarket brochures? I'm gonna need to extract manufacturer/title and description info for each product on them.

tender hearth Sep 29, 2021, 1:43 AM

#

thorn bobcat Nvidia Quadro 5000 and Radeon P560

you might as well go for the Titan RTX at that price point

ocean swallow Sep 29, 2021, 1:43 AM

#

Finding products/info is easy with object detection, but i don't know how to extract that info

serene scaffold Sep 29, 2021, 1:43 AM

#

ocean swallow Hello. I am in need of some help :((( Do you know those chain supermarket brochu...

how are you going about converting the content of the brochure to "regular" text?

ocean swallow Sep 29, 2021, 1:44 AM

#

serene scaffold how are you going about converting the content of the brochure to "regular" text...

Object detector + OCR

#

it is extremely robust

#

Assume I can convert it to text

serene scaffold Sep 29, 2021, 1:45 AM

#

I would confirm that you can accurately convert it to text. However spaCy might have a ready-made recognizer for manufacturers.

#

extracting the description of a given product is going to be more difficult because it's hard to say when a description starts or ends.

ocean swallow Sep 29, 2021, 1:46 AM

#

Yes exactly :/ I can also find the title and say the rest is description.

#

Title is basically what the thing is. But I have never done natural language processing on production level

serene scaffold Sep 29, 2021, 1:47 AM

#

ocean swallow Yes exactly :/ I can also find the title and say the rest is description.

this might sound like a dumb question, but how do you know what the title is? and what is "the rest"?

ocean swallow Sep 29, 2021, 1:47 AM

#

Or text classification

serene scaffold Sep 29, 2021, 1:48 AM

#

I do NLP professionally for some reason.

ocean swallow Sep 29, 2021, 1:48 AM

#

Title is basically what the product is. Say broom from vileda

#

vileda being the manufacturer

serene scaffold Sep 29, 2021, 1:48 AM

#

so any time "x from y" is one sentence, that is always going to be a product and a manufacturer?

ocean swallow Sep 29, 2021, 1:48 AM

#

Description has info like say 100 cm length etc

serene scaffold Sep 29, 2021, 1:49 AM

#

I never do anything with documents that haven't already been converted into ascii/unicode/etc

ocean swallow Sep 29, 2021, 1:49 AM

#

serene scaffold so any time "x from y" is one sentence, that is always going to be a product and...

it never is like that. The title is just written as say "PLC Broom"

serene scaffold Sep 29, 2021, 1:50 AM

#

My point is, if you don't have carefully constructed training data for this, you will either have to use a classifier that has already been trained or come up with rules

ocean swallow Sep 29, 2021, 1:50 AM

#

As a human it is easy to extract that info

ocean swallow Sep 29, 2021, 1:51 AM

#

serene scaffold My point is, if you don't have carefully constructed training data for this, you...

Yes agreed. I am going with rule based but it is becoming harder and harder unfortunately.

#

As title and manufacturer is all mixed together

serene scaffold Sep 29, 2021, 1:52 AM

#

So you may have to go with a pre-built model and accept a certain amount of inaccuracy

ocean swallow Sep 29, 2021, 1:52 AM

#

Which kind of model would you suggest for such a task?

serene scaffold Sep 29, 2021, 1:53 AM

#

are you familiar with named entity recognition?

ocean swallow Sep 29, 2021, 1:53 AM

#

I have heard of it but don't really know it

serene scaffold Sep 29, 2021, 1:54 AM

#

It's where you recognize words/phrases that belong to a certain category. "product" and "manufacturer" are clear-cut categories, but "description" isn't really.

ocean swallow Sep 29, 2021, 1:57 AM

#

Is it okay for those models not to include some categories? Sometimes it just writes "Tomato"

#

Do you have anything as a name, that is pretrained etc for that?

serene scaffold Sep 29, 2021, 1:59 AM

#

ocean swallow Do you have anything as a name, that is pretrained etc for that?

spaCy lets people train and publish models for a lot of different NLP things, so it's great for making NLP accessible to a general programming audience. I would look into what NER models they have for product-related stuff.

#

You're probably not the first person to want to do this kind of thing. But be warned, I still don't know what to do about the product descriptions.

ocean swallow Sep 29, 2021, 2:00 AM

#

If I find title and manufacturer and remove them from the whole text, then I will be left with descriptions.

#

So it is not really a big issue :)

#

I will definitely be looking into those, thank you so much :)

fluid sparrow Sep 29, 2021, 2:29 AM

#

serene scaffold Can you provide more context? We have no idea what code you've executed or what ...

Thank you for seeing my question I gave up on figuring out something I was trying to program. I was asking a very general and broad question. But thank you nonetheless 🙂

lilac hull Sep 29, 2021, 3:44 AM

#

so I'm using OpenCV to make an object detection program. when I run the code, I get this error:

AttributeError: module 'cv2.cv2' has no attribute 'dnn_DetectionModel'

#

i gotta submit the assignment today, so uhh its kinda urgent...

lilac hull Sep 29, 2021, 4:02 AM

#

if you need the module versions:

#

alright i think i know what caused it now..

#

im upgrading opencv-contrib-python i'll see if that fixes it

#

new error:
[ WARN:0] global C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-1i5nllza\opencv\modules\videoio\src\cap_msmf.cpp (438) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback

#

Now its this:
[ERROR:0] global C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-1i5nllza\opencv\modules\dnn\src\tensorflow\tf_importer.cpp (2805) cv::dnn::dnn4_v20210608::`anonymous-namespace'::TFImporter::parseNode DNN/TF: Can't parse layer for node='Fp\pip-req-build-1i5nllza\opencv\modules\dnn\src\tensorflow\tf_importer.cpp:2478: error: (-2:Unspecified error) Const input blob for weights not found in function 'cv::dnn::dnn4_v20210608::`anonymous-namespace'::TFImporter::getConstBlob'

#

pls help

surreal jetty Sep 29, 2021, 6:27 AM

#

will this work if df is sorted by something else, or will the order be wrong?

df['val'] = df.sort_values(by=['time']).loc[:, 'val'].apply(foo)

#

i guess the question is does pandas use the index when group assigning values

desert oar Sep 29, 2021, 6:48 AM

#

surreal jetty i guess the question is does pandas use the index when group assigning values

yes

#

!e
import pandas as pd
df = pd.DataFrame({
'x': [1,2,3],
'y': [4,5,6],
}, index=list('abc'))
print(df)
df['x'] = df['x'].iloc[::-1] + 10
print(df)

arctic wedgeBOT Sep 29, 2021, 6:48 AM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

   x  y
a  1  4
b  2  5
c  3  6
    x  y
a  11  4
b  12  5
c  13  6

desert oar Sep 29, 2021, 6:49 AM

#

assignment to a column in a pandas df is actually a join on the index

surreal jetty Sep 29, 2021, 6:49 AM

#

desert oar assignment to a column in a pandas df is actually a join on the index

ah thats nice to know, thanks!

desert oar Sep 29, 2021, 6:49 AM

#

.join, .loc[]=, pd.concat, and pd.merge all perform joins, with overlapping but not entirely redundant options/features

desert oar Sep 29, 2021, 6:50 AM

#

surreal jetty ah thats nice to know, thanks!

yes, it's incredibly useful and one of the best parts about pandas

#

people who don't like pandas are usually the same people who don't understand the index system

surreal jetty Sep 29, 2021, 6:54 AM

#

Haha i still fall in the latter group i think

#

Especially with multilevel indexes

#

do you know if there are any nice ways of accessing MultiIndex values?
filtering with something like df[df['name'] == 'B'] is just so much more convenient than df[df.index.get_level_values('name') == 'B'] or whatever

#

even with 600k rows and no index, pandas is more than fast enough, so I struggle to see the value of using indexes apart from the various transformations that need indexing

bronze lichen Sep 29, 2021, 7:25 AM

#

Hello

#

Do you need Data science for Ai or vice versa

#

Anyways i cant do ML or Ai rn anyways so

#

How can i get started with Data science?

#

Im reading the pinned messages let me check they usually have some good stuff

desert oar Sep 29, 2021, 7:42 AM

#

surreal jetty do you know if there is any nice ways accessing multindex values? filtering stu...

Use pd.IndexSlice for that
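
A short sketch of what that looks like, on a hypothetical frame with a two-level `(name, time)` index. `xs` and `query` are shown as alternatives that were not mentioned in the conversation:

```python
import pandas as pd

# Hypothetical two-level index: (name, time).
index = pd.MultiIndex.from_product([["A", "B"], [1, 2, 3]], names=["name", "time"])
df = pd.DataFrame({"val": range(6)}, index=index).sort_index()

idx = pd.IndexSlice
only_b = df.loc[idx["B", :], :]      # every row whose "name" level is "B"
b_late = df.loc[idx["B", 2:], :]     # "B" rows with time >= 2 (label slices are inclusive)
same_b = df.xs("B", level="name")    # alternative that drops the "name" level
by_query = df.query("name == 'B'")   # query() can refer to index level names
print(only_b)
```

Slicing with `pd.IndexSlice` generally expects the index to be sorted, which is why the sketch calls `sort_index()` first.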

uncut barn Sep 29, 2021, 7:47 AM

#

How would I do early stopping, if the validation dice coefficient is above 0.5?

velvet thorn Sep 29, 2021, 8:46 AM

#

desert oar assignment to a column in a pandas df is actually a join on the index

this is actually so profound

#

and not how I thought of it

#

it was a bit mindbending tbh

#

but that’s a good observation

odd meteor Sep 29, 2021, 9:53 AM

#

prime hearth just confused on how to do this. with 2 conditionals

Isn't this why the Naive Bayes algorithm (e.g. GaussianNB) is mostly preferred for conditional probability?

You can use Naive Bayes to solve this, although there might be a better approach... You can even do this from scratch

bronze lichen Sep 29, 2021, 10:07 AM

#

Any cool Data science projects?