#data-science-and-ml

prisma mulch
#

(scikit)

velvet thorn
#

a random forest is a bunch of decision trees, each of which is fit to a subset of your training data (basically)

#

n_estimators = number of trees

prisma mulch
#

thank you

velvet thorn
#

yw

prisma mulch
#

so if i set the value of n_estimators too high, will overfitting like problems reappear?

velvet thorn
#

so

#

why do we fit multiple trees?

#

what are the characteristics of a single tree? <- start with this

prisma mulch
#

to get better results?

prisma mulch
#

that is all i got

velvet thorn
#

what I mean is more like

#

are you familiar with bias and variance

#

?

prisma mulch
velvet thorn
#

if you fit a decision tree on your data

#

without any constraints

#

if possible

#

it will overfit madly

#

because

prisma mulch
#

it will identify wrong patterns

velvet thorn
#

it can memorise your entire training set

#

agree?

prisma mulch
#

yeah

velvet thorn
#

the trees in a random forest are constrained in 2 ways

#
  1. they don't see the whole dataset
#
  1. they are limited in depth
#

this serves to limit overfitting

prisma mulch
#

yeah

#

Oh nice

prisma mulch
#

but does it mean that they can be vulnerable to underfitting then if the value is too low?

prisma mulch
#

does it dynamically split the trees?

velvet thorn
velvet thorn
prisma mulch
# velvet thorn elaborate

if the trees it makes are going to have a fixed depth and a fixed dataset whether n_estimators is 2 or 15, then yes

velvet thorn
#

unless you change the setting

#

but each tree sees a slightly different subset

#

it's called bagging (bootstrap aggregation)
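
A minimal sketch of the n_estimators discussion, assuming scikit-learn's RandomForestClassifier and a synthetic dataset from make_classification (the dataset and parameter values are illustrative, not from the chat). Each tree is fit on a bootstrap sample (bootstrap=True is the default), and note that scikit-learn grows trees to full depth by default (max_depth=None) unless you set a limit. Adding trees tends to stabilise the held-out score and cost more compute rather than bring overfitting back.

```py
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: any tabular classification set would do)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for n in (1, 10, 100):
    # n_estimators = number of trees; each tree sees its own bootstrap sample
    forest = RandomForestClassifier(n_estimators=n, bootstrap=True, random_state=0)
    forest.fit(X_train, y_train)
    scores[n] = forest.score(X_test, y_test)
    print(n, "trees -> test accuracy", round(scores[n], 3))
```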

prisma mulch
#

so, I guess it should be vulnerable to underfitting?

velvet thorn
#

well, depends on the settings

#

in general, you have p tiny trees for random forests, so yes

prisma mulch
#

so, why don't you set n_estimators to the highest values?

#

performance?

velvet thorn
prisma mulch
velvet thorn
prisma mulch
#

ahh

velvet thorn
#

think about it this way

#

each tree sees a random subset of the data

#

but the fitting itself is deterministic-ish

#

the more trees you have

#

the higher the probability that two trees will see the same data

lapis sequoia
#

Mathematics is problem solving. Just watching videos does not teach, you have to do tasks. Watching videos and reading is good, but solving problems develops the most.

prisma mulch
#

thanks for your help @velvet thorn

velvet thorn
serene scaffold
velvet thorn
#

I personally don't do videos @ all

serene scaffold
#

see here @lapis sequoia

lapis sequoia
lapis sequoia
serene scaffold
#

On a more general note, I'm of the opinion that all learning is self-learning.

lapis sequoia
#

Yes. I see too much here in university that some students think they get everything on a tray, even though the intention is to develop in increasingly challenging problem-solving tasks.

desert bear
#

I managed to find the answer. I asked about it without showing an example, because I saw this in many ML models and thought it was a general thing.
Here's the answer https://stats.stackexchange.com/questions/153823/what-is-verbose-in-scikit-learn-package-of-python

surreal jetty
#

hello! pandas' resample somehow moves my table columns around. Any idea how to revert it?
original:

---------------------
| time | name | val |
---------------------

after resample it looks like this:

---------------------
|        name | val |
| time |
---------------------

Any idea how to revert it back to the original, or prevent it from happening?

velvet thorn
#

that column

#

becomes the index

#

you can turn it back into a column with reset_index()
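
A small sketch of what happens here, using a made-up three-row table with the same column names as in the question (the values and the 3-minute rule are assumptions). resample() groups by the datetime column, which becomes the index of the result; reset_index() moves it back to an ordinary column so df['time'] works again.

```py
import pandas as pd

# Toy data with the columns from the question (values are placeholders)
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-09-23 13:27:10",
        "2021-09-23 13:28:30",
        "2021-09-23 13:31:00",
    ]),
    "name": ["a", "a", "a"],
    "val": [1092.0, 1093.0, 1091.0],
})

# Resampling on "time" returns a result indexed by time
resampled = df.resample("3min", on="time")["val"].mean()

# reset_index() turns the index back into a regular "time" column
flat = resampled.reset_index()
print(flat)
```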

#

but

#

why?

surreal jetty
#

because accessing the data is a bit more tricky

#

cant do df['time'] anymore

velvet thorn
#

not a big thing though

#

shrugs

surreal jetty
#

seems like the other values are a bit tricky to access as well

velvet thorn
surreal jetty
#

but i guess thats not really that relevant. What im really trying to do is given the resampled series

    time                   val
0    2021-09-23 13:27:00    1092.307692
1    2021-09-23 13:30:00    1091.789474
2    2021-09-23 13:33:00    1089.692308
3    2021-09-23 13:36:00    1089.000000
4    2021-09-23 13:39:00    1089.200000
5    2021-09-23 13:42:00    1089.400000
6    2021-09-23 13:45:00    1089.333333
7    2021-09-23 13:48:00    1089.666667
8    2021-09-23 13:51:00    1089.000000
9    2021-09-23 13:54:00    1089.000000
10    2021-09-23 13:57:00    1089.666667

and turn it into a "change per hour" using least square. The sklearn's reg.fit expects some training values (which given this case im not sure is a right approach)

velvet thorn
#

and turn it into a "change per hour" using least square.

#

this part

surreal jetty
#

looking at the data, it seems like val is reduced by 3 in 30 minutes. So thats 6/hr, which is what im trying to fit using least square
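
One way to get a "change per hour" by least squares, sketched with the resampled values posted above and numpy's polyfit (using polyfit instead of sklearn is a choice made here, not something from the chat). Time is converted to hours since the first sample, so the slope of a degree-1 fit is directly in val per hour. No separate training set is needed: the series itself is what the line is fitted to.

```py
import numpy as np
import pandas as pd

# The resampled series from the message above: one value every 3 minutes
times = pd.date_range("2021-09-23 13:27:00", periods=11, freq="3min")
vals = np.array([
    1092.307692, 1091.789474, 1089.692308, 1089.000000, 1089.200000,
    1089.400000, 1089.333333, 1089.666667, 1089.000000, 1089.000000,
    1089.666667,
])

# Elapsed hours since the first sample, so the slope is "val per hour"
hours = np.asarray((times - times[0]).total_seconds()) / 3600.0

# Degree-1 polynomial fit = ordinary least squares straight line
slope_per_hour, intercept = np.polyfit(hours, vals, deg=1)
print(f"fitted change per hour: {slope_per_hour:.2f}")
```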

velvet thorn
#

the difference

#

between successive values?

surreal jetty
#

the real world data is a lot more noisy so a plain diff doesnt work that well

#

i think ordinary least squares is what it's called

#

i got a c implementation somewhere, but it's quite a lot of code so i'd rather use a library if possible

velvet thorn
#

how does that relate to taking a diff

surreal jetty
#

yeah i guess. my english math terms are a bit rusty

#

well i mean you could model the change using a simple diff

#

or you could use regression

civic wadi
#

Hey, I am currently doing a research internship in NLP. It's basically handling homonyms and contextual words in sentiment analysis. Does anyone know any nice papers related to this topic?

bronze skiff
bronze skiff
#

it's kinda faux pas to just say that all regression is ordinary least squares

#

your english terms aren't rusty

pure gull
#

(At least as a first step)

lapis sequoia
#

Hi guys, is there someone with knowledge about recommendation engines? I'm writing my thesis on this and would like to talk to an expert

grave frost
#

Every problem is 🥳
change my mind

serene scaffold
#

brb gonna do linear regression to figure out what 5 + x is for any x

grave frost
#

every problem is a result of several other problems created by intelligent apes known as humans. In the end, intelligence is just a biophysical process.

create AGI, create intelligence.

solve everything

#

pretty good startup pitch, eh? 😏

coral kindle
#

Just saw scikit-learn upgraded to 1.0

#

Welp, RIP compatibility

#

Though I think some of us will stick with 0.24 for a while

violet walrus
#

Hi all, hope all is well. Is this the best channel to discuss MLE and Data Scientist interview questions? I'm looking for a channel/resource to do specifically that

bronze skiff
#

most things are literally either stable, or things that should be deprecated are deprecated

coral kindle
#

I've been using the pipeline and GridSearchCV APIs

#

But i think it shouldn't be a problem

outer sorrel
#

Anyone wanting to join me on an open source project to create a 2d self driving car simulation using NEAT and pygame? ( i have completed most of the code, i just need a few teammates to help with more features and bugs, i can send you the git hub link)

tacit raft
#

Hello. I would like to try to learn about Reinforcement Learning. Most of the material I find either yada yadas over creating an environment etc. or is super technical. I am willing to do a deep dive into the technical but would like a happy medium to start with. Anyone have any good resources? Thanks in advance.

odd meteor
outer sorrel
#

NeuroEvolution of Augmenting Topologies

#

NEAT

#

thats a basic rundown of it^

#

or actually, i think thats the longest one, there are shorter ones on the website.

odd meteor
lapis sequoia
#

hello

worthy flower
#

i was scraping instagram post links using selenium and my script is working fine, but i am able to scrape only 2k links while there are 300,000 posts, and i don't know why the browser stops loading content

lapis sequoia
#

can someone help me?

#

hi, i spent the whole day fixing many errors, but im stuck on this one, can someone help me? im working on OpenCV for a project, making the harry potter invisibility cloak, if someone knows anything about this error then please help

#

hi, i spent the whole day fixing many errors, but im stuck on one, can someone help me? im working on OpenCV for a project, making the harry potter invisibility cloak, if someone knows how to use it then dm me please

edgy hearth
#

hey can anyone help me out

#

can you tell me where do i start in ml or data science

#

??

ripe forge
edgy hearth
#

oke

#

thanks

edgy hearth
#

to learn AI

#

i found a book in the pinned stuff

odd meteor
edgy hearth
odd meteor
edgy hearth
#

yes ik jk

odd meteor
#

I was taught R in school but I later decided to learn python because that's the programming language used in most of the courses I'm using to learn.

So yeah you can start with Python.

edgy hearth
#

ik everything about python

#

almost

#

what do you want to give info

#

yess plz

#

lol which hand

#

yeah even i want an example

#

hehe

#

yes it is

#

i dont know ai

#

so im asking what to do

#

ye

zinc rock
#

random question but is pytorch supposed to download painfully slow

#

the wheel for older python version doesnt work so i have to use conda install

velvet thorn
zinc rock
#

tensorflow installs like

#

instantly

#

is there a way to speed things up im not sure why the pip installs dont work

austere swift
#

yeah their repos are pretty slow

#

you can't do anything about it

tough bolt
tough bolt
# zinc rock

installed it today, I'd usually have 100 mbit down yet it took me what felt like 30 minutes

tough bolt
drowsy wraith
#

Does someone know an easy way to install opencv with cuda?

serene scaffold
drowsy wraith
#

ubuntu

tender hearth
#

... Linux breaks sometimes, don't judge

lapis sequoia
#

hi guys, can you help me with an assignment that I can't solve?

tough bolt
solid raptor
#

can anyone here guide me for a time series based dataset
i'm having trouble processing it
basically not knowing where to start

#

Its for an assignment but any guidance would be appreciable

tough bolt
solid raptor
#

Thank u for replying @tough bolt
sorry i was collecting images

#

Basically this is the data
it has 3M rows x 3 columns
each of patientid,date of the incident and incident

serene scaffold
solid raptor
#

its a dataset of 27k unique patients
having different incidents on diff dates
and our motive is to predict who can survive a new "Target drug" based on their historical incidents

#

Some of the patients present in the test file are eligible for the drug prescription within a month and some of them are not, using each patient’s historical data predict if he/she is eligible for the “Target Drug”

#

the problem i'm facing is i dont know on what factor should i configure if someone is eligible for the drug or not

#

i can pm u the assignment if i couldn't explain the problem properly

feral patrol
#

Hi,
I am having problem figuring which version of python is on the cluster.
If I type spark-submit --version
2.2.0.cloudera2 Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_131
If I type python --version I get
Python 2.7.6
Reason is that I am trying to use a version of OneHotEncoder, but it changed in 2.4

#

The error I get using OneHotEncoder is TypeError: __init__() got an unexpected keyword argument 'outputCols'
I cannot import OneHotEncoderEstimator

zealous hinge
#

this probably doesn't help you any, but are you sure you want to use python2? It's quite obsolete

#

unsupported, etc

#

also IME it's not super-obvious which version of python that spark will run, if you're submitting a python script

#

the one time I did that, I made sure to put my preferred python right at the front of PATH

feral patrol
#

yeah, its not up to me. Otherwise I would have switched

zealous hinge
#

can you submit a simple script that looks like
```py
import sys
print("Hello world! I am python", sys.version)
```

#

That would tell you which version is running on the cluster

feral patrol
#

('Hello world! I am python', '2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')

#

With that then I will just work with the idea that I am in 2.7 and find a work around.

zealous hinge
#

👍

proven quail
#

Hi, i have some issue with a code, someone have deep understanding in matplotlib that i can pm please ?

feral patrol
#

Is there an IDE to recommend for someone starting? Something that would help me get to documentation faster, import helps, autofill after a tab, and stuff like that.

zealous hinge
#

VSC is popular

#

I kinda dig it, kinda

#

it's not at all specific to data science, that I know of, although there might be some handy plugins

feral patrol
#

Is it hard to set up to be able to code in PySpark? I created a new file, and you can only get Python.
Sadly, I am not allowed to use Pandas atm. I used to know why, but I forgot :S Maybe because of the distributed environment.

zealous hinge
#

I'd say "no" since I've done it, and I'm dumb 🙂

#

dunno what you mean by "you can only get Python"

feral patrol
#

I couldnt load any spark libraries, so it was basically Python only. I am trying to set the environment right now.

zealous hinge
#

how were you trying to "load spark libraries"?

feral patrol
#

from pyspark import SparkContext

zealous hinge
#

what happened?

#

when I do that, iirc, it pauses for like 30 seconds, but then returns successfully

#

oh, no, it's when I do wat = SparkContext() that takes forever. (It's spinning up a giant Java process in the background)

feral patrol
#

imports cannot be resolved by Pylance
Sadly I see that I would also need to create a whole new environment using python 2.7 to be able to work on it.
I am trying to use OneHotEncoder, which I managed to get to work in Docker with Python 3. Sadly in the cluster it is 2.7
So it doesnt accept multiple columns, so I went around it and made each column at a time.
and now it tells me that
"'OneHotEncoder' object has no attribute 'fit'"
Which I dont know how to go around. I need to learn more about how to find documentation (but from 2.7)

zealous hinge
#

pylance might only work with python3, for all I know

#

I don't know what OneHotEncoder is, but it too might only work with python3

#

that's the world you're in, you'll have to get used to it

feral patrol
#

its been on python since 2.4 under that name, and it does work on the 2 previous lines. I just need to figure whats the translation from "fit" to python 2.7

tacit basin
#

What is the clustering method where the cluster centers are points from the input data? For example, kmeans will output cluster centers that are not, in general, points from the input data. Is there such an algorithm in scikit-learn, for example?

serene scaffold
#

is there any reason that you can't pick the points closest to the centroid determined by kmeans?

tacit basin
#

i was thinking that calculating kmeans and then finding the closest point would be like duplicating calculations. i am thinking about tweaking kmeans to come up with centers that are input points.
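
A sketch of the "pick the closest points" suggestion, assuming scikit-learn's KMeans plus pairwise_distances_argmin_min on synthetic 2-D blobs (the data is made up). The extra step is a single distance pass from k centroids to the data, which is small next to the k-means fit itself. Methods that choose centers from the data directly are usually called k-medoids; in scikit-learn itself, AffinityPropagation also returns exemplars that are input points.

```py
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Two synthetic blobs as stand-in data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For each centroid: index of the nearest row of X, and the distance to it
closest_idx, distances = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

# These representatives are actual input points, one per cluster
representatives = X[closest_idx]
print(representatives)
```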

celest light
celest light
tacit basin
tender stag
#

hey can someone help me with evaluating certain pts in a numpy array without looping

errant parcel
#

Whats a good way of cheaply ingesting massive amounts of table data into some cloud service (so i can move it around and download specific parts more easily)

#

would be terabytes as json so i want a proper format ideally but also something that i can easily append to and won't become corrupted if a write messes up etc

#

(would be time series and frequently written to)

zealous hinge
#

I imagine AWS, Google, Azure, etc have that sort of thing -- Azure's is called "Databricks" iirc

errant parcel
#

data lakes is a new word to me, looks like i should do some more reading, thanks

#

i was hoping to avoid ingesting it into an actual database service just cause that makes it harder to download chunks to work with offline

#

or i assumed it would at least

umbral skiff
#

I have HTML code and I want to get a value with a regular expression, but I'm not getting it.

```html
<td>Vínculo</td>
                                <td>CARGO COMISSIONADO</td>
```
```py
vinculo = re.findall("""<td>Vínculo</td>
                                <td>([A-Z]+)</td>""", html_detalhes)
```

errant parcel
#

what

#

it's a different word

#

Vínculo vs Matrícula

#

??

royal crest
#

i presume they want everything within <td> and </td>

errant parcel
#

oh

umbral skiff
#

I want to get the value "CARGO COMISSIONADO"

royal crest
#

i just think your regex is flawed
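
One likely reason the pattern finds nothing: [A-Z]+ cannot match the space inside "CARGO COMISSIONADO", and the literal newline and indentation in the pattern must match the page exactly. A hedged sketch of a more tolerant pattern, using the snippet from the question as the test string:

```py
import re

# The two cells from the question, with the same line break and indentation
html_detalhes = """<td>Vínculo</td>
                                <td>CARGO COMISSIONADO</td>"""

# \s* tolerates any whitespace or newlines between the two cells;
# [^<]+ captures everything up to the next tag, spaces included
vinculo = re.findall(r"<td>Vínculo</td>\s*<td>([^<]+)</td>", html_detalhes)
print(vinculo)
```

For anything beyond a one-off extraction, an HTML parser is usually more robust than a regular expression.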

umbral skiff
#

code correct!

vinculo = re.findall("""<td>Vínculo</td>
                                <td>([A-Z]+)</td>""", html_detalhes)
tender hearth
#

er, this is off-topic for this channel

royal crest
#

regex needs a channel of its own 😜

royal crest
#

whooooosh

lime current
#

Hi, I am new to Machine learning .

#

I need someone's guidance on a project which I picked up from "ineuron open data science project".
Hoping for a start to end mentorship.
(Some of the work like data scraping, data preprocessing , pipeline, HLD and LLD documentation).

hollow palm
#

Hello, can anyone help me understand what this means:

#

I am not sure what the phi symbol is at all

#

But the equation is something to do with Gaussian Models

velvet thorn
tacit basin
prisma mulch
#

can someone eli5 dropout to me

#

and why it is so great

desert oar
#

this looks like a bayesian mixture model. this line states that the likelihood of x, given the full set of parameters Θ, is a weighted sum of two different gaussian likelihoods
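
The equation from the screenshot is not available here, so the following is only the generic form of a two-component Gaussian mixture that matches the description above. The symbols are assumptions: w is the mixing weight, and φ is taken to be the Gaussian density, which is a common use of that letter.

```latex
p(x \mid \Theta) = w \, \phi(x \mid \mu_1, \sigma_1^2) + (1 - w) \, \phi(x \mid \mu_2, \sigma_2^2),
\qquad
\phi(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right),
\qquad
\Theta = \{ w, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2 \}
```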

desert oar
# prisma mulch can someone eli5 dropout to me

having fewer brain cells makes you less clever. models should be clever enough to make generalizations. but if your model gets too clever, it starts to find patterns in the data that don't exist. this is bad. so we make our models less clever in order to make them smarter in the long run.

in other words, dropout helps prevent overfitting.
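
To make the idea concrete, here is a minimal numpy sketch of "inverted" dropout, the variant most frameworks use (the function name and the 0.5 rate are illustrative assumptions). During training each unit is zeroed with probability rate and the survivors are scaled up so the expected activation stays the same; at inference time nothing is dropped.

```py
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Boolean mask: True where the unit is kept
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation is unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 5))
dropped = dropout(h, rate=0.5, rng=rng)
print(dropped)
```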

velvet thorn
#

I like it a lot

celest light
outer girder
#

i'm struggling to create an array, is it possible to create one between 2 aranges?

#

its not creating the list as intended

#

is it because they are different sizes?

serene scaffold
#

@outer girder if A and B are different shapes then that won't work, I don't think.

#

Arrays have to be "rectangular" for whatever number of dimensions they have.

outer girder
#

its my first week in coding

serene scaffold
#

That's okay! We can help you

#

What are you trying to do exactly?

outer girder
#

the program im trying to write is something like this

#

Write a program that prints a table converting Fahrenheit degrees to degrees
Celsius. The values must be calculated from 5th to 5th and the maximum and minimum limits must
be chosen by the user.

serene scaffold
#

Why are you using numpy? Is this for a data science class?

outer girder
#

we have been using numpy since week one, but im pretty sure im not "forced" to use it

#

i just dont know anything else besides it xD

serene scaffold
#

Numpy encourages you to think about your data differently than general python usage

#

If this isn't for a data science class then I wouldn't use it

outer girder
#

the class is called "Computation for Geologists" (which is my field) xD

#

so i have no clue if its considered data science or not

serene scaffold
#

Then numpy would probably help you

outer girder
#

so do you have any clue where i went wrong with the code?

#

and where i could change the array

ripe forge
#

sidenote, (havent seen the core question yet) i think you should learn both the builtin datatypes, and numpy, and learn when to use which.

serene scaffold
#

I don't know what you mean by "from 5th to 5th"

ripe forge
#

that will help you out in the long run

outer girder
#

its like 0ºc to 5ºc to 10ºc

serene scaffold
#

If you're making a table then I would use pandas

#

I'll be back soon, hopefully.

ripe forge
#

oh nah nah, i think all that is overkill

velvet thorn
ripe forge
#

to me it sounds like a simple task trying to teach you loops

ripe forge
#

lol

serene scaffold
ripe forge
#

to me, sounds like this assignment is trying to teach range

#

the instructions are like this: print stuff from blah1 to blah2 in increments of 5
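
A sketch of the range-based approach for the Fahrenheit table, assuming whole-number limits and a step of 5 (the function name is made up for illustration). In the real assignment the limits would come from the user, e.g. int(input(...)); here they are passed as arguments so the example runs on its own.

```py
def fahrenheit_table(minimum, maximum, step=5):
    """Return (fahrenheit, celsius) rows from minimum to maximum inclusive."""
    rows = []
    # range() excludes its stop value, so add 1 to include the maximum
    for f in range(minimum, maximum + 1, step):
        c = (f - 32) * 5 / 9
        rows.append((f, round(c, 1)))
    return rows

for f, c in fahrenheit_table(0, 20):
    print(f"{f:>5} F = {c:>6} C")
```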

serene scaffold
#

They introduced numpy in week one 🤷‍♂️

ripe forge
#

i.. uh... touche.

outer girder
#

something like that atleast

#

i'm gonna try to learn range and come back with the results xD

feral patrol
#

I am using StringIndexer (python 2.7) and I am trying to understand if all the information is being put on the driver. I am looking at the spark.apache documentation for my version, but I cannot decide where the data is taken from.
This is not the first time that I have had this doubt, is there a way/place to see this more easily?

tacit basin
old grove
#

Hyperparameter tuning is decreasing accuracy....any idea what could be the problem ?

tough bolt
#

Is anyone here familiar with NetworkX?

#

I'm not sure where or how to correctly set the relationship between nodes

#

e.g.
the default graph

#

where would I define the distance between 3 and 2

or 2 and 1?

median fulcrum
#

anyone that is familiar with spacy?

#

I need some help, tried various help channels and servers but seems that don't have a lot of people with knowledge on this lib

serene scaffold
median fulcrum
#

I think this part is the problem

serene scaffold
median fulcrum
#

ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.textcat.TextCategorizer object at 0x7fc953d7def0> (name: 'None').

serene scaffold
#

alright, let me see.

median fulcrum
#

this is very common while trying to apply code that works in older versions of spacy, but the cases are different

serene scaffold
median fulcrum
serene scaffold
serene scaffold
median fulcrum
#

oh

#

sorry

serene scaffold
#

just try deleting model.add_pipe(categories) and see if it works.

serene scaffold
#

🔥 happypeepo

median fulcrum
#

probably model.add_pipe(categories) was necessary in this case

median fulcrum
#

:/

#

oh no

#

ValueError: [E989] nlp.update() was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards. The 'update' function should now be called with a batch of Example objects, instead of (text, annotation) tuples.

#

spacy 3.0 please don't

#

😫

#

@serene scaffold what does 'with a batch of Example objects' mean?

serene scaffold
#

some spaCy contributor I am, I know

median fulcrum
median fulcrum
#
from spacy.training.example import Example

model.begin_training()


for epoch in range(1000):
  random.shuffle(final_data_base)
  losses = {}
  for batch in spacy.util.minibatch(final_data_base, size=30):
    texts = [model(text) for text, entities in batch]
    annotations = [{'cats': entities} for text, entities in batch]
    example = Example.from_dict(texts, annotations)
    model.update([example], losses=losses)
  if epoch % 100 == 0:
    print(losses)
    historic.append(losses) 
#

my code is that

#

any idea @serene scaffold

#

?

serene scaffold
serene scaffold
serene scaffold
arctic wedgeBOT
#

Please provide a full traceback to your exception in order for us to identify your issue.

A full traceback could look like:

Traceback (most recent call last):
    File "tiny", line 3, in
        do_something()
    File "tiny", line 2, in do_something
        a = 6 / 0
ZeroDivisionError: integer division or modulo by zero

The best way to read your traceback is bottom to top.

• Identify the exception raised (e.g. ZeroDivisionError)
• Make note of the line number, and navigate there in your program.
• Try to understand why the error occurred.

To read more about exceptions and errors, please refer to the PyDis Wiki or the official Python tutorial.

solid raptor
#

Ok so i was trying to create dummy variables of large categorical data
and this is what i got
train["Patient-Uid"] = pd.get_dummies(train["Patient-Uid"],drop_first=True)
Error:
MemoryError: Unable to allocate 81.1 GiB for an array with shape (3220868, 27033) and data type uint8
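
The number in the error is just rows times categories at one byte per cell, as the sketch below checks. It also shows pandas' sparse=True option on a toy stand-in table (the four rows are made up). Two caveats, both assumptions about the intent: get_dummies returns a whole DataFrame, so assigning it to a single column will not do what the line above hopes, and an ID with about 27k distinct values is usually more useful as a grouping key than as one-hot features.

```py
import pandas as pd

# Shape reported in the MemoryError: one uint8 byte per cell
rows, categories = 3_220_868, 27_033
dense_gib = rows * categories / 2**30
print(f"dense one-hot size: {dense_gib:.1f} GiB")

# Toy stand-in for the real table (column name taken from the message)
train = pd.DataFrame({"Patient-Uid": ["p1", "p2", "p1", "p3"]})

# sparse=True stores only the non-zero cells instead of the full matrix
dummies = pd.get_dummies(train["Patient-Uid"], drop_first=True, sparse=True)
print(dummies.shape)
```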

#

🙂

lofty plover
#

Hey, I've been asked to calculate a weighted average for binned data (basically a histogram) using the method mentioned here:
https://stats.stackexchange.com/questions/531794/how-to-calculate-the-mean-from-bin-endpoints-and-frequencies
However, the final bin in my data doesn't have an endpoint for me to calculate a midpoint from.
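
There is no single right answer for an open-ended last bin; whatever you pick is an assumption that should be stated. A minimal sketch with hypothetical bins and counts, which closes the open bin by giving it the same width as the bin before it and then takes the count-weighted mean of the midpoints:

```py
import numpy as np

# Hypothetical bins: [0, 10), [10, 20), [20, 30) and an open-ended "30+"
lower_edges = np.array([0.0, 10.0, 20.0, 30.0])
upper_edges = np.array([10.0, 20.0, 30.0, np.nan])  # last bin has no endpoint
counts = np.array([5, 12, 8, 3])

# Assumption: treat the open bin as if it were as wide as the previous bin
assumed_width = upper_edges[-2] - lower_edges[-2]
upper_filled = upper_edges.copy()
upper_filled[-1] = lower_edges[-1] + assumed_width

# Weighted mean of bin midpoints, weights = bin frequencies
midpoints = (lower_edges + upper_filled) / 2
weighted_mean = np.average(midpoints, weights=counts)
print(weighted_mean)
```

If the last bin holds a large share of the data, the result will be sensitive to this choice, so it is worth reporting the mean under a couple of different assumed widths.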

minor geyser
#

Whats the best way to learn data science (maybe from a beginner standpoint of python)

azure marsh
wary phoenix
#

from random import randint

class Character:
    def __init__(self):
        self.name = ""
        self.health = 1
        self.health_max = 1

    def do_damage(self, enemy):
        damage = min(
            max(randint(0, self.health) - randint(0, enemy.health), 0),
            enemy.health)
        enemy.health = enemy.health - damage
        if damage == 0:
            print("%s evades %s's attack." % (enemy.name, self.name))
        else:
            print("%s hurts %s!" % (self.name, enemy.name))
        return enemy.health <= 0

class Enemy(Character):
    def __init__(self, player):
        Character.__init__(self)
        self.name = 'a goblin'
        self.health = randint(1, player.health)

class Player(Character):
    def __init__(self):
        Character.__init__(self)
        self.state = 'normal'
        self.health = 10
        self.health_max = 10
def dfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

azure marsh
#

Is that a very long datafffframe?

zealous hinge
#

it's a long sigh of exasperation.

serene scaffold
lapis sequoia
#

Does anyone know how I can create this?

#

I made an example of it using this

#
ggplot(data = mpg, aes(y = hwy, c = drv)) +
geom_boxplot(fill = 'darkgreen')
#

but they want me to create a ggplot box plot that show the distribution of Calories in the following Starbucks beverages:

Classic Espresso Drinks
Frappuccino® Blended Coffee
Shaken Iced Beverages

#

anyone knows how to do that it would help me a lot.

serene scaffold
#

I was fired from Starbucks today.
So triggered rn.

zealous hinge
#

😦

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

R and python

serene scaffold
lapis sequoia
#

I don’t know😹

#

Teacher wants it so

serene scaffold
#

Uh okay

#

Well I guess make the dataframe in python first

lapis sequoia
#

Yea idk how to do that for just the three drinks

serene scaffold
#

Which three?

lapis sequoia
#

Classic Espresso Drinks
Frappuccino® Blended Coffee
Shaken Iced Beverages

#

I have a table but idk how to do it

serene scaffold
#

Those are categories of drinks

#

Trust me, I worked there from January 2016 until today, coincidentally.

#

Anyway, you can use loc

#

And this

#

I can't be much more helpful at the moment as I'm on my phone. I might remember to check on you later @lapis sequoia

lapis sequoia
serene scaffold
#

That's fine

lapis sequoia
#

And Idk how to make it as a code

serene scaffold
lapis sequoia
#

Starbucks.csv

serene scaffold
#
import pandas as pd
#

Start with that

#

And then use pd.read_csv
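
A sketch of the pandas side, keeping only the three drink categories with loc and isin. The column names Beverage_category and Calories and the four rows are placeholders (the real Starbucks.csv may differ), and an in-memory string stands in for the file so the example runs by itself.

```py
import io
import pandas as pd

# Stand-in for Starbucks.csv; column names and numbers are placeholders
csv_text = """Beverage_category,Beverage,Calories
Classic Espresso Drinks,Drink A,190
Frappuccino® Blended Coffee,Drink B,280
Shaken Iced Beverages,Drink C,80
Smoothies,Drink D,290
"""
# With the real file this would be pd.read_csv("Starbucks.csv")
drinks = pd.read_csv(io.StringIO(csv_text))

wanted = [
    "Classic Espresso Drinks",
    "Frappuccino® Blended Coffee",
    "Shaken Iced Beverages",
]

# Boolean mask selects the rows, the list selects the columns
subset = drinks.loc[drinks["Beverage_category"].isin(wanted),
                    ["Beverage_category", "Calories"]]
print(subset)
```

The same filtered table can then be handed to whichever plotting tool the class requires for the box plot.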

lapis sequoia
#

This is what I have

serene scaffold
#

Yay

lapis sequoia
#

That’s what she gave me

serene scaffold
#

This is R. I can't help with that.

lapis sequoia
#

Damn all good thx tho

#

I did python I’m having trouble wit R

#

Do you know any discord server that can help me with it?

serene scaffold
#

Let me see

lapis sequoia
#

Thanks

main fox
azure marsh
serene scaffold
serene scaffold
main fox
#

Lol I hope it doesn't affect any future job search you may have

serene scaffold
umbral skiff
#

I have a code that extracts data from html pages and makes some filters until generating this list of dictionaries. I want the "header" information to be the header of a CSV file, but I don't know how to do this correctly. Does anyone have a tip?

I want the file with these columns:

"Matrícula","Referência","Vínculo","Servidor","Cargo","CPF","Lotação","Remuneração","Abono","Eventuais","Desconto","Salário Líquido"
def filtrador():
  informacoes = []
  for item in raspador():
    header = item[1].strip("</td>")
    info = item[2].split("</td>")
    informacoes.append({header: info[0]})
  return informacoes
    
filtrador()


[output]

[{'Matrícula': '00101105'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'DOUGLAS HENRIQUE SANTOS'},
 {'Cargo': 'SECRETARIO PARLAMENTAR'},
 {'CPF': '***42475'},
 {'Lotação': 'COMISSIONADO - GABINETE'},
 {'Remuneração': 'R$ 4.800,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 849,42'},
 {'Salário Líquido': 'R$ 3.950,58'},
 {'Matrícula': '00092175'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'DULCEANA PALMEIRA DE SA'},
 {'Cargo': 'CHEFE DE GABINETE 2oSECRETARIO'},
 {'CPF': '***31400'},
 {'Lotação': 'MESA'},
 {'Remuneração': 'R$ 9.100,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 2.178,33'},
 {'Salário Líquido': 'R$ 6.921,67'},
 {'Matrícula': '00092182'},
 {'Referência': '09 / 2021'},
 {'Vínculo': 'CARGO COMISSIONADO'},
 {'Servidor': 'EDIJANE ALVES SANTOS SILVA'},
 {'Cargo': 'CARGOS DE NATUREZA ESPECIAL'},
 {'CPF': '***14404'},
 {'Lotação': 'MESA'},
 {'Remuneração': 'R$ 2.250,00'},
 {'Abono': 'R$ 0,00'},
 {'Eventuais': 'R$ 0,00'},
 {'Desconto': 'R$ 199,30'},
 {'Salário Líquido': 'R$ 2.050,70'}]
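
One possible way to get from that list of one-key dicts to a CSV with the wanted header, sketched with the standard csv module. It assumes every record starts with 'Matrícula', uses placeholder values in the same shape as the output of filtrador(), and writes to an in-memory buffer; the helper name to_rows and the file name dados.csv are made up for illustration.

```py
import csv
import io

HEADER = ["Matrícula", "Referência", "Vínculo", "Servidor", "Cargo", "CPF",
          "Lotação", "Remuneração", "Abono", "Eventuais", "Desconto",
          "Salário Líquido"]

def to_rows(informacoes, first_field="Matrícula"):
    """Merge consecutive one-key dicts into one dict per record.

    Assumes every record starts with `first_field`.
    """
    rows = []
    for item in informacoes:
        if first_field in item or not rows:
            rows.append({})  # a new record begins here
        rows[-1].update(item)
    return rows

# Placeholder data shaped like the output of filtrador(): two records
informacoes = [{name: f"{name} {i}"} for i in (1, 2) for name in HEADER]

# For a real file: open("dados.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=HEADER)
writer.writeheader()
writer.writerows(to_rows(informacoes))
print(buffer.getvalue())
```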

serene scaffold
umbral skiff
boreal loom
#

Any ideas on how to optimize this piece of code?

#
dataframe_tokenized_speech_Only = all_data_tokenized_FreqDist_df[["Speech"]]
dataframe_tokenized_speech_Only

for country_year_vector, Speech_dictionary in tqdm(dataframe_tokenized_speech_Only.iterrows()):
    country =(country_year_vector[0])
    year = country_year_vector[1]
    
    for key,value in Speech_dictionary["Speech"].items():
        if key in dataframe_tokenized_speech_Only.columns:
            dataframe_tokenized_speech_Only.loc[country, year][key] = value
        else:
            dataframe_tokenized_speech_Only[key] =0
            dataframe_tokenized_speech_Only.loc[country, year][key] = value
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

boreal loom
serene scaffold
#

Thanks

boreal loom
#

I am looking into numba, is that a thing?

serene scaffold
#

But what are you trying to do?

#

You want to avoid for loops as much as possible.

boreal loom
#

Yeah

#

Unfortunately the dataframe, has a nested dictionary

serene scaffold
#

Why

boreal loom
#

Unlucky

#

What can i say

#

Someone from the preprocessing gave it like this

serene scaffold
#

I would straighten that out before trying to work around it.

serene scaffold
austere swift
boreal loom
#

Numba seems to like for loops

austere swift
serene scaffold
#

Regardless, if one column of a dataframe contains dicts, make a separate data frame with the same index and expand the dicts into columns.
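
A sketch of that suggestion on a toy frame with a (country, year) index and a "Speech" column of word-count dicts (the words and counts are invented). Building the new frame in one call also copes with rows that have different keys: missing keys become NaN and can be filled with 0, with no Python-level loop over rows.

```py
import pandas as pd

# Toy frame shaped like the one in the question
index = pd.MultiIndex.from_tuples(
    [("BR", 2019), ("FR", 2020)], names=["country", "year"]
)
df = pd.DataFrame(
    {"Speech": [{"peace": 3, "trade": 1}, {"peace": 2, "climate": 4}]},
    index=index,
)

# Expand every dict into columns at once, reusing the original index;
# keys absent from a row come out as NaN and are replaced by 0
counts = pd.DataFrame(df["Speech"].tolist(), index=df.index).fillna(0).astype(int)
print(counts)
```

With a very large vocabulary the result will be mostly zeros, so a sparse representation may be worth considering.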

boreal loom
#

Yeah that would make sense, but the dictionary is different in every row

#

Tried that approach too

serene scaffold
#

Different sets of keys?

boreal loom
#

Yeppity Yippity

serene scaffold
#

Whatever data they contain, think of how you would structure it in a database

boreal loom
#

I will try to talk them into not giving me that monstrosity

serene scaffold
#

Also your variable names are quite long

#

But I'll let you do the cost benefit analysis on that.

azure marsh
#

You might be able to do some kind of join on the two instead of your double for loops. The intersecting columns will be the same for each row

#

Agree on the variable names, it's generally recommended to not include the type in the variable name anymore (aka Systems hungarian notation) as modern tools make it easy to ascertain the type. EDIT: correction on specific type of hungarian

#

and it just takes up space, making it harder to understand what's going on. It's certainly better than too short names, though.

#

A good read is the chapter "Meaningful Names" in the book Clean Code

velvet thorn
#

that's not what Hungarian notation was originally meant to be

#

"type" was meant in the business case sense, not the formal type sense

azure marsh
#

I know that is not what it was originally meant to be

velvet thorn
azure marsh
#

but the end result in the code is similar

velvet thorn
azure marsh
#

The type of the structure is in the variable name

velvet thorn
#

if you do it as it was meant to be, it isn't

#

(unless you have, like, refinement types, but most languages don't)

azure marsh
#

Not really clear on what you're getting at in relation to their code, in general it adds unnecessary verbiage

#

Do you disagree that they should remove "dataframe_" as a prefix?

azure marsh
#

That's my only point.

velvet thorn
azure marsh
#

Ok, no disagreements there.

lilac hull
#

so.. idk if this is the right place to ask, but i recently made an object detection project using opencv. it works fine, except for the fact that it detects things like chairs as toilets, and spectacles as scissors. is there a way to make the model better?

onyx drum
#

Just used np.savetxt() to store a huge python array into a text file on disk. However, this spiked up my RAM a lot and I didn't store this np.savetxt() into a specific variable to be able to delete the array

Where is this RAM held and how do I release it? (I can't reset the Jupyter notebook because it takes a looong time to re-simulate my stuff)

lapis sequoia
lapis sequoia
#

gc.collect()

lone drum
#

Hello
I am using resample function of pandas .
I have tick by tick data
I want that data in minutewise

#

Can anyone look into this?

royal crest
lone drum
#

My code

for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python',  chunksize=250000 , iterator=True,  names = ['Msgtype', 'Activity Type', 'Transaction Time', 'script_name', 'expiry', 'strike_price', 'call/put', 'Exchange', 'Token', 'Buy/Sell', 'Buy Order number', 'Sell order number', 'Price', 'qty', 'price_in_rupees', 'Lot'])) :
    chunk['Transaction Time'] = pd.to_datetime(chunk['Transaction Time'], errors='coerce')    
    
    chunk = pd.DataFrame(chunk).set_index('Transaction Time')
    print('chunk1...')
    print(chunk)
    print()
        
    chunk2 = chunk.resample('T')['price_in_rupees'].agg(['first', 'max', 'min', 'last']).set_axis(['Open', 'High', 'Low', 'Close'],axis=1)
    chunk2 = chunk2[chunk2.Close > 0]
    
    print('chunk2...')
    print(chunk2)
    print()
    
    chunk2.to_csv(f'{new_path}{output_file_name}{extension}',  mode= 'a', header=None)
royal crest
#

use the level arg

#

please check out the links i've attached.

lone drum
royal crest
#

If that works, then yes

#

though it should be a str or an int as per the documentation

#

just be wary

#

it also must be datetime-like.

lone drum
#

See my data this way

royal crest
#

Please don't ping me either directly or indirectly, I am right here.

lone drum
#

I am getting

Traceback (most recent call last):

  File "E:\python files\resample_practice.py", line 26, in <module>
    chunk2 = chunk.resample('T', level = 'm')['price_in_rupees'].agg(['first', 'max', 'min', 'last']).set_axis(['Open', 'High', 'Low', 'Close'],axis=1)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\generic.py", line 8369, in resample
    return get_resampler(

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\resample.py", line 1311, in get_resampler
    return tg._get_resampler(obj, kind=kind)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\resample.py", line 1466, in _get_resampler
    self._set_grouper(obj)

  File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\groupby\grouper.py", line 381, in _set_grouper
    raise ValueError(f"The level {level} is not valid")

ValueError: The level m is not valid
#

Above error

#

Ping me when replying

royal crest
#

Have you checked out the links I have attached?

#

One of them is a comprehensive user guide.

lone drum
#

Which link?

lone drum
royal crest
#

What have you tried?

lone drum
#

I tried minutewise example vivek in doc

#

But not get expected output

royal crest
#

Could you share the part of the code where you've made changes?

#

What is the expected output and what is the output you are getting?

lone drum
#

I tried chunk.resample('1T')
But it did not work. As u can see in the ss above, i am getting the same output

#

I am expecting
Open, high, low , close columns for each time

#
09:15
09:16
09:17
...
15:30
``` this way
#

Do u get my point?

royal crest
#

I think it's tricky for me to explain without the data in hand. Would you mind sharing the csv of the data?

lone drum
tender hearth
#

What do you guys think about PyTorch Lightning

royal crest
#

maybe chop it to the first 1000 columns?

lone drum
royal crest
#

DMs are reserved for discord friends, sorry.

lone drum
#

Can I provide u ss of data
Can u please make dummy CSV from it

royal crest
#

What's SS?

#

If it's a link to the data, sure

lone drum
#

Screenshot of data

lilac hull
#

pls help

#

im using the coco algorithm

rigid zodiac
#

Hi Everyone, I have a quick question. I keep getting this error, how can i fix it

random nest
lilac hull
random nest
#

It’s a funny issue tbh

lilac hull
#

yeah lol

serene scaffold
rigid zodiac
dull turtle
#

hello my data this way
                               Activity Type  script_name  ...  price_in_rupees   Lot
Transaction Time                                           ...
2011-08-28 09:15:02.006138097              N    BANKNIFTY  ...          4734.30  47.0
2011-08-28 09:15:02.707897899              N    BANKNIFTY  ...          2555.95  47.0
2011-08-28 09:15:03.373856246              N    BANKNIFTY  ...          2556.00  20.0
2011-08-28 09:15:04.159525439              N    BANKNIFTY  ...          6071.85  47.0
2011-08-28 09:15:05.213452151              M    BANKNIFTY  ...          2556.05  47.0
...                                      ...          ...  ...              ...   ...
2011-08-28 09:15:20.175758062              N    BANKNIFTY  ...           125.00   1.0
2011-08-28 09:15:20.175804372              M    BANKNIFTY  ...           149.10   1.0
2011-08-28 09:15:20.176193109              M    BANKNIFTY  ...           148.60   1.0
2011-08-28 09:15:20.176239215              M    BANKNIFTY  ...           150.70   8.0
2011-08-28 09:15:20.176248648              M    BANKNIFTY  ...           150.90   8.0
this way

#

i want the above tick by tick data converted into open, high, low, close columns

serene scaffold
dull turtle
serene scaffold
#

I'm still not clear on what you are trying to do.

dull turtle
serene scaffold
dull turtle
#

i have stock market tick by tick data. I want to convert that data in open, high, low, close columns
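
A rough sketch of one way to get per-minute bars, assuming the timestamp column is already parsed and set as the index. The column names follow the CSV header posted in this channel; the four sample ticks are invented.

```python
import pandas as pd

# Hypothetical ticks; column names follow the CSV header posted in this channel.
ticks = pd.DataFrame(
    {
        "Transaction Time": pd.to_datetime(
            [
                "2011-08-28 09:15:02.006",
                "2011-08-28 09:15:40.500",
                "2011-08-28 09:16:05.250",
                "2011-08-28 09:16:59.900",
            ]
        ),
        "price_in_rupees": [100.0, 104.5, 103.0, 101.5],
    }
).set_index("Transaction Time")

# "1min" buckets each tick by minute; ohlc() returns open/high/low/close columns.
bars = ticks["price_in_rupees"].resample("1min").ohlc().dropna()
print(bars)
```

Two caveats that may explain odd output with the real file: when the CSV is read in chunks, a minute that straddles two chunks gets resampled twice and produces two partial bars, and the posted rows mix several `Token` values, so grouping by `Token` before resampling is probably what is wanted.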

serene scaffold
#

alright. let me know when you've provided enough data for me to solve this.

serene scaffold
#

while we're at it, try print(df.head(30).to_csv()) and put it in the paste bin

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

dull turtle
# serene scaffold this does not help, unfortunately. why don't you do `print(df.head().to_csv())`?
```
Transaction Time,Activity Type,script_name,expiry,strike_price,call/put,Exchange,Token,Buy/Sell,Buy Order number,Sell order number,Price,qty,price_in_rupees,Lot
2011-08-28 09:15:02.006138097,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000013113,473430,1175,4734.3,47.0
2011-08-28 09:15:02.707897899,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255595,1175,2555.95,47.0
2011-08-28 09:15:03.373856246,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000027793,0,255600,500,2556.0,20.0
2011-08-28 09:15:04.159525439,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000034501,607185,1175,6071.85,47.0
2011-08-28 09:15:05.213452151,M,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255605,1175,2556.05,47.0
```
serene scaffold
#

please put that in the paste bin when you have it.

dull turtle
serene scaffold
dull turtle
#

let me try

serene scaffold
#

but I just need enough rows to cover two days

#

please ping me with the URL to the paste bin when you have done this.

dull turtle
#

okay

serene scaffold
eager heath
#

@dull turtle could you give us your full csv file please?

dull turtle
#

i am not able to open that csv file

#

so i am giving u data read by python

serene scaffold
#

The index of each row is a timestamp and I asked for enough rows that cover two calendar days worth of timestamps. You can even just include something like five rows for two calendar days (for a total of ten rows).

dull turtle
#

but csv file too big that i am not able to open it directly

#

can u please add some dummy data to it so the rows get increased

#

can u help me how i can get data for per minute

#

for e.g. i have data in seconds and in microseconds so i want to combine all values to get single one minute data

#
```
2011-08-28 09:15:02.006138097,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000013113,473430,1175,4734.3,47.0
2011-08-28 09:15:02.707897899,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255595,1175,2555.95,47.0
2011-08-28 09:15:04.159525439,N,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,SELL,0,1400000000034501,607185,1175,6071.85,47.0
2011-08-28 09:15:05.213452151,M,BANKNIFTY,2021-09-02,31500.0,CE,NSEFO,39203,BUY,1400000000019595,0,255605,1175,2556.05,47.0
2011-08-28 09:15:04.404004891,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26885,25,268.85,1.0
2011-08-28 09:15:04.405367275,N,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036502,0,26875,25,268.75,1.0
2011-08-28 09:15:04.405433392,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000035647,0,26915,25,269.15,1.0
2011-08-28 09:15:04.405443075,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000032054,0,26660,75,266.6,3.0
2011-08-28 09:15:04.405504048,M,BANKNIFTY,2021-09-02,35700.0,CE,NSEFO,39570,BUY,1400000000036395,0,26920,25,269.2,1.0
2011-08-28 09:15:12.178591633,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090331,30555,25,305.55,1.0
2011-08-28 09:15:12.178672232,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,SELL,0,1400000000090552,30550,50,305.5,2.0
2011-08-28 09:15:12.178735441,M,BANKNIFTY,2021-09-02,35600.0,CE,NSEFO,39568,BUY,1400000000002103,0,21515,25,215.15,1.0
2011-08-28 09:15:12.17874251
```
my data is this way, so i want only a single value per minute
```python
date                  open    high    low    close
28-08-2011 09:15:00    val1   val2    val3    val4
28-08-2011 09:16:00    val1   val2    val3    val4
```
this way and so on
#

@eager heath do u get my point what i am trying to do ?

eager heath
#

I don't know, I am not a datascience person :D

dull turtle
eager heath
#

I believe Steele had to get back to work, but I'm sure someone will come and help you. If not, feel free to ask in a help channel!

arctic wedgeBOT
#

Hey @plush leaf!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

dull turtle
#

just ping me when someone reply

dusty cloud
#

Does anyone know how to extract labels from sklearn's Pipeline object?

#

I tried after fitting the pipeline, which gives error

pipe_kmeans = Pipeline([('clustering', KMeans())])
pipe_kmeans.fit(X)
pipe_kmeans.named_steps['clustering'].labels_   #err
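
The error itself isn't shown, so this is only a guess at a working baseline: as far as I know, the same attribute access works once the pipeline has been fitted. The data and the `n_clusters`, `n_init`, `random_state` values below are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline

# Hypothetical data: two well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])

pipe_kmeans = Pipeline([("clustering", KMeans(n_clusters=2, n_init=10, random_state=0))])
pipe_kmeans.fit(X)

# labels_ lives on the fitted final step, not on the Pipeline object itself.
labels = pipe_kmeans.named_steps["clustering"].labels_

# fit_predict on the pipeline is an alternative that returns the labels directly.
same_labels = pipe_kmeans.fit_predict(X)
```

If this runs but the original does not, the difference is likely in `X` or in an earlier pipeline step rather than in the `labels_` lookup.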
lapis sequoia
dull turtle
lapis sequoia
#

no I'm busy today. I'm sorry.

#

just gave you reference which may help.

misty flint
#

np.where() is op Praise

fiery sedge
#

Hi! I have a question: can I build a neural network whose job is to watch a person's actions on their phone and collect that information? Is it possible?

quasi parcel
#

@fiery sedge can you elaborate on why you need a neural network

#

you mean through app

#

you can do some thing called clickstream

#

which will listen users information on phone app like what a user is doing and then

#

send it to a s3

#

its pretty easy to setup the backend

#

but if you have an app in android

#

they need to pass this triggered event to the api which handles clickstream

#

if you need i have a snippet of code which can handle this clickstream

#

so basically you will be creating a data lake which will have all users triggered events

fiery sedge
#

Ok I understand, but I mean in general, not only in an app. I mean knowing the actions on his cell phone, for example which apps he uses the most, or access to the contacts. Is it possible to have this functionality with only a neural network?

quasi parcel
#

with only neural networks

#

means i think you can use LSTM

#

long short-term memory

#

i think i read about this

#

one moment

misty flint
#

i dont think you need a neural net for that. just sounds like sketchy data collection and then regular analysis

quasi parcel
#

that is what i suggested

#

we can do a data lake for all event triggers by users

#

@misty flint

#

generally we can track user behaviour with lstm

#

so one moment

#
Amazon Web Services

Clickstream events are small pieces of data that are generated continuously with high speed and volume. Often, clickstream events are generated by user actions, and it is useful to analyze them. For example, you can detect user behavior in a website or application by analyzing the sequence of clicks a user makes, the amount of […]

#

@fiery sedge

fiery sedge
#

Ok I'm gonna read it! 😀

#

👍

quasi parcel
#

this is one of the methods

#

to get the users data to database or storage unit

rigid zodiac
#

can someone please look at my code? i'm trying to feed multiple csv files in, do some conversion, then save each of them

#

also where can I paste my code so you can check it

#

!code

quasi parcel
#

please use this

#

@rigid zodiac

rigid zodiac
#

thank you, here is my code

#

I successfully feed them in, but dont know how to save it in separate file

quasi parcel
#

i think it is fname

#

where did you define filename and plotnumbers

#

?

rigid zodiac
#

fname is just any file in side the folder

#

so what i did was feed each file in separately and then run it through those

rigid zodiac
#

i'm trying to break each one of those into a csv file

quasi parcel
#

okay

#

so in each csv file

#

you have this columns

rigid zodiac
#

Yep and with that many row

#

so far I added df.to_csv at the end but it keeps writing on top of itself
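
The actual loop isn't in the log, so this is a sketch of the usual cause and fix: if the output path is the same on every iteration, each write replaces the previous one. The column name `plot_number` and the file naming are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data: "plot_number" stands in for whatever column identifies a group.
df = pd.DataFrame({"plot_number": [1, 1, 2, 3, 3], "value": [10, 11, 20, 30, 31]})

out_dir = Path(tempfile.mkdtemp())  # stand-in for the real output folder

# The path must change on every iteration, otherwise each write replaces the last.
for plot_number, group in df.groupby("plot_number"):
    group.to_csv(out_dir / f"plot_{plot_number}.csv", index=False)

written = sorted(p.name for p in out_dir.glob("*.csv"))
print(written)
```

The key point is that the `to_csv` call sits inside the loop and the loop variable is part of the file name.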

quasi parcel
#

what is the error you are getting

#

is there any error

rigid zodiac
#

nothing, it just give me back 1 csv file instead of 1000 csv file

quasi parcel
#

which name?

#

df.to_csv('/content/drive/MyDrive/Huy_2/nonfall_2ft_groupby/'+filename+str(plot_numbers) + '.csv', index=False)

#

cause in this you have mentioned filename

#

where is this filename is getting updated

#

?

rigid zodiac
#

I dont want it to update, i just need it to save down as a csv file...

quasi parcel
#

yes dude get it can you show me in the code where you are assigning filename?

rigid zodiac
#

like the code before I break that massive csv file into like 1000 csv files ??

rigid zodiac
quasi parcel
#

df.to_csv('/content/drive/MyDrive/Huy_2/nonfall_2ft_groupby/'+filename+str(plot_numbers) + '.csv', index=False) in this line can u tell me where is filename is getting set

#

?

rigid zodiac
#

agh... that... that's from previous... i see, let me try to remove it

#

still same issue

#

only 1 file coming out after I remove it

quasi parcel
#

can you share the code after that

rigid zodiac
quasi parcel
#

try this

rigid zodiac
quasi parcel
#

create a folder called nonfall_2ft_groupby

rigid zodiac
#

I do have that

#

seems like it has .csv.csv

quasi parcel
#

try this

rigid zodiac
#

same error code

quasi parcel
#

show me the error

#

?

#

please

rigid zodiac
quasi parcel
#

try this

rigid zodiac
#

it still have the error

quasi parcel
rigid zodiac
#

same file... I may just delete that file then

#

nope still not work

#

😦

quasi parcel
#

same error?

rigid zodiac
quasi parcel
rigid zodiac
#

let me reset it again

rigid zodiac
#

idk what is going on, like I delete it and it still have the same error

#

keep it still the same

quasi parcel
#

can you share the datasets?

#

if you are okay

#

?

rigid zodiac
#

all of it?

#

I can give you the original 1, then the code to break it. it will be easier

quasi parcel
#

sure

rigid zodiac
#

for dataset... how can I send it to you

quasi parcel
#

one moment

#

i think i found it

rigid zodiac
#

it is running hold on

rigid zodiac
quasi parcel
#

can you dm

#

?

foggy shuttle
#

Hi guys, Can anyone point me to the right resources to start with computer vision video detection problems with transformers. I know NLP but am new to Computer Vision.

quasi parcel
#

Hi

#

i think this should be valid source @foggy shuttle

foggy shuttle
vale zephyr
#

Hi ! Anyone know how to do the equivalent of cv2.inrange for HSV color thresholding in PyTorch ?

red pecan
#

I wanted to know why is everything related to ai super popular with python in comparison to other languages like c#?

#

Please @ me if you don't mind.

grave frost
#

all the rich kids use python

velvet thorn
#

it depends on which part of the stack you're talking about

#

the low-level networking code, everything that handles transactions etc., yes

#

in those contexts, Python is good for backtesting/experiments, minimally

#

and perhaps ML

velvet thorn
#

but mathematicians

#

and CPython, being dynamically typed + interpreted, is generally easier to work with

#

more or less. the general pattern is: Python bindings for user-friendliness, C/C++/Fortran backend for speed.

#

a really good example is numpy

#

in general, debugging numpy issues is simple

#

compared to going through the underlying BLAS/LAPACK

#

yeah. C is at least reasonably readable by someone who doesn't know it

#

given proficiency in other languages

#

but C++ is a lot more complicated

#

sometimes I check out CPython source to understand how something works

#

if it was C++Python I would probably be like 🥴 and then 😔

#

agreed

#

isn't it weird

#

that VB and Python

#

are basically the same age?

#

even if you look @ VB after it started being on .NET and Python 2

#

say, pre-2.7

#

I would probably use Rust

#

it's a lot nicer to work with

#

apart from the immaturity of tooling

#

yeah, I intend to go take a Master's next year, and then maybe get back into ML

#

it's a pretty cool language! but unless you are a relatively hardcore engineer it'll probably be irrelevant

#

law

#

shrugs

#

I was a data scientist (nominally, though more like ML engineer)

velvet thorn
#

why not? I didn't know what I wanted to do @ that time

#

nope

#

so might as well take a professional degree that is reasonably prestigious

#

went for a bootcamp, then got approached

#

it was a really great first job tbh

#

can you elaborate on that

#

hm I'm not sure about that

#

this is probably true, but it was quite intellectually stimulating, and you have better networking opportunities

#

Singapore

#

oh, I was near there once

#

I went to Saudi Arabia to teach data science

#

pretty interesting experience

#

way too dry for me though

#

yeah it wasn't bad! is part of the reason I went overseas to work

#

you mean in SA?

#

like, in Saudi Arabia or Singapore?

#

since you said this

#

oh

#

yeah.

#

but we're small

#

I'm actually working with a bank right now

#

shrugs

#

bootcamps are just a starting point I think

#

it's more about the marketing than anything else

red pecan
#

Thank you two very much, I hope you have a great day. @velvet thorn @olive jackal

bold timber
#

hi, I have question to handle an outlier: it is possible to handle outlier by using transform with yeo-johnson?

bold timber
#

it is possible to use both (scaling and transform) in pipeline?

azure marsh
#

Sure, they are just numerical operations
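
A small sketch of both steps in one scikit-learn pipeline. The sample values and step names are invented; `standardize=False` is set only so the scaling step is visibly separate.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical skewed feature with one large outlier.
X = np.array([[1.0], [2.0], [2.5], [3.0], [3.5], [4.0], [250.0]])

pipe = Pipeline(
    [
        # standardize=False so the scaling step below does the standardizing
        ("yeo_johnson", PowerTransformer(method="yeo-johnson", standardize=False)),
        ("scale", StandardScaler()),
    ]
)
X_t = pipe.fit_transform(X)
print(X_t.ravel())
```

Note that `PowerTransformer` standardizes its output by default, so the separate scaler is often redundant. The transform compresses the outlier's influence rather than removing it.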

rigid zodiac
#

Quick question, has anyone done a train test split on all of the csv files in a folder before?

bold timber
green phoenix
#

can someone tell me why its accuracy is so low?

#

im AI noob

royal crest
green phoenix
tender hearth
#

Hey guys, for producing sequences with continuous values, what's the norm for determining the length of the output sequence?

#

with decoder networks used in NLP it's easy because you can have a dedicated embedding for EOS tokens

#

but you can't do that with continuous values

green phoenix
royal crest
green phoenix
royal crest
#

👏

drowsy wraith
# green phoenix ok ill keep tweaking them thanks for the input

I know a little of chess, but maybe, you can include features like the color from the square black or white for each piece, if the king is next each other, if there is a fork or other things like that. It takes some time to understand the features that matter.

azure marsh
azure marsh
bold timber
azure marsh
lapis sequoia
#

how to preprocess a categorical column where categories are ranges, like 'x<100', '100 <= x < 500', 'x >= 500'

dusk depot
lapis sequoia
#

categorical column is a column which contains categories

#

I'm talking about ML stuff here.. pandas has nothing to do with it

#

do you know imputation/standardization/preprocessing ?

dusk depot
#

no

#

ur saying 'column' like it's in some software or file

azure marsh
#

To clarify, you have a numerical column that you would like to convert into a categorical column?

#

Just apply a if/elif/else or lambda function that checks its range

#

!e

```py
func = lambda x: 1 if x < 100 else (2 if x < 500 else 3)
print([func(i) for i in [50, 150, 550]])
```
arctic wedgeBOT
#

@azure marsh :white_check_mark: Your eval job has completed with return code 0.

[1, 2, 3]
azure marsh
#

You can also look into pandas cut

#

!d pandas.cut

arctic wedgeBOT
#

```py
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
```
Bin values into discrete intervals.

Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut could convert ages to groups of age ranges. Supports binning into an equal number of bins, or a pre-specified array of bins.
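
For the ranges in the question, a sketch with `pandas.cut` might look like this. The sample values are invented and the labels are copied from the question.

```python
import pandas as pd

values = pd.Series([50, 100, 150, 499, 500, 550])

# right=False makes each bin closed on the left: [-inf, 100), [100, 500), [500, inf)
bins = [float("-inf"), 100, 500, float("inf")]
labels = ["x<100", "100<=x<500", "x>=500"]
binned = pd.cut(values, bins=bins, labels=labels, right=False)
print(binned.tolist())
```

With the default `right=True` the edges 100 and 500 would fall into the lower bin, which would not match the `100 <= x < 500` definition.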
azure marsh
#

Looks like there's a numpy command as well

#

!d numpy.searchsorted

arctic wedgeBOT
#

```py
numpy.searchsorted(a, v, side='left', sorter=None)
```
Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array *a* such that, if the corresponding elements in *v* were inserted before the indices, the order of *a* would be preserved.

Assuming that *a* is sorted...
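
A sketch of the same binning with `numpy.searchsorted`, reproducing the 1/2/3 codes from the lambda above. The sample values are invented.

```python
import numpy as np

edges = np.array([100, 500])  # bin edges, must be sorted
values = np.array([50, 100, 150, 500, 550])

# side="right" sends a value equal to an edge into the higher bin,
# matching x < 100 -> 1, 100 <= x < 500 -> 2, x >= 500 -> 3.
codes = np.searchsorted(edges, values, side="right") + 1
print(codes.tolist())
```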
lapis sequoia
#

Thanks guys for your help but I'm actually not asking help with Python or any specific library

#

I was only asking a conceptual question on data preprocessing

#

Thanks for your help in any case. You guys are awesome

azure marsh
#

You can look up these encodings for categorical data: ordinal, one-hot, dummy variable, embedding.
You could take ordinal further and even convert it back into a lossy numerical column (e.g. taking the midpoints of the bins)
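
A sketch of three of those options on a column whose values are already range labels. The labels come from the question; the representative values 50, 300 and 750 are arbitrary placeholders, since the open-ended bins have no true midpoint.

```python
import pandas as pd

# Hypothetical column whose categories are already range labels.
col = pd.Series(["x<100", "x >= 500", "100 <= x < 500", "x<100"])

# Ordinal encoding: spell out the order explicitly so the codes follow the ranges.
order = ["x<100", "100 <= x < 500", "x >= 500"]
ordinal = pd.Categorical(col, categories=order, ordered=True).codes

# One-hot encoding: one indicator column per range.
one_hot = pd.get_dummies(col)

# Lossy numeric version: one representative value per bin (placeholders).
midpoint = col.map({"x<100": 50, "100 <= x < 500": 300, "x >= 500": 750})
print(list(ordinal), midpoint.tolist())
```

Ordinal codes keep the ordering with a single column, which tends to suit tree models. One-hot drops the ordering but avoids implying equal spacing between bins.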

lilac hull
#

so im using the yolo algorithm, but when i run the code, i always get this error:

```
cv2.error: OpenCV(4.5.3) C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-sn_xpupm\opencv\modules\dnn\src\darknet\darknet_io.cpp:659: error: (-215:Assertion failed) separator_index < line.size() in function 'cv::dnn::darknet::ReadDarknetFromCfgStream'
```
how can i fix this?
lavish tundra
#

someone know how i can convert that dataframe(image) to a dataframe where each element be part of a column?

lapis sequoia
lavish tundra
#

but its not from a file . _.

azure marsh
#

Have you tried searching for that obscure error?

tender hearth
#

Hey folks, I'll bump my question from earlier

For producing sequences with continuous values, what's the norm for determining the length of the output sequence?
with decoder networks used in NLP it's easy because you can have a dedicated embedding for EOS tokens
but you can't do that with continuous values

#

a concrete example of this would be generating audio waveforms

royal crest
#

though i see one problem being that comma is used as the 1_000 separator

#

surely you can do this from the source file

lilac hull
azure marsh
#

You've tried using someone else's config file?

#

You've ensured there's no comments without whitespace after the '#' ?

lilac hull
#

so i tried verifying if both files were in the specified path, and they werent, as there was a typo. now i fixed that, but i get this:

```
parse NetParameter file: models/MobileNetSSD_deploy.prototxt in function 'cv::dnn::ReadNetParamsFromTextFileOrDie'
```
lilac hull
royal crest
#

ReadNetParamsFromTextFileOrDie

lilac hull
#

lol

wide citrus
#

Any know about web scraping?

lapis sequoia
viral juniper
#

enjoy some nightmare fuel from early stages of my gan

#

this one was even earlier in training

pastel valley
#

yo what should i learn if i want to create a machine learning model to classify different types of fish from images? is a convolutional neural network appropriate for it?
how do i know if i am using a good algorithm?

lunar violet
#

Hello Friends, I am currently stuck with my BE project, a Helmet Detection System using YOLOv3 on Google Colab with Darknet. I have the training code but am not sure it's error free, and I want proper testing code. I have a custom dataset which is labelled and ready. However, even if I get ready-made testing code for YOLOv3, I don't know what exactly to add/edit in it since I don't know Python. Can someone please help me with the Python part? I have to present this project to my external and internal faculty. Thank you

#

Please Feel Free to DM me with help

gaunt marsh
#

I have an Array which looks like this:

[[0.7651453003611763, 0.764035690858367, 0.7355304233745091], [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], [0.6948386732214498, 0.15246199920890194, 0.1504548793580838], [0.8455282724710679, 0.84655663488637, 0.8337125981891232]]

How can I multiply the values by 255? (These are converted RGB values)

serene scaffold
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[[195.11205159 194.82910117 187.56025796]
 [177.18386167  38.8778098   38.36599424]
 [177.18386167  38.8778098   38.36599424]
 [215.60970948 215.8719419  212.59671254]]
serene scaffold
#

@gaunt marsh multiplying an array by a numeric type will multiply each element by that value.
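
The evaluated code isn't in the log, so this is a reconstruction of the idea using the first two rows from the question. The rounding to `uint8` is an extra assumption about what is wanted for RGB.

```python
import numpy as np

rgb = np.array(
    [
        [0.7651453003611763, 0.764035690858367, 0.7355304233745091],
        [0.6948386732214498, 0.15246199920890194, 0.1504548793580838],
    ]
)

scaled = rgb * 255                           # element-wise, stays float
as_bytes = np.rint(scaled).astype(np.uint8)  # rounded to 0..255 integers
print(as_bytes)
```

A plain nested list would need `np.array(...)` first; multiplying a Python list by 255 repeats the list instead of scaling it.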

celest light
celest light
uncut barn
#

Is batch normalization still needed even if we normalize the data beforehand i.e. dividing every pixel by 255.0?

celest light
pastel valley
arctic wedgeBOT
#

Hey @plush leaf!

It looks like you tried to attach file type(s) that we do not allow (.zip). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

viral juniper
#

how do you guys load your datasets? I wanted to load new ones from the drive when needed, but keeping already loaded ones in a list, so that accessing them again is faster
turns out that wasn't a great idea because 70 thousand numpy arrays of shape 64x64x3 are not very fit for my 16 gb of ram KEKW

#

I worry that if I make it strictly just pull from hdd every time, I will wear out my drive

celest light
celest light
viral juniper
#

ohh interesting, ye im using keras, I'll look into that

#

thank you

celest light
# viral juniper thank you

Yeah. Look into tf.Dataset
Or if you are using images for image classification, you can also try ImageDataGenerator in keras.
Another way is to subclass the keras Sequence class for full control

viral juniper
#

I'm training a gan so yea i pull real image samples for the discriminator

lapis sequoia
#

hey I got N number of dfs, i wish to merge them.
but i don't know how to merge by some comparing some certain column.

#

example:
df1

a b
1 a
2 b
3 c

df2

a c
1 x
2 y
4 z

the df i want:
a b c
1 a x
2 b y
3 c -
4 - z

ripe forge
#

thats just an outer join
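
A sketch of that outer join using the two example frames from the question, with `functools.reduce` so it extends to any number of frames. The list name `frames` is made up.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
df2 = pd.DataFrame({"a": [1, 2, 4], "c": ["x", "y", "z"]})
frames = [df1, df2]  # any number of frames sharing the key column "a"

# how="outer" keeps keys found in any frame; missing cells become NaN.
merged = reduce(lambda left, right: pd.merge(left, right, on="a", how="outer"), frames)
merged = merged.sort_values("a").reset_index(drop=True)
print(merged)
```

The `-` placeholders in the desired output correspond to NaN here; `merged.fillna("-")` would give the literal dashes.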

lapis sequoia
#

yeah hold on i did find something

#

i mean thats an outer join tbh.

odd meteor
#

A Quick Question....

I'm planning to start learning DL so I'd like ask

  1. PyTorch or TensorFlow or Keras

Which framework is advisable for a complete beginner in Deep Learning to learn 1st.

  2. Please could you give a reason for your suggestion in Q1.
serene scaffold
#

If you're not familiar with other approaches to AI, I think you'd find your learning experience more satisfying if you start elsewhere.

odd meteor
serene scaffold
grave frost
odd meteor
grave frost
#

but I would still recommend using Pytorch and JAX, when you get some more experience

odd meteor
grave frost
#

wdym by customers?

lapis sequoia
#

how do i get into ai

grave frost
odd meteor
grave frost
#

but it provides an extreme level of flexibility

odd meteor
silent pendant
#

if I want to drop a column from a pandas df with

schedule.drop(list(schedule.filter(regex='DAY')), axis=1, inplace = True)

is there a way to select more than one filter?
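
One option, sketched on a made-up frame: the `regex` argument is an ordinary regular expression, so `|` can combine several patterns in one call. Only the `DAY` pattern comes from the question; `SHIFT` and the column names are invented.

```python
import pandas as pd

# Hypothetical frame; only the "DAY" pattern comes from the question.
schedule = pd.DataFrame(columns=["NAME", "DAY_1", "DAY_2", "SHIFT_A", "ROOM"])

# "|" is regex alternation, so one filter call can match several patterns.
to_drop = schedule.filter(regex="DAY|SHIFT").columns
schedule = schedule.drop(columns=to_drop)
print(list(schedule.columns))
```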

grave frost
silent pendant
#

@olive jackal because I am brand new to pandas 🙂 And dont know all the trix yet

#

Ultimately I dont need the columns at all, and the file will be written back out to excel

#

Defeats the purpose as a Pandas exercise 🙂

young harness
#

Hey! I'm currently making a sudoku solver from image with opencv.I've got the initial processing and splitting the image into cells done but im having trouble detecting if a specific cell contains a digit(not classifying the digit). Does anyone know how i can go about solving it?

wide sequoia
#

how to tune hyperparameter for gensim doc2vec even though gensim doc2vec doesnt give any accuracy/loss for training?

rigid zodiac
#

Hey guy I keep having this issue. But when I run that file separately, it work just fine

ebon lynx
#

@rigid zodiac what file

#

frame looks like the index.

#

I think if you want to groupby the index, the parameter for groupby is level=0

#

otherwise the problem might be that your datatypes are interpreted as strings
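
The screenshot isn't available here, so this only illustrates the two suggestions on a made-up frame: converting string columns to numbers, then grouping by the index with `level=0`.

```python
import pandas as pd

# Hypothetical frame: index named "frame", values read in as strings.
df = pd.DataFrame(
    {"x": ["1", "3", "5", "7"]},
    index=pd.Index([0, 0, 1, 1], name="frame"),
)

# Strings cannot be averaged; coerce them to numbers first (bad values become NaN).
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# level=0 groups by the index instead of by a column.
means = df.groupby(level=0).mean()
print(means)
```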

rigid zodiac
ebon lynx
#

what is that

rigid zodiac
#

Like i put all of the fall into 1 big folder

#

And loop it through here

ebon lynx
#

I have literally no idea what you're talking about

earnest wadi
#

Hello, I could really use some help, ive just tried to whip up my first neural network completely from scratch, and I think its almost working, would any expert be so kind to take a few minutes to go through it with me?

ebon lynx
#

@earnest wadi post code, and post errors if there are any. that's the only way to get help.

earnest wadi
#

hmm, okay

#
#

My back propagation is off i believe

#
```
 [2.]
 [2.]]
a:\Python\Neural Net testing Stuff\functions.py:8: RuntimeWarning: divide by zero encountered in true_divide
  return 1 / (1-x)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[inf inf inf]
 [inf inf inf]
 [inf inf inf]]
[0. 0.]
[[-0.00109576 -0.00050056]
 [-0.00109576 -0.00050056]
 [-0.00109576 -0.00050056]]
Traceback (most recent call last):
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 76, in <module>
    nn.fit(training_data, batch_size=8, epochs=250)
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 27, in fit
    self.backwards_propegate(training_outs[a], output)
  File "a:/Python/Neural Net testing Stuff/PyNets.py", line 46, in backwards_propegate
    layers[i].adjustment = np.dot(layers[i].inputs.T, layers[i].delta)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (2,) and (3,2) not aligned: 2 (dim 0) != 3 (dim 0)
PS A:\Python\Neural Net testing Stuff>
```
#

I get some division by zero then it all spirals into madness
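
Two things stand out in that output, sketched below under assumptions because the actual code was not posted. First, `1 / (1-x)` blows up when `x` is exactly 1. If that function is meant to be the sigmoid derivative (an assumption), the usual form is `s * (1 - s)`, which never divides. Second, `.T` does nothing to a 1-D array, so a weight update built from two vectors needs `np.outer` or explicit 2-D shapes. All names and array values here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative_from_output(s):
    # s is the sigmoid *output*; the derivative is s * (1 - s), with no division
    return s * (1.0 - s)

out = sigmoid(np.array([0.0, 50.0]))      # second value saturates to 1.0
grad = sigmoid_derivative_from_output(out)
print(grad)                                # finite, even at saturation

# Transposing a 1-D array is a no-op, so np.dot cannot form a weight update
inputs = np.array([0.5, -1.0])             # shape (2,)
delta = np.array([0.1, 0.2, 0.3])          # shape (3,)
print(inputs.T.shape)                      # still (2,)

# The outer product gives one adjustment per (input, output) pair
adjustment = np.outer(inputs, delta)       # shape (2, 3)
print(adjustment.shape)
```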

ebon lynx
#

ok I'm not touching this one.

earnest wadi
prime hearth
#

hello, suppose i want to find P(A|B,C)

#

using Bayes' theorem

#

would I treat P(A|B) as X and then compute P(X|C)?

#

just confused about how to do this with 2 conditions
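
For reference, the standard way to handle two conditions is to treat the pair (B, C) as one joint event, or equivalently to apply Bayes' theorem to A and B with everything conditioned on C. P(A|B) is a number, not an event, so it cannot itself be conditioned on C.

```latex
P(A \mid B, C) = \frac{P(A, B, C)}{P(B, C)}
               = \frac{P(B, C \mid A)\, P(A)}{P(B, C)}
               = \frac{P(B \mid A, C)\, P(A \mid C)}{P(B \mid C)}
```

The last form is usually the handiest: it is ordinary Bayes with C carried along in every term.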

thorn bobcat
#

anyone here good with hardware, need someone to help me pick between 2 hardware choices

#

for AI

rigid zodiac
#

Sorry, gotta do some work

#

@ebon lynx which part are you confused about?

worldly lake
#

Guys, does anyone know how I can process a list in batches of 100 items at a time?

#

The first 100 elements are processed, then the next 100 elements in the list, and so on, so that there is less load on the system in general
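
A minimal sketch of that kind of batching with plain slicing. `batches` is a hypothetical helper name and the data is made up.

```python
def batches(items, size=100):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

data = list(range(250))
sizes = [len(batch) for batch in batches(data)]
print(sizes)  # three batches, the last one shorter
```

Because it is a generator, only one batch is built at a time, so the processing step can run on each slice before the next one is created.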

serene scaffold
errant parcel
#

Any recommendations for intro ML books which have good content on time series/LSTM?

thorn bobcat
#

but apparently Quadro Sux..

fluid sparrow
#

Question: why is ggplot showing a list instead of a plot?

serene scaffold
drowsy wraith
serene scaffold
drowsy wraith
#

with the GPU enabled I think

#

but, yeah, in the documentation i only saw CUDA

tender hearth
#

there is not very good support for non-Nvidia GPUs

ocean swallow
#

Hello. I am in need of some help :((( Do you know those chain supermarket brochures? I'm going to need to extract manufacturer/title and description info for each product on them.

tender hearth
ocean swallow
#

Finding products/info is easy with object detection, but I don't know how to extract that info

serene scaffold
ocean swallow
#

it is extremely robust

#

Assume I can convert it to text

serene scaffold
#

I would confirm that you can accurately convert it to text. However spaCy might have a ready-made recognizer for manufacturers.

#

extracting the description of a given product is going to be more difficult because it's hard to say when a description starts or ends.

ocean swallow
#

Yes exactly :/ I can also find the title and say the rest is description.

#

Title is basically what the thing is. But I have never done natural language processing at production level

serene scaffold
ocean swallow
#

Or text classification

serene scaffold
#

I do NLP professionally for some reason.

ocean swallow
#

Title is basically what the product is. Say broom from vileda

#

vileda being the manufacturer

serene scaffold
#

so any time "x from y" is one sentence, that is always going to be a product and a manufacturer?

ocean swallow
#

Description has info like say 100 cm length etc

serene scaffold
#

I never do anything with documents that haven't already been converted into ascii/unicode/etc

ocean swallow
serene scaffold
#

My point is, if you don't have carefully constructed training data for this, you will either have to use a classifier that has already been trained or come up with rules
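
As a rough illustration of the "come up with rules" route, here is a sketch built around the "x from y" pattern mentioned above. It assumes the manufacturer is a single word and that whatever follows it is the description. Both are simplifications, and `PATTERN` and `parse_line` are hypothetical names.

```python
import re

# product, the word "from", a one-word manufacturer, then optional description
PATTERN = re.compile(
    r"^(?P<product>.+?)\s+from\s+(?P<manufacturer>\S+)(?:\s+(?P<description>.*))?$",
    re.IGNORECASE,
)

def parse_line(text):
    """Split one brochure line into product, manufacturer and description."""
    text = text.strip()
    match = PATTERN.match(text)
    if match is None:
        # No "from": treat the whole line as the product (e.g. just "Tomato")
        return {"product": text, "manufacturer": None, "description": None}
    return match.groupdict()

print(parse_line("broom from vileda 100 cm length"))
print(parse_line("Tomato"))
```

Rules like this break quickly on real brochures, which is why a pre-trained recognizer is the other option.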

ocean swallow
#

As a human it is easy to extract that info

ocean swallow
#

As title and manufacturer is all mixed together

serene scaffold
#

So you may have to go with a pre-built model and accept a certain amount of inaccuracy

ocean swallow
#

Which kind of model would you suggest for such a task?

serene scaffold
#

are you familiar with named entity recognition?

ocean swallow
#

I have heard of it but don't really know it

serene scaffold
#

It's where you recognize words/phrases that belong to a certain category. "product" and "manufacturer" are clear-cut categories, but "description" isn't really.

ocean swallow
#

Is it okay for those models not to include some categories? Sometimes it just writes "Tomato"

#

Do you have anything as a name, that is pretrained etc for that?

serene scaffold
#

You're probably not the first person to want to do this kind of thing. But be warned, I still don't know what to do about the product descriptions.

ocean swallow
#

If I find title and manufacturer and remove them from the whole text, then I will be left with descriptions.

#

So it is not really a big issue :)

#

I will definitely be looking into those, thank you so much :)

fluid sparrow
lilac hull
#

so I'm using OpenCV to make an object detection program. When I run the code, I get this error:

```
AttributeError: module 'cv2.cv2' has no attribute 'dnn_DetectionModel'
```
#

I gotta submit the assignment today, so uhh it's kinda urgent...

lilac hull
#

if you need the module versions:

#

alright i think i know what caused it now..

#

I'm upgrading opencv-contrib-python, I'll see if that fixes it

#

new error:
[ WARN:0] global C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-1i5nllza\opencv\modules\videoio\src\cap_msmf.cpp (438) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback

#

Now it's this:
[ERROR:0] global C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-1i5nllza\opencv\modules\dnn\src\tensorflow\tf_importer.cpp (2805) cv::dnn::dnn4_v20210608::`anonymous-namespace'::TFImporter::parseNode DNN/TF: Can't parse layer for node='Fp\pip-req-build-1i5nllza\opencv\modules\dnn\src\tensorflow\tf_importer.cpp:2478: error: (-2:Unspecified error) Const input blob for weights not found in function 'cv::dnn::dnn4_v20210608::`anonymous-namespace'::TFImporter::getConstBlob'

#

pls help

surreal jetty
#

will this work if df is sorted by something else, or will the order be wrong?

df['val'] = df.sort_values(by=['time']).loc[:, 'val'].apply(foo)
#

I guess the question is: does pandas use the index when assigning values to a column?

desert oar
#

!e ```python
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6],
}, index=list('abc'))
print(df)
df['x'] = df['x'].iloc[::-1] + 10
print(df)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

```
   x  y
a  1  4
b  2  5
c  3  6
    x  y
a  11  4
b  12  5
c  13  6
```
desert oar
#

assignment to a column in a pandas df is actually a join on the index

surreal jetty
desert oar
#

.join, .loc[]=, pd.concat, and pd.merge all perform joins, with overlapping but not entirely redundant options/features
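
Applied to the `sort_values` question above, a small sketch on made-up data. `foo` is a placeholder function. It contrasts assigning a Series, which aligns on the index, with assigning a bare array, which goes by position.

```python
import pandas as pd

df = pd.DataFrame({"time": [3, 1, 2], "val": [30, 10, 20]}, index=list("abc"))

def foo(v):
    return v * 2  # placeholder transformation

# A bare NumPy array has no index, so values land in sorted (positional) order
df["positional"] = df.sort_values(by=["time"])["val"].to_numpy()

# A Series keeps its index, so each value goes back to its original row
df["aligned"] = df.sort_values(by=["time"]).loc[:, "val"].apply(foo)

print(df)
```

So the original one-liner is safe with respect to row order, as long as the index labels are unique.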

desert oar
#

people who don't like pandas are usually the same people who don't understand the index system

surreal jetty
#

Haha i still fall in the latter group i think

#

Especially with multilevel indexes

#

do you know if there are any nice ways of accessing MultiIndex values?
filtering stuff like df[df['name'] == 'B'] is just so much more convenient than df[df.index.get_level_values('name') == 'B'] or whatever

#

even with 600k rows and no index, pandas is more than fast enough, so I struggle to see the value of using indexes apart from the various transformations that need indexing
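
For what it's worth, pandas has a few shorter spellings for filtering on an index level. A sketch on made-up data, where the level names `name` and `run` are hypothetical:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["A", "B"], [1, 2]], names=["name", "run"])
df = pd.DataFrame({"val": [10, 20, 30, 40]}, index=idx)

# query() can refer to index levels by name, as if they were columns
by_query = df.query("name == 'B'")

# xs() takes a cross-section on a named level; drop_level=False keeps the level
by_xs = df.xs("B", level="name", drop_level=False)

# A list of labels in .loc selects on the first level
by_loc = df.loc[["B"]]

print(by_query)
```

And `df.reset_index()` is always available when plain column filtering reads better.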

bronze lichen
#

Hello

#

Do you need data science for AI, or vice versa?

#

Anyway, I can't do ML or AI right now, so

#

How can I get started with data science?

#

I'm reading the pinned messages, let me check, they usually have some good stuff

uncut barn
#

How would I do early stopping, if the validation dice coefficient is above 0.5?
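
The framework wasn't stated, so this is a framework-agnostic sketch of the idea: check the validation metric after every epoch and break out once it passes the threshold. `ThresholdStopper` and the metric values are made up for illustration.

```python
class ThresholdStopper:
    """Signal a stop once a monitored metric exceeds a fixed threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def should_stop(self, value):
        return value > self.threshold

# Made-up validation Dice scores standing in for real per-epoch evaluation
fake_val_dice = [0.21, 0.35, 0.48, 0.53, 0.60]

stopper = ThresholdStopper(threshold=0.5)
stopped_epoch = None
for epoch, val_dice in enumerate(fake_val_dice):
    # training and validation for this epoch would happen here
    if stopper.should_stop(val_dice):
        stopped_epoch = epoch
        break

print(stopped_epoch)
```

If this is Keras (an assumption), the same check can live in a custom `Callback.on_epoch_end` that sets `self.model.stop_training = True`. The key to read from `logs` depends on how the Dice metric was named.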

velvet thorn
#

and not how I thought of it

#

it was a bit mindbending tbh

#

but that’s a good observation

odd meteor
bronze lichen
#

Any cool Data science projects?