glad aspen Aug 19, 2021, 11:26 AM

#

it's a dataframe from pandas

#

anyone know the efficient way?

#

just need the date time portion

#

I've tried df["created_at"] = df.created_at.replace(tzinfo=None)

weary echo Aug 19, 2021, 11:29 AM

#

to the best of my knowledge, some people did use boxplots in their time series plots. Although, in my opinion, it's not really "pretty".

I personally used boxplots to identify the existence of outliers.

#

in the end, it's a matter of preference

acoustic forge Aug 19, 2021, 11:29 AM

#

weary echo to the best of my knowledge, some people did use boxplots in their time series p...

Not talking about boxplots, but rather Box-Pierce test or Ljung-Box test

weary echo Aug 19, 2021, 11:30 AM

#

oh my god, I misread it haha

acoustic forge Aug 19, 2021, 11:30 AM

#

Time series is not a plot but rather a data format for predictive analytics

#

No worries 😛

acoustic halo Aug 19, 2021, 11:49 AM

#

Okay GPT-3 is brilliant

#

```
Sad :(
Crying :'(
Laughing :D
Surprise :O
Angry :@
```

#

It generated the smilies for Laughing, Surprise, and Angry itself

inland zephyr Aug 19, 2021, 12:04 PM

#

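Should Malay Peninsula Standard Time be written as MST in the standard abbreviation format (like PST, GMT, CEST)?
to remove it i map a function over the column with a lambda:

```py
from datetime import datetime

def funct(your_str):
    splitted = your_str.split(" ")
    day = splitted[0] + " " + splitted[1]  # date and time only, timezone label dropped
    return datetime.strptime(day, '%Y-%m-%d %H:%M:%S')
```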

#

and do df[your_ts] = df[your_ts].map(lambda x: funct(x))

#

it will automatically update to datetime type

#

or afaik MST is +8hour, add 8 hour before return the time

#

dunno if there is other elegant way to do this

woven rivet Aug 19, 2021, 12:19 PM

#

So I’m dumb and have little to no Idea how to do this but how would I make an AI that can recognize my pet rabbit from other rabbits and tell it’s him, I know basic python. When neural networks do stuff like this do they plot points on the images and look for patterns? Any help would be gratefully appreciated I really only know basic python so please explain things

serene scaffold Aug 19, 2021, 12:19 PM

#

inland zephyr Is the Malay Peninsula Standard Time should be MST in standard time format (like...

You could do this without apply

serene scaffold Aug 19, 2021, 12:20 PM

#

inland zephyr and do ```df[your_ts] = df.map(lambda x: funct(x))```

But if you want to use apply, you don't need to use a lambda when the function takes exactly one argument. .apply(funct) would suffice.
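
For the original timezone question, a minimal sketch of the route without apply, assuming `created_at` is already a timezone-aware datetime column. The sample values and the `Asia/Kuala_Lumpur` zone are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: a timezone-aware datetime column.
df = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2021-08-19 11:26:00", "2021-08-19 12:04:00"]
    ).tz_localize("Asia/Kuala_Lumpur")
})

# The .dt accessor works on the whole column at once.
# tz_localize(None) drops the timezone and keeps the local wall-clock time.
naive_local = df["created_at"].dt.tz_localize(None)

# tz_convert(None) converts to UTC first, then drops the timezone.
naive_utc = df["created_at"].dt.tz_convert(None)
```

Which of the two is wanted depends on whether "just the date time portion" means local time or UTC. If the column holds strings rather than datetimes, it would need `pd.to_datetime` first.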

acoustic halo Aug 19, 2021, 12:31 PM

#

Accidentally created the AI from I have no mouth and I must scream

#

grave frost Aug 19, 2021, 12:33 PM

#

acoustic halo

that's golden 💰 imma keep it till I die 🤣

#

~~someone make it into a copypasta~~

grave frost Aug 19, 2021, 12:57 PM

#

ooh, spicy stuff going on here 🌶️ https://www.reddit.com/r/MachineLearning/comments/p6hsoh/p_appleneuralhash2onnx_reverseengineered_apple/

r/MachineLearning - [P] AppleNeuralHash2ONNX: Reverse-Engineered Ap...

1,505 votes and 213 comments so far on Reddit

#

we already have the collider projects on github. Can't wait to spam all those images to get them viral and half the population arrested 🤣

mortal dove Aug 19, 2021, 1:14 PM

#

I'm looking for a good book that covers time series analysis that's more focused on the mathematics. Does anyone have some suggestions? Ideally free, but I won't mind paying for a solid book.

inland zephyr Aug 19, 2021, 1:32 PM

#

serene scaffold But if you want to use apply, you don't need to use a lambda when the function t...

i forgot this

desert oar Aug 19, 2021, 1:37 PM

#

sorry, got busy at work yesterday. where is the part in this code that does the "don't buy if i already bought it" logic?

#

also, i don't know much about trading strategies, but it sounds like your strategy doesn't allow for the possibility of buying more of $FOO after you've already bought $FOO. is that right?

chilly skiff Aug 19, 2021, 1:38 PM

#

no problem, I appreciate you remembering 🙂

chilly skiff Aug 19, 2021, 1:38 PM

#

desert oar also, i don't know much about trading strategies, but it sounds like your strate...

yes, that's correct

desert oar Aug 19, 2021, 1:38 PM

#

SMA = simple moving average? moving average of what?

#

and RSI is this? https://www.investopedia.com/terms/r/rsi.asp

Investopedia

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is a momentum indicator that measures the magnitude of recent price changes to analyze overbought or oversold conditions.

chilly skiff Aug 19, 2021, 1:39 PM

#

the prices

#

yep

desert oar Aug 19, 2021, 1:39 PM

#

so this python code is your whole trading strategy? good on you for being willing to share it, instead of being under the delusion that if you share it you're going to leak your genius secrets to the world and lose out on your gains 😛

chilly skiff Aug 19, 2021, 1:40 PM

#

since my question yesterday I've made it a lot faster. I used vectorization to remove as much data from the dataframe as possible (areas where it will not buy/sell), then I used list comprehension for the rest of the data I needed to manual iterate through

#

lmao

#

this was from my testing yesterday

#

```
Dictionary: 10.56 mins
To_list: 4.16 mins
zip list comprehension: 4.04 mins
vectorization & zip list comprehension: 3.36 mins
pre-vectorization & zip list comprehension: 1.13 mins
pre-vectorization & better list comprehension: 0.86 mins
```
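
A rough sketch of what "pre-vectorization plus zip list comprehension" could look like. The column names, the sample prices, and the `price > sma` rule are all invented for illustration and are not the actual strategy:

```python
import pandas as pd

# Hypothetical price data; columns and values are placeholders.
df = pd.DataFrame({
    "price": [10.0, 10.5, 9.8, 11.2, 10.9],
    "sma":   [10.2, 10.2, 10.1, 10.4, 10.6],
})

# Vectorized pre-filter: drop every row where no trade could happen at all.
candidates = df[df["price"] > df["sma"]]

# Iterate over what is left with zip over plain lists,
# which avoids row-by-row DataFrame access.
signals = [
    (price, sma)
    for price, sma in zip(candidates["price"].tolist(), candidates["sma"].tolist())
]
```

The speedup comes mostly from the first step: the fewer rows survive the vectorized filter, the less work is left for the Python-level loop.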

red hound Aug 19, 2021, 1:43 PM

#

Does anyone know, where the idea behind Word Embeddings, especially these kind of Embeddings the Tensorflow Embedding Layer produces, comes from? I would like to cite the idea and I already figured out that there happened a lot of work over decades, but im still not sure who i can cite as an author of the idea. If anyone has an idea, feel free to @ me. Thanks 🙂

desert oar Aug 19, 2021, 1:43 PM

#

chilly skiff since my question yesterday I've made it a lot faster. I used vectorization to r...

that sounds great. i'd love to see your updated version

desert oar Aug 19, 2021, 1:45 PM

#

red hound Does anyone know, where the idea behind Word Embeddings, especially these kind o...

you can cite the original word2vec papers as one of the early popular implementations, but i don't think the idea has a single "originator"
https://arxiv.org/abs/1301.3781
https://arxiv.org/abs/1310.4546

arXiv.org

Efficient Estimation of Word Representations in Vector Space

We propose two novel model architectures for computing continuous vector
representations of words from very large data sets. The quality of these
representations is measured in a word similarity...

arXiv.org

Distributed Representations of Words and Phrases and their Composit...

The recently introduced continuous Skip-gram model is an efficient method for
learning high-quality distributed vector representations that capture a large
number of precise syntactic and semantic...

chilly skiff Aug 19, 2021, 1:45 PM

#

I'm working on the next step in the project right now. When it's done the program will have to run for many hours most likely, so in that time ima clean up the code and i'll send you the updated code

desert oar Aug 19, 2021, 1:46 PM

#

sure, would be happy to see what you did

#

if your code is all "numeric" (i.e. no strings, dicts, etc), you can probably get significant speedups by running with numba in "nopython" mode

red hound Aug 19, 2021, 1:46 PM

#

desert oar you can cite the original word2vec papers as one of the early popular implementa...

Thanks, will have a look 👍

chilly skiff Aug 19, 2021, 1:47 PM

#

yeah I looked at that yesterday but if I'm being honest it looked very difficult to install all the stuff for it and get it running properly. I know I'll have to do it eventually but I thought I'd just wait for now xD

grave frost Aug 19, 2021, 1:54 PM

#

hmmm... OpenAI didn't look like they did much filtering

desert oar Aug 19, 2021, 2:18 PM

#

chilly skiff yeah I looked at that yesterday but if I'm being honest it looked very difficult...

it's not at all. usually you can just do pip install numba and get going with it

chilly skiff Aug 19, 2021, 2:19 PM

#

yeah that's what I did, but to put it simply, it didn't work as well as I was hoping xD

#

But i'll probably have to use it soon

lapis sequoia Aug 19, 2021, 2:19 PM

#

hey, does anyone know where I can find a good written explanation of why normalisation doesn't work on SMOTE data? I know it is because we are already normalising the unbalanced class, but I was wondering if there is a better explanation out there that I could use.

desert oar Aug 19, 2021, 2:30 PM

#

lapis sequoia hey anyone knows where i can find a good written explanation on why normalisatio...

what kind of normalization?

flat hollow Aug 19, 2021, 3:08 PM

#

Does anyone have experience with creating a Monte Carlo simulation? I need to create a 2D model of a blood vessel in the brain (simple - central circle enclosed by 2 barriers with some permeability, i.e. either a chance for molecule to get through based on dice roll or trapping and releasing after some time) where in the middle I would have a stream of new particles (simulating blood passing through) and I would need to do random walk while recording the number of particles in each of the 3 compartments (inside, between barriers, outside). I was thinking about using something like pygame to do the simulation, but I would prefer doing a LOT of particles and efficiency is the key since Im running it on a laptop.

desert oar Aug 19, 2021, 3:17 PM

#

use pypy

#

store your classes with __slots__ (although this is mostly a no-op in pypy)

#

that said, if you can implement this as a loop over a numpy array, you can probably do even better with numba, than with pypy

#

that, or maybe you can repurpose BUGS or Stan to do the heavy lifting for you; i've only used those for bayesian probability modeling so i wouldn't know how

#

e.g. in numba it might look something like this:

import numba

@numba.njit
def run_simulation(n_samples):
    n_inside = 0
    n_between = 0
    n_outside = 0
    for i in range(n_samples):
        # Do your complicated stuff here
        ...

#

but it sounds like maybe you can "vectorize" this simulation? e.g. pre-generating a big list of random values with np.random and then doing cumsum-type calculations thereupon. if you can post the actual algorithm for the simulation i can probably help more
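
A minimal sketch of that "pre-generate and cumsum" idea for a plain random walk in free space. The particle count, step count, and step size are illustrative values, not taken from the discussion:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable

n_particles = 1_000
n_steps = 500
step_size = 1.0

# One random direction per particle per step, generated up front.
angles = rng.uniform(0.0, 2.0 * np.pi, size=(n_steps, n_particles))

# Per-step displacements; the cumulative sum along the time axis
# turns them into positions relative to the starting point (the origin).
x = np.cumsum(step_size * np.cos(angles), axis=0)
y = np.cumsum(step_size * np.sin(angles), axis=0)

# Distance from the centre for every particle at every step.
radius = np.hypot(x, y)
```

This only works as long as each step is independent of the current position. Once a barrier can block a step, the walk has to be advanced one step at a time (still vectorized across particles).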

#

depends of course on your performance requirements too

chilly geyser Aug 19, 2021, 3:31 PM

#

Before pypy I would recommend just having anything working first

flat hollow Aug 19, 2021, 3:31 PM

#

@desert oar lots of info here, unfortunately I got the task like 30 minutes ago and I don't even know how to set the model up such that the particles interact with the barrier yet 😦 once I figure that out I can start working on optimisation

chilly geyser Aug 19, 2021, 3:31 PM

#

Then you can think about jit/compilers/etc. later

#

Yeah I recommend having a minimum working example before optimisation

#

Although the faster you get a minimum working example, the faster you can optimise

flat hollow Aug 19, 2021, 3:34 PM

#

chilly geyser Yeah I recommend having a minimum working example before optimisation

any ideas on how to do random walk simulation with a barrier that has some permeability? all the random walk tutorials I found do it in free space and only a few particles, I need constrained space with a ton of particles

#

ofc getting it done with just one is fine for now, I just don't know how to do the particle-wall interaction because I feel like that is game design and I have no experience in that

#

which is why I thought about using pygame at first...

#

hm... if the barrier is circular, would I make the particles do the classic random walk and after each step, check if their initial position was less than r away from the centre and final step more than r and if it is, add a random probability of them not moving at all?

#

does that sound good as barrier simulation?

desert oar Aug 19, 2021, 3:50 PM

#

https://mathematica.stackexchange.com/q/57561/16075
https://mathematica.stackexchange.com/q/49063/16075
here are mathematica demos of such a thing

Mathematica Stack Exchange

2D random walk within a bounded area

I want to simulate a random walk in two dimensions within a bounded area, such as a square or a circle. I am thinking of using an If statement to define a boundary. Is there a better way to define a

Mathematica Stack Exchange

Simulating a bounded continuous random walk in $n$-dimensions

You're given three $n$-dimensional real vectors, $\mathbf{x_0}$, $\mathbf{a}$ and $\mathbf{b}$, with $\mathbf{a} \le \mathbf{x_0} \le \mathbf{b}$ (vector inequality means component-wise inequality).

#

in your case, what's the logic? "particle hits boundary, then p% particle goes through it vs bounces off"?

chilly geyser Aug 19, 2021, 3:58 PM

#

flat hollow any ideas on how to do random walk simulation with a barrier that has some perme...

? Just have a probability of it passing/not passing?

flat hollow Aug 19, 2021, 3:59 PM

#

chilly geyser ? Just have a probability of it passing/not passing?

ye that's probably the easiest in terms of theory

flat hollow Aug 19, 2021, 4:01 PM

#

desert oar in your case, what's the logic? "particle hits boundary, then p% particle goes t...

I think that's going to be my initial try, I am simulating a particle going from blood through a barrier into a different area and then again into another area so probabilities like that make sense to me

#

I might eventually have to add in a bias in the walk that would make it more likely to walk away from the centre but that's down the road

desert oar Aug 19, 2021, 4:01 PM

#

this is just 2d though right? circle inside, infinite area outside, particle has p chance to pass outside the circle when it hits the boundary?

flat hollow Aug 19, 2021, 4:02 PM

#

yeah I think 2D will be enough in this case just for simplicity

#

I can try drawing it

arctic wedgeBOT Aug 19, 2021, 4:09 PM

#

Hey @flat hollow!

It looks like you tried to attach file type(s) that we do not allow (.heic). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

flat hollow Aug 19, 2021, 4:09 PM

#

no one likes apple's .heic 😦

#

@desert oar something like that? There could be a central starting point, then 2 circular barriers with p1 and p2 probabilities of particles passing through them and infinite space outside (may need to constrain it eventually, have to talk to supervisor first)

chilly geyser Aug 19, 2021, 4:29 PM

#

Seems possible

chilly geyser Aug 19, 2021, 4:30 PM

#

flat hollow @desert oar something like that? There could be a central starting po...

Actually thinking about it, it might be possible to solve without simulation assuming circles, but don't quote me on this

flat hollow Aug 19, 2021, 4:33 PM

#

chilly geyser Actually thinking about it, it *might* be possible to solve without simulation a...

Im not doing the math for that 😄 I just want a nice simulation that is repeatable, people can see the evolution and I can plot the results nicely for my (hopefully) future paper 🙂

desert oar Aug 19, 2021, 4:36 PM

#

can the particles bump into each other?

#

and do the particles have spatial extent or are they just points?

flat hollow Aug 19, 2021, 4:39 PM

#

for the simplicity of getting at least something done I would do point-like and no collisions, but eventually I would probably have to add collisions and Im not sure about dimensions, would have to check the tables

#

I have molecules inside a blood vessel, do point-like particles work with collisions? 😄

#

doesn't sound like they would...

waxen sinew Aug 19, 2021, 5:58 PM

#

https://colab.research.google.com/drive/1Lof26snWVk6wm3Y-sRtKCYrGZvFdHCwP?usp=sharing I'm getting an error, how do I correct this?

Google Colaboratory

desert oar Aug 19, 2021, 6:29 PM

#

@flat hollow do the whole thing in polar coordinates

#

a particle isn't an (x,y) pair, it's an (angle,radius) pair

#

then a "collision" with the barrier is just particle.radius >= barrier_radius

#

the "particles" can be a 2d numpy array, and you can also use an array to track which particle is in which circle, so you don't have to recompute it at every step

particle_positions = np.array([
    [radius0, angle0],
    [radius1, angle1],
    ...
])

particle_circles = np.array([
    0,  # inside the inner circle
    1,  # between the circles
    2,  # outside the outer circle
    ...
])

#

a "step" could be something as simple as a fixed-size step in a random direction

#

just need to do some trig to figure out the new angle and radius after a step
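
One way the trig could go, as a sketch: treat the position and the step as two polar vectors and add them with the law of cosines, so nothing is converted to cartesian. The function name and argument names are made up here:

```python
import numpy as np

def polar_step(radius, angle, step_size, step_angle):
    """Add a step (step_size, step_angle) to positions (radius, angle), all in polar form."""
    delta = step_angle - angle
    # Law of cosines for the new distance from the centre.
    # np.maximum guards against tiny negative values from rounding.
    squared = radius**2 + step_size**2 + 2.0 * radius * step_size * np.cos(delta)
    new_radius = np.sqrt(np.maximum(squared, 0.0))
    # Rotate the old angle by the angle between the old and new position vectors.
    new_angle = angle + np.arctan2(
        step_size * np.sin(delta), radius + step_size * np.cos(delta)
    )
    return new_radius, new_angle % (2.0 * np.pi)
```

It works on whole numpy arrays at once, so one call moves every particle. The barrier check then only needs `new_radius`.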

pine wolf Aug 19, 2021, 6:36 PM

#

desert oar then a "collision" with the barrier is just `particle.radius >= barrier_radius`

this sounds fun, simplifies even particle-particle collision

#

though i often use manhattan distance when doing that stuff

desert oar Aug 19, 2021, 6:38 PM

#

idk how well it works for non-point particles

#

i also can't think of how to calculate the next step without going back to cartesian coordinates 🤦

#

i'll have to write it out on paper

#

oh i think that's actually how you do it

pine wolf Aug 19, 2021, 6:46 PM

#

yeah, but you can vectorize that stuff conveniently --- while particle-particle collision is probably some python loop, dunno

flat hollow Aug 19, 2021, 6:48 PM

#

ah, my terrible math skills are catching up to me, the one thing that Im not sure about right now is the particle tracking...

#

unfortunately in 7 hours Im driving 1,5 hours and then walking for 8 more hours so I need some sleep, thanks a lot for your time @desert oar

desert oar Aug 19, 2021, 7:06 PM

#

of course, i'm happy to procrastinate on my own work with this 😛

desert oar Aug 19, 2021, 7:06 PM

#

pine wolf yeah, but you can vectorize that stuff conveniently --- while particle-particle ...

that would be my guess, but if that gets slow you can drop down to numba and do it

desert oar Aug 19, 2021, 7:24 PM

#

@pine wolf https://math.stackexchange.com/q/1365622/117452

Mathematics Stack Exchange

Adding two polar vectors

Is there a way of adding two vectors in polar form without first having to convert them to cartesian or complex form?

pine wolf Aug 19, 2021, 7:33 PM

#

adding this to a long todo list

lapis sequoia Aug 19, 2021, 8:10 PM

#

desert oar what kind of normalization?

Minmax, standard etc

desert oar Aug 19, 2021, 8:11 PM

#

@pine wolf @flat hollow https://paste.pythondiscord.com/bihipocuri messing around a bit

#

need to figure out how to animate

desert oar Aug 19, 2021, 8:12 PM

#

lapis sequoia Minmax, standard etc

i haven't heard that before, but i can imagine that there is a problem with using simulated data to estimate things like the maximum, mean, etc. i would guess that you should do those things before oversampling

iron basalt Aug 19, 2021, 8:46 PM

#

flat hollow Does anyone have experience with creating a Monte Carlo simulation? I need to cr...

Are you looking for this? https://en.wikipedia.org/wiki/Brownian_motion

Brownian motion

Brownian motion, or pedesis (from Ancient Greek: πήδησις /pɛ̌ːdɛːsis/ "leaping"), is the random motion of particles suspended in a medium (a liquid or a gas).This pattern of motion typically consists of random fluctuations in a particle's position inside a fluid sub-domain, followed by a relocation to another sub-domain. Each relocation is follo...

#

Or just really simple, bunch of particles bouncing around and when they hit the border they have some chance of being repelled or passing through.
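
A minimal sketch of that simple version with one circular barrier, assuming point particles, no collisions, and that a blocked particle just stays put for that step instead of bouncing. The function name and parameters are hypothetical:

```python
import numpy as np

def step_with_barrier(x, y, barrier_radius, pass_probability, rng, step_size=1.0):
    """Move every particle one random step; block most barrier crossings."""
    direction = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)
    new_x = x + step_size * np.cos(direction)
    new_y = y + step_size * np.sin(direction)

    # A crossing is a step that starts on one side of the circle and ends on the other.
    was_inside = np.hypot(x, y) < barrier_radius
    is_inside = np.hypot(new_x, new_y) < barrier_radius
    crossing = was_inside != is_inside

    # Each crossing attempt succeeds with probability pass_probability;
    # a blocked particle keeps its old position for this step.
    blocked = crossing & (rng.random(x.shape) >= pass_probability)
    return np.where(blocked, x, new_x), np.where(blocked, y, new_y)
```

Calling it in a loop and counting `np.hypot(x, y) < barrier_radius` after each step gives the number of particles per compartment over time. A second barrier would be another radius and probability handled the same way.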

pine wolf Aug 19, 2021, 8:54 PM

#

desert oar need to figure out how to animate

#

gif is choppier than the program itself

lapis sequoia Aug 19, 2021, 9:07 PM

#

desert oar i haven't heard that before, but i can imagine that there is a problem with usin...

💩 sorry, was doing some scaling while typing, i mean sqrt, cbrt, log etc

pine wolf Aug 19, 2021, 9:25 PM

#

ok, this is with a .001 probability of passing the barrier

grave frost Aug 19, 2021, 9:35 PM

#

pine wolf

what is it? 🤔

pine wolf Aug 19, 2021, 9:36 PM

#

grave frost what is it? 🤔

#data-science-and-ml message

#

this is only a single barrier, which i didn't draw

grave frost Aug 19, 2021, 9:40 PM

#

flat hollow @desert oar something like that? There could be a central starting po...

so you take a initial set of points and simulate random forces?

#

ohh, brownian motion

#

but why? 🤔

pine wolf Aug 19, 2021, 9:43 PM

#

it's red blood cell diffusion apparently

#

which is brownian motion with a barrier

grave frost Aug 19, 2021, 9:44 PM

#

Brain tickler - I have a set of 2 angles (in rads) can anyone think up a way to represent two angles with a single number?

pine wolf Aug 19, 2021, 9:45 PM

#

complex(angle_1, angle_2)

grave frost Aug 19, 2021, 9:45 PM

#

the function has to be reversible too 👈

grave frost Aug 19, 2021, 9:45 PM

#

pine wolf `complex(angle_1, angle_2)`

A real number, that is

pine wolf Aug 19, 2021, 9:46 PM

#

unless there's some constraint on your angles, i don't think there's a reversible method for arbitrary angles

grave frost Aug 19, 2021, 9:47 PM

#

usually those numbers would have a high decimal accuracy too

grave frost Aug 19, 2021, 9:47 PM

#

pine wolf unless there's some constraint on your angles, i don't think there's a reversibl...

yea, I was thinking that way too

#

guess I will just do multi-output regression ¯_(ツ)_/¯ returning the 2d vector in the end...

desert oar Aug 19, 2021, 10:31 PM

#

wouldn't the banach tarski theorem suggest that such a function exists, albeit probably not one that we can comprehend or even write the definition of?

desert oar Aug 19, 2021, 10:32 PM

#

pine wolf ok, this is with a .001 probability of passing the barrier

gonna share that code? 👀

#

matplotlib animations are being a pain

pine wolf Aug 19, 2021, 10:40 PM

#

desert oar gonna share that code? 👀

https://github.com/salt-die/nurses_2/blob/main/examples/red_blood_cell_diffusion.py

desert oar Aug 19, 2021, 10:40 PM

#

bless you

#

let's see how bad mine is compared to yours

#

i haven't done the random diffusion collision part yet, got sidetracked w/ animations

pine wolf Aug 19, 2021, 10:41 PM

#

i just hacked apart your code

desert oar Aug 19, 2021, 10:41 PM

#

i hacked apart my code too 😛

pine wolf Aug 19, 2021, 10:41 PM

#

might require some knowledge of nurses_2 particles

desert oar Aug 19, 2021, 10:42 PM

#

nurses 2?

#

is that your own library?

pine wolf Aug 19, 2021, 10:44 PM

#

yes

#

terminal graphics library

#

the README has animated examples

pine wolf Aug 19, 2021, 10:49 PM

#

desert oar wouldn't the banach tarski theorem suggest that such a function exists, albeit p...

i mean the size of the sets RxR and R are the same, so there's definitely a mapping somehow --- i'm not gonna dream one up

desert oar Aug 19, 2021, 10:49 PM

#

pine wolf terminal graphics library

👀

#

unixporn people might like that kind of thing

pine wolf Aug 19, 2021, 10:50 PM

#

it's pretty python-specific -- there's better in the c-domain

desert oar Aug 19, 2021, 10:52 PM

#

fair enough

pine wolf Aug 19, 2021, 10:53 PM

#

if notcurses python bindings get improved and they add support for windows terminal, i'll probably make a nurses_3

velvet thorn Aug 19, 2021, 11:22 PM

#

grave frost Brain tickler - I have a set of 2 angles (in rads) can anyone think up a way to ...

depends

#

we're not talking arbitrarily, right

grave frost Aug 19, 2021, 11:22 PM

#

velvet thorn we're not talking arbitrarily, right

we are

velvet thorn Aug 19, 2021, 11:22 PM

#

grave frost we are

then how are you storing them

grave frost Aug 19, 2021, 11:22 PM

#

the angles have a pretty high degree of accuracy

velvet thorn Aug 19, 2021, 11:22 PM

#

like

#

when I say "arbitrary"

#

I mean, not real numbers

pine wolf Aug 19, 2021, 11:22 PM

#

you can take every digit of one angle and every digit of another angle and zip them together

velvet thorn Aug 19, 2021, 11:22 PM

#

but some numpy number type

grave frost Aug 19, 2021, 11:23 PM

#

like 3.245145416426537532

#

np.float64

velvet thorn Aug 19, 2021, 11:23 PM

#

grave frost np.float64

then it can't be done

#

at least, not in the general case

grave frost Aug 19, 2021, 11:23 PM

#

velvet thorn then it can't be done

thought so too

velvet thorn Aug 19, 2021, 11:23 PM

#

or rather...

pine wolf Aug 19, 2021, 11:23 PM

#

angle_1 = .01010101010...
angle_2 = .5959595959...

compressed = .051905190519...

#

this would work
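
A sketch of the digit-zipping idea on digit strings, assuming both angles are given as the digits after the decimal point of a number in [0, 1). The function names are made up:

```python
def interleave_digits(a: str, b: str) -> str:
    """Zip the fractional digits of two numbers in [0, 1) into one digit string."""
    # Pad the shorter digit string with zeros so both have the same length.
    width = max(len(a), len(b))
    a, b = a.ljust(width, "0"), b.ljust(width, "0")
    return "".join(x + y for x, y in zip(a, b))

def split_digits(z: str) -> tuple:
    """Undo interleave_digits: even positions belong to a, odd positions to b."""
    return z[0::2], z[1::2]

compressed = interleave_digits("01010101", "59595959")
print("." + compressed)  # fractional digits of the combined number
```

The result needs twice as many digits as either input, so it is only reversible if the combined number can actually hold that many digits.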

grave frost Aug 19, 2021, 11:23 PM

#

oh, and the number has to be real and function reversible

velvet thorn Aug 19, 2021, 11:24 PM

#

pine wolf angle_1 = .01010101010... angle_2 = .5959595959... compressed = .05190519...

it wouldn't work if both numbers are @ max double precision

pine wolf Aug 19, 2021, 11:24 PM

#

it would still work, you would halve their precision before compressing

grave frost Aug 19, 2021, 11:24 PM

#

as in they have different lengths?

velvet thorn Aug 19, 2021, 11:24 PM

#

pine wolf it would still work, you would halve their precision before compressing

which means you lose information

#

therefore not reversible

pine wolf Aug 19, 2021, 11:25 PM

#

just use a higher precision float for the final

grave frost Aug 19, 2021, 11:25 PM

#

well, they are both angles so the function should be sensitive to both and actually retain their information

pine wolf Aug 19, 2021, 11:25 PM

#

and you don't lose information

grave frost Aug 19, 2021, 11:26 PM

#

A simpler method could be just to output a 2D vector lemon_tongue but I wanna see what you guys come up with

velvet thorn Aug 19, 2021, 11:26 PM

#

pine wolf just use a higher precision float for the final

if your initial values are float64 you're out of luck

pine wolf Aug 19, 2021, 11:26 PM

#

then use float128

velvet thorn Aug 19, 2021, 11:26 PM

#

float128 is just 80-bit extended precision (on x86)

#

with extra padding

pine wolf Aug 19, 2021, 11:27 PM

#

why is that my fault

velvet thorn Aug 19, 2021, 11:27 PM

#

pine wolf just use a higher precision float for the final

it's not, but this is not possible

#

the constraints were stated @ the start

#

@grave frost realistically speaking though

#

are you going to need all the digits?

pine wolf Aug 19, 2021, 11:28 PM

#

you can create a new type that stored 128 bits, and then you can represent your 2 64 bit numbers as a single 128 bit number
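
A sketch of the 128-bit idea using a plain Python integer as the container, assuming the raw bits of both float64 values are simply placed side by side. The helper names are made up, and the result is an integer, not a float:

```python
import struct

def pack_angles(angle_1: float, angle_2: float) -> int:
    """Store the raw bits of two float64 values side by side in one 128-bit integer."""
    return int.from_bytes(struct.pack(">dd", angle_1, angle_2), "big")

def unpack_angles(packed: int) -> tuple:
    """Recover both float64 values exactly from the 128-bit integer."""
    return struct.unpack(">dd", packed.to_bytes(16, "big"))

packed = pack_angles(3.245145416426537532, 0.5959595959)
assert unpack_angles(packed) == (3.245145416426537532, 0.5959595959)
```

No bits are lost, so the round trip is exact, but the packed value cannot be stored in a float64 and plain numpy arithmetic on it is meaningless.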

grave frost Aug 19, 2021, 11:28 PM

#

yep,

Function has to be reversible
output should be Real
Information from both angles should be preserved in the output

velvet thorn Aug 19, 2021, 11:28 PM

#

15 is generally a lot

pine wolf Aug 19, 2021, 11:28 PM

#

you can't store 8 bits of information in 4 bits though

grave frost Aug 19, 2021, 11:28 PM

#

velvet thorn @grave frost realistically speaking though

maybe not - but approximations would lose on error

velvet thorn Aug 19, 2021, 11:29 PM

#

pine wolf you can create a new type that stored 128 bits, and then you can represent your ...

you could do this if you're willing to give up easy vectorisation

velvet thorn Aug 19, 2021, 11:29 PM

#

grave frost A simpler method could be just to output a 2D vector lemon_tongue...

this seems much more practical

pine wolf Aug 19, 2021, 11:29 PM

#

i mean, i'm not recommending this

velvet thorn Aug 19, 2021, 11:29 PM

#

okay

#

how about this

pine wolf Aug 19, 2021, 11:29 PM

#

this seems like an XY problem, but i don't address this issue

velvet thorn Aug 19, 2021, 11:29 PM

#

commission a custom processor that can natively handle quad precision floats

grave frost Aug 19, 2021, 11:29 PM

#

pine wolf this seems like an XY problem, but i don't address this issue

its not really a huge problem per se - I can get by without solving it

velvet thorn Aug 19, 2021, 11:29 PM

#

along with the attendant firmware etc.

#

problem solved

#

🙏

grave frost Aug 19, 2021, 11:30 PM

#

nice. anyone up for some funding?

pine wolf Aug 19, 2021, 11:30 PM

#

like, what problem do you solve by using one float to represent two?

velvet thorn Aug 19, 2021, 11:30 PM

#

I've got a couple of thoughts and prayers

velvet thorn Aug 19, 2021, 11:30 PM

#

pine wolf like, what problem do you solve by using one float to represent two?

convenience, apparently

#

but I also think it's not the right way to do it

grave frost Aug 19, 2021, 11:30 PM

#

pine wolf like, what problem do you solve by using one float to represent two?

nothing really, as I said thats much more to tickle your brains

pine wolf Aug 19, 2021, 11:30 PM

#

especially when you have to use resources to go to and from the representation

grave frost Aug 19, 2021, 11:30 PM

#

I can just get a 2D array

#

and treat it as a multi-output regression problem 🤷

grave frost Aug 19, 2021, 11:31 PM

#

pine wolf especially when you have to use resources to go to and from the representation

nah resources dont matter

#

just asking mathematically here. 2 angles, 3 conditions

pine wolf Aug 19, 2021, 11:31 PM

#

you can do it with infinite precision floats

velvet thorn Aug 19, 2021, 11:31 PM

#

yeah

grave frost Aug 19, 2021, 11:31 PM

#

maybe there's something with some transformation?

velvet thorn Aug 19, 2021, 11:31 PM

#

mathematically, it is defo possible

iron basalt Aug 19, 2021, 11:31 PM

#

In C you can just create 128bit floats directly (GCC x86, x86-64). Idk if numpy supports it.

velvet thorn Aug 19, 2021, 11:32 PM

#

iron basalt In C you can just create 128bit floats directly (GCC x86, x86-64). Idk if numpy ...

not as of stable, AFAIK

grave frost Aug 19, 2021, 11:32 PM

#

iron basalt In C you can just create 128bit floats directly (GCC x86, x86-64). Idk if numpy ...

would torch even work with those, if torch's own tensor is not available at 128b

velvet thorn Aug 19, 2021, 11:32 PM

#

velvet thorn mathematically, it is defo possible

but the point here is that floats, more than many other things, are a leaky abstraction

iron basalt Aug 19, 2021, 11:33 PM

#

Can of course just wrap the 128 bit floats with cython.

grave frost Aug 19, 2021, 11:33 PM

#

I don't think that its actually fully 64b

iron basalt Aug 19, 2021, 11:34 PM

#

Doing it manually without the hardware support would be super slow.

grave frost Aug 19, 2021, 11:34 PM

#

its def in middle

#

like a little more than float32

#

technically, isn't it a 2D array - so a vector that could, with a transformation, be transformed in such a way as to get to a 1D line?

#

so we could reverse the transformation and get theoretically the same thing back and the transforming matrix has no eigenvectors

iron basalt Aug 19, 2021, 11:38 PM

#

This using one number to represent two things seems like something one would find in some old commodore 64 code or something so maybe look in that area if you really want an answer.

#

But it does sound useless even if it's doable, much like the xor variable swap.

pine wolf Aug 19, 2021, 11:44 PM

#

desert oar gonna share that code? 👀

it was bugging me, so i vectorized the conversion back to cartesian coordinates

finite wasp Aug 20, 2021, 12:05 AM

#

I've got a question in Chocolate if someone can help out.

serene scaffold Aug 20, 2021, 12:56 AM

#

@finite wasp idk what chocolate is, but you should always just ask your question. Asking if someone knows about the topic of an unasked question is less helpful than putting the real question out there.

#

Looks like I may have misread what you had said 🤷‍♂️

desert oar Aug 20, 2021, 2:18 AM

#

pine wolf it was bugging me, so i vectorized the conversion back to cartesian coordinates

I got mine humming along with polar, i kind of like the idea of not going to cartesian except for plotting. Figuring out the angle of reflection off the barrier will be interesting though

pine wolf Aug 20, 2021, 2:20 AM

#

i just didn't change the radius at all when crossing barrier, instead of reflecting

desert oar Aug 20, 2021, 2:20 AM

#

I saw

pine wolf Aug 20, 2021, 2:20 AM

#

the lazy way

#

i don't think you have to add too much to reflect

#

or maybe you do

#

could just give the particles a velocity and then all you have to do is ~~reverse~~ negate it

desert oar Aug 20, 2021, 2:24 AM

#

To reflect i think you'd have to find the distance from point to barrier, compute the tangent of the circle at that point, get the perpendicular line to that, then compute the angle of reflection around it. Then reposition the particle accordingly

#

Or yeah don't use reflection rules and just reverse direction lol

#

All this probably goes out the window if the blood cells have spatial extent anyway
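
A minimal sketch of the reflection step described above, assuming a circular barrier centred at the origin and particles stored as Cartesian position/velocity arrays. `reflect_off_circle` is a made-up helper name, not code from either simulation. On a circle the normal at the contact point lies along the radius, so the tangent and perpendicular never need to be built explicitly:

```python
import numpy as np

def reflect_off_circle(pos, vel, center=(0.0, 0.0)):
    """Mirror `vel` about the tangent of a circular barrier at contact point `pos`."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    # The normal of a circle at the contact point is the unit radius vector.
    normal = pos - np.asarray(center, dtype=float)
    normal = normal / np.linalg.norm(normal)
    # Specular reflection: flip only the velocity component along the normal.
    return vel - 2.0 * np.dot(vel, normal) * normal

# Particle touching the barrier at (1, 0) while moving outward and upward.
new_vel = reflect_off_circle([1.0, 0.0], [1.0, 1.0])
print(new_vel)  # the radial (x) component changes sign, the tangential (y) one is kept
```

Negating the whole velocity, as suggested earlier, is the special case where the particle is sent straight back along its path instead of being mirrored.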

old thorn Aug 20, 2021, 3:11 AM

#

Has anyone ever deployed a ML model on a chrome extension, looking to do that but haven't found many resources sadge

arctic wedgeBOT Aug 20, 2021, 3:47 AM

#

Hey @errant flare!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

errant flare Aug 20, 2021, 3:49 AM

#

Quick question i have a .csv file data that is structured kinda like this with 150 responses (https://docs.google.com/spreadsheets/d/12z4FBN8_mW3T7I4WPb2JbLx9WGxVnelfNspRwVMKF9Q/edit?usp=sharing), is this in any way worthy of a linear regression or multivariate regression model? and if so on the basis of what independent and dependent factors? Kinda new to ai and datascience so yeah a little confused here

And if it isn't are there any other models or ways through which I could build a predictive model with this kind of dataset?

I'd appreciate any and all direction / help, thanks once again!

Google Docs

Sample Survey

Sample_Survey

Timestamp,Which Country are you from?,Age,Gender,Name a random source of information,On a Scale of 1 to 10, how much blah blah,On a Scale of 1 to 10, how much blah blah,On a Scale of 1 to 10, how much blah blah,On a Scale of 1 to 10, how much blah blah,On a Scale of 1 to 10, how mu...

umbral ferry Aug 20, 2021, 3:55 AM

#

errant flare Quick question i have a .csv file data that is structured kinda like this with a...

in order of increasing complexity (and usually increasing quality), you can look at regression trees, random forest, adaboost, gradient boost, xgboost

#

statquest on youtube has a good explanation of all of them, and usually he or others have implementation examples in python

errant flare Aug 20, 2021, 4:06 AM

#

thanks a lot!

#

i'll have a look at them!

errant flare Aug 20, 2021, 4:10 AM

#

umbral ferry in order of increasing complexity (and usually increasing quality), you can look...

could i though uh try to have a regression model with the age variable in comparison to the 1 to 10 number being the dependent variable?

#

cuz for some reason they were a little specific on having a regression model

#

i'm not sure why

umbral ferry Aug 20, 2021, 4:11 AM

#

you can have whatever you want be your inputs and whatever you want be your output
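
As a rough illustration of "age as input, 1 to 10 score as output", here is a one-variable least-squares fit with NumPy. The numbers are invented for the example and are not taken from the survey; `age` and `score` are placeholder names:

```python
import numpy as np

# Invented toy data: respondent age and their 1-10 answer to one survey item.
age = np.array([18, 22, 25, 31, 40, 52], dtype=float)
score = np.array([7, 6, 6, 5, 4, 3], dtype=float)

# Degree-1 polynomial fit = simple linear regression: score ~ slope * age + intercept
slope, intercept = np.polyfit(age, score, deg=1)

# Predicted score for a hypothetical 30-year-old respondent.
predicted = slope * 30 + intercept
print(slope, intercept, predicted)
```

With more than one input column (age, gender, country, ...) the same idea becomes multiple linear regression, which libraries such as scikit-learn or statsmodels handle directly.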

errant flare Aug 20, 2021, 4:12 AM

#

hmmm

umbral ferry Aug 20, 2021, 4:12 AM

#

I've never really done multivariable linear regression but I think that's also a thing

#

there are advantages and disadvantages to every method

errant flare Aug 20, 2021, 4:12 AM

#

yep i don't have that many independents though for multivariable

#

anyways thanks for the help, gotta go now so yeah!

umbral ferry Aug 20, 2021, 4:13 AM

#

gl!

gusty frost Aug 20, 2021, 4:19 AM

#

Do any of you know where the best place is to learn data science and will age be a barrier?

velvet thorn Aug 20, 2021, 4:20 AM

#

gusty frost Do any of you know where the best place is to learn data science and will age ...

are you old enough to work in your country?

gusty frost Aug 20, 2021, 4:21 AM

#

velvet thorn are you old enough to work in your country?

I'm talking about learning it.

#

I got a while before I can work.

velvet thorn Aug 20, 2021, 4:21 AM

#

gusty frost I'm talking about learning it.

if you are old enough to understand language and abstractions

#

then you're old enough to learn

gusty frost Aug 20, 2021, 4:22 AM

#

velvet thorn if you are old enough to understand language and abstractions

I've done a lot of java so I'm familiar with programming languages

velvet thorn Aug 20, 2021, 4:22 AM

#

gusty frost I've done a lot of java so I'm familiar with programming languages

Java

#

🥴

#

ew

#

sorry...not a Java fan

#

but yeah

#

you can defo start learning

#

how're you @ mathematics?

#

in particular, statistics

gusty frost Aug 20, 2021, 4:22 AM

#

velvet thorn how're you @ mathematics?

Pretty good.

#

in algebra

velvet thorn Aug 20, 2021, 4:23 AM

#

linear algebra?

gusty frost Aug 20, 2021, 4:23 AM

#

velvet thorn linear algebra?

Haven't done it before

#

I'll look into it.

velvet thorn Aug 20, 2021, 4:23 AM

#

statistics knowledge is important for data science

#

it depends on what you wanna do, though

#

it's a wide field

#

depending on your specialisation

#

graph theory

#

linear algebra

#

calculus

#

all might be relevant

gusty frost Aug 20, 2021, 4:24 AM

#

Do you know where I can learn data science?

velvet thorn Aug 20, 2021, 4:25 AM

#

gusty frost Do you know where I can learn data science?

nope

#

there's tons of stuff online

thorn bobcat Aug 20, 2021, 7:24 AM

#

gpt-j is nice

errant flare Aug 20, 2021, 7:39 AM

#

what the hekk is KeyError

#

in pandas

#

god its driving me nuts

#

nvm i got it but still why

velvet thorn Aug 20, 2021, 8:33 AM

#

errant flare nvm i got it but still why

you probably were looking for a column that wasn’t there

errant flare Aug 20, 2021, 8:44 AM

#

yeah deleted a piece of code in my notebook by mistake

#

and sometimes there's apparently a space in the csv's name which I didn't notice
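
A small reproduction of that second cause, assuming the stray space sits in a column header of the CSV (the file contents below are made up). Stripping whitespace from the column names right after loading avoids the `KeyError`:

```python
import io
import pandas as pd

# Made-up CSV text; note the space before "SOI" in the header row.
csv_text = "Age, SOI\n21,3\n35,7\n"
data = pd.read_csv(io.StringIO(csv_text))

print(list(data.columns))  # the second name still carries its leading space

try:
    data["SOI"]
except KeyError as err:
    print("KeyError:", err)  # "SOI" without the space is not a column yet

# Normalise the headers once, then the plain name works.
data.columns = data.columns.str.strip()
print(data["SOI"].tolist())
```

Printing `list(data.columns)` is usually the quickest way to spot this kind of mismatch.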

maiden bluff Aug 20, 2021, 9:27 AM

#

Hey there, i'm really interested in Data Science, but how should I begin?

agile cobalt Aug 20, 2021, 9:28 AM

#

do you already know Python or just getting started overall?

maiden bluff Aug 20, 2021, 9:28 AM

#

Well, I already know python

agile cobalt Aug 20, 2021, 9:28 AM

#

mostly this channel's pins then

maiden bluff Aug 20, 2021, 9:31 AM

#

Sorry, english is not my first language, could you try to elaborate?

agile cobalt Aug 20, 2021, 9:31 AM

#

Check the Pinned messages 📌

maiden bluff Aug 20, 2021, 9:32 AM

#

Thanks!

errant flare Aug 20, 2021, 9:53 AM

#

uh quick question

i'm doing this in python with pandas

soi = data["SOI"]
and the output i get has the index column in it, any way i can get rid of it?

vague stratus Aug 20, 2021, 10:09 AM

#

errant flare uh quick question i'm doing this in python with pandas ```Python soi = data["S...

Try out
soi = data.SOI.values

velvet thorn Aug 20, 2021, 10:10 AM

#

errant flare uh quick question i'm doing this in python with pandas ```Python soi = data["S...

why do you want to do that?

#

data.reset_index()['SOI'] should do

errant flare Aug 20, 2021, 10:15 AM

#

velvet thorn why do you want to do that?

i wanna use the index for other parts of the program but i wanna isolate only the soi column without the indexes so yeah

#

and soi = data["SOI"].values works in that regards i think so yeah thanks!
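
To make the difference between the suggestions concrete, a short sketch with an invented three-row frame (the month labels and values are placeholders):

```python
import pandas as pd

data = pd.DataFrame({"SOI": [0.3, -1.2, 0.8]}, index=["jan", "feb", "mar"])

soi_series = data["SOI"]                        # Series: values plus the index labels
soi_array = data["SOI"].to_numpy()              # plain NumPy array, no index (same idea as .values)
soi_reset = data["SOI"].reset_index(drop=True)  # still a Series, but labelled 0..n-1

print(soi_array)
print(soi_reset.index.tolist())
print(data.index.tolist())  # the original frame keeps its index for later use
```

A Series always carries some index, so `.values` / `.to_numpy()` is the option that actually drops it, while `data` itself stays untouched.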

thorn bobcat Aug 20, 2021, 10:28 AM

#

can I ask you a question

#

if you train a transformer on the bible

#

what would be the result?

errant flare Aug 20, 2021, 10:29 AM

#

wait what

thorn bobcat Aug 20, 2021, 10:29 AM

#

would it come up with new data or would it take input and put it into context

errant flare Aug 20, 2021, 10:29 AM

#

a transformer?

thorn bobcat Aug 20, 2021, 10:29 AM

#

a transformer is an NLP model.

errant flare Aug 20, 2021, 10:30 AM

#

have no experience with nlp

thorn bobcat Aug 20, 2021, 10:30 AM

#

ahh yea same here tbh..

#

I wanna get into it. I just worked with image processing.

errant flare Aug 20, 2021, 10:31 AM

#

but from a couple of google searches i'm assuming it would come up with new data

#

that would resemble some things that the bible has

#

because apparently transformer models are used for text summarization

thorn bobcat Aug 20, 2021, 10:32 AM

#

would it be an accurate representation of the bible

errant flare Aug 20, 2021, 10:32 AM

#

so that's taking the idea of the text but coming up with technically new data

errant flare Aug 20, 2021, 10:32 AM

#

thorn bobcat would it be an accurate representation of the bible

yeah no i highly doubt that it would be

thorn bobcat Aug 20, 2021, 10:32 AM

#

errant flare so that's taking the idea of the text but coming up with technically new data

or would it be a new idea.

#

yea... I was wondering what kind of model I would need to have to answer questions directly from say the bible, torah or the Quran.

errant flare Aug 20, 2021, 10:33 AM

#

ahhhhhh

#

interesting

thorn bobcat Aug 20, 2021, 10:33 AM

#

where it has to be onpoint in the interpretation of questions, verses and choice of answers.

errant flare Aug 20, 2021, 10:34 AM

#

hmmm maybe then

thorn bobcat Aug 20, 2021, 10:34 AM

#

transformers are all the rage these days, to be honest i thought it would be a good choice

#

have you tried gpt-j?

#

https://6b.eleuther.ai/

EleutherAI - text generation testing UI

EleutherAI web app testing for language models

#

try this

errant flare Aug 20, 2021, 10:35 AM

#

thorn bobcat transformers are all the rage these days, to be honest i thought it would be a g...

i have no clue how to help but uh here? https://towardsdatascience.com/nlp-building-a-question-answering-model-ed0529a68c54

Medium

NLP — Building a Question Answering model

Doing cool things with data!

errant flare Aug 20, 2021, 10:35 AM

#

thorn bobcat https://6b.eleuther.ai/

yep just gave it a try hehe

#

the bible i'm assuming is quite vast so it's not like only 5 or 6 questions that you have to answer

thorn bobcat Aug 20, 2021, 10:36 AM

#


For, it seems that the church has no plans for a Christmas celebration this year. Instead, the Vatican is proposing a celebration of the end of the world.

Last week a Vatican statement said that the Pope is planning to address the United Nations on December 21st, in order to address the "global environmental crisis".

It said that the Pope will urge the world's leaders to work for a "dramatic reduction" in carbon emissions.

In his address, the Pope is

#

the prompt was In god we trust, said the Pope to lol

errant flare Aug 20, 2021, 10:37 AM

#

in addition it isn't like the questions from a bible are like "accurate" as in when someone asks a question they aren't looking for a specific value it could be varied

#

probably very tough to be honest seeing how there might not be one answer to a question asked in regards to religion

thorn bobcat Aug 20, 2021, 10:37 AM

#

errant flare the bible i'm assuming is quite vast so it's not like only 5 or 6 questions that...

yea it's interpretations and rulings etc. The same model would essentially be able to derive legal rulings and supporting verses from say the constitution or penal code.

errant flare Aug 20, 2021, 10:37 AM

#

and some that could be quite wrong

thorn bobcat Aug 20, 2021, 10:38 AM

#

errant flare probably very tough to be honest seeing how there might not be one answer to a q...

yea.. that's also a problem

#

hm...

errant flare Aug 20, 2021, 10:38 AM

#

thorn bobcat yea it's interpretations and rulings etc. The same model would essentially be ab...

yeah if you're building an nlp model to be able to interpret something and not just spit the same thing out that automatically throws accuracy to the root text out the window I think

thorn bobcat Aug 20, 2021, 10:38 AM

#

but there should be general consensus.

thorn bobcat Aug 20, 2021, 10:38 AM

#

errant flare yeah if you're building an nlp model to be able to interpret something and not just s...

it should interpret based on the root text.

#

and a supporting corpus perhaps.

errant flare Aug 20, 2021, 10:39 AM

#

hmmm i don't know how to help but yeah lol

#

anyways man gotta run bye!

thorn bobcat Aug 20, 2021, 10:39 AM

#

based on what i know from law there's cases based on cases based on the constitution.

thorn bobcat Aug 20, 2021, 10:39 AM

#

errant flare hmmm i don't know how to help but yeah lol

ahh it's alright.. I'll look into it through a search

thorn bobcat Aug 20, 2021, 10:40 AM

#

errant flare anyways man gotta run bye!

take care

old grove Aug 20, 2021, 10:41 AM

#

Hey guys, I have outliers in my covid dataset and I'm not sure how to deal with them. Take active cases: among states like Usa,Nw the Usa value is above 100k or so, which affects the mean too. The outlier is actually a valid value, since you really can have a number of cases in the 100k range, so how do I deal with this type of outlier?

wheat yew Aug 20, 2021, 10:49 AM

#

hey need some help with a simple numpy question

#

#

i asked this some day ago as well but couldnt figure it out

royal crest Aug 20, 2021, 10:50 AM

#

have you tried the help channels

#

#❓｜how-to-get-help

#

ah beat me to it

wheat yew Aug 20, 2021, 10:50 AM

#

yes

royal crest Aug 20, 2021, 10:50 AM

#

what have you tried so far

wheat yew Aug 20, 2021, 10:51 AM

#

well i have been taught stack, concatenate, arange and some other basic numpy stuff

#

and i havent been able to do anything lol

#

idk how

#

because this isnt a hard one

#

i had someone help me with slicing and stuff a few days ago

#

actually i got the first function now

#

#!/usr/bin/env python3

import numpy as np

def get_row_vectors(a):
    return [i[np.newaxis,:] for i in a]


def get_column_vectors(a):
    s = [a[:,i] for i in range(len(a))]
    return [i[:,np.newaxis] for i in s]
    
def main():

    

    np.random.seed(0)
    a=np.random.randint(0,10, (4,4))
    #a = [[5, 0 ,3], [3, 7, 9]]

    print("a:", a)
    print("Row vectors:", get_row_vectors(a))
    print("Column vectors:", get_column_vectors(a))

if __name__ == "__main__":
    main()

#

this is what i got atm

#

this almost passes the tests. it says this though:

"FAIL:
RowsAndColumns: test_column_count
3 != 5 : Wrong number of columns"

royal crest Aug 20, 2021, 11:00 AM

#

ahh yea

#

it works when n == m but not when n != m

wheat yew Aug 20, 2021, 11:00 AM

#

ur talking about (n, m)

royal crest Aug 20, 2021, 11:01 AM

#

yes

wheat yew Aug 20, 2021, 11:01 AM

#

kk

#

if u use that list that is hidden with "#"

#

my code fails

#

says list indices must be integers or slices, not tuple

#

what is [:,np.newaxis] in numbers?

#

how do i fix my code

#

fixed it god damn this one sucked

royal crest Aug 20, 2021, 11:06 AM

#

one problem i found was the range(len(a))

#

ah good

#

👍

wheat yew Aug 20, 2021, 11:07 AM

#

yep that was it

#

i had to put range(a.shape[1])

royal crest Aug 20, 2021, 11:07 AM

#

if in doubt print everything haha

wheat yew Aug 20, 2021, 11:07 AM

#

yep that helped it

#

i have another one but i have to try it first

#

it looks pretty hard tho

royal crest Aug 20, 2021, 11:08 AM

#

i did len(a[0])

#

same thing i guess

wheat yew Aug 20, 2021, 11:08 AM

#

that doesnt work

royal crest Aug 20, 2021, 11:08 AM

#

oh?

wheat yew Aug 20, 2021, 11:08 AM

#

if u call with (5,4)

#

its gonna go up to 4

#

and while the list only has 0,1,2,3

royal crest Aug 20, 2021, 11:09 AM

#

Screen_Shot_2021-08-20_at_9.08.59_pm.png

wheat yew Aug 20, 2021, 11:09 AM

#

"IndexError: index 4 is out of bounds for axis 1 with size 4"

#

this is what i get

royal crest Aug 20, 2021, 11:09 AM

#

I didn't remove range if that's what you mean

grave frost Aug 20, 2021, 11:10 AM

#

thorn bobcat if you train a transformer on the bible

just fine-tune a model on the bible

thorn bobcat Aug 20, 2021, 11:10 AM

#

grave frost just fine-tune a model on the bible

fine-tune a pre existing model?

grave frost Aug 20, 2021, 11:10 AM

#

thorn bobcat fine-tune a pre existing model?

yes

wheat yew Aug 20, 2021, 11:10 AM

#

royal crest I didn't remove range if that's what you mean

did u copy my code

thorn bobcat Aug 20, 2021, 11:10 AM

#

or create it from scratch purely based on the bible?

thorn bobcat Aug 20, 2021, 11:11 AM

#

grave frost yes

but won't it make assumptions from outside the scope of the bible?

royal crest Aug 20, 2021, 11:11 AM

#

wheat yew did u copy my code

yes just added three chars to your get_column_vectors function

wheat yew Aug 20, 2021, 11:11 AM

#

show what u did

grave frost Aug 20, 2021, 11:11 AM

#

thorn bobcat but won't it make assumptions from outside the scope of the bible?

no

royal crest Aug 20, 2021, 11:11 AM

#

def get_column_vectors(a):
  s = [a[:, i] for i in range(len(a[0]))]
  return [i[:, np.newaxis] for i in s]
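
A compact variant of the same fix, as a sketch rather than the graded solution: it loops over `a.shape[0]` for rows and `a.shape[1]` for columns, and converts the input with `np.asarray` so a plain nested list (like the commented-out one) also works. `np.newaxis` is just an alias for `None`; used inside an index it inserts an axis of length 1.

```python
import numpy as np

def get_row_vectors(a):
    a = np.asarray(a)  # accept nested lists as well as arrays
    # a[i, np.newaxis, :] picks row i and adds a leading axis -> shape (1, m)
    return [a[i, np.newaxis, :] for i in range(a.shape[0])]

def get_column_vectors(a):
    a = np.asarray(a)
    # a[:, j, np.newaxis] picks column j and adds a trailing axis -> shape (n, 1)
    return [a[:, j, np.newaxis] for j in range(a.shape[1])]

a = np.arange(15).reshape(3, 5)  # non-square on purpose: n=3 rows, m=5 columns
rows = get_row_vectors(a)
cols = get_column_vectors(a)
print(len(rows), rows[0].shape)
print(len(cols), cols[0].shape)
```

Testing with a non-square array is what exposes the `range(len(a))` problem, since `len(a)` is the number of rows, not columns.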

wheat yew Aug 20, 2021, 11:12 AM

#

ah that works yeah

#

i thought u meant u did

#

range(a.shape[0])

#

this doesnt work

royal crest Aug 20, 2021, 11:12 AM

#

royal crest i did `len(a[0])`

noo haha

#

anyways yay!

#

Penguin_dance

thorn bobcat Aug 20, 2021, 11:20 AM

#

grave frost no

what is fine-tuning exactly

#

I'll have to look into it

cedar void Aug 20, 2021, 11:28 AM

#

I couldn't find any channel for machine learning doubts, so should I post them here?

thorn bobcat Aug 20, 2021, 11:32 AM

#

yes

cedar void Aug 20, 2021, 11:37 AM

#

Topic - Decision Trees in ML in Python

Given two different datasets, a training dataset and a testing dataset, the instruction was to model a decision tree on the training dataset, make predictions, select the best or the most ideal value of max_depth for the tree and then compare the results with the testing dataset.

I thought of splitting the training dataset, writing the training algorithm inside a loop over an arbitrary range, then select the result with the best accuracy and the corresponding max_depth. Is this a good way to get the best value of max_depth?

I would be happy to get suggestions.

cedar void Aug 20, 2021, 11:39 AM

#

cedar void Topic - Decision Trees in ML in Python Given two different datasets, a training...

And the similar process for other machine learning algorithms too.
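
One common way to do the loop described above is k-fold cross-validation over a grid of depths, touching the test set only once at the end. A sketch with scikit-learn, using a synthetic dataset from `make_classification` as a stand-in for the real training and testing files; the depth range 1 to 10 is an arbitrary assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; in the real task these would be the two given datasets.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 5-fold cross-validation on the training data only, one fit per candidate depth.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
# GridSearchCV refits the best model on all training data; score it once on the test set.
test_accuracy = search.score(X_test, y_test)
print(best_depth, test_accuracy)
```

The same pattern carries over to other estimators by swapping the model and the `param_grid` keys.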

hollow falcon Aug 20, 2021, 12:20 PM

#

how to change the legend when i plot this way

lapis sequoia Aug 20, 2021, 12:38 PM

#

Is there a written source or any book where I can learn python image processing?
I'm so bored watching videos

royal crest Aug 20, 2021, 1:01 PM

#

arXiv

fallow prism Aug 20, 2021, 1:32 PM

#

Classify text with small data set

Hello, I am looking for ideas and knowledge. My task is to classify very particular legal text sentences, and my training data set is 1200 classified sentences. I have to classify into 4 or 5 classes; I say 4 or 5 because I know what the problem is.

My vocabulary is around 20k unique words (filtered by min_df=10) and I have tried classifying with BERT, CNN and SVM+TF-IDF.

The length of my sentences is close to 512 words, although I can change it.

My scores on the test part of 300 sentences are close to 65% (precision, recall, F1, etc.).

I don't know what to try next; please help me with links, papers or anything on text classification with a small data set.

serene scaffold Aug 20, 2021, 1:41 PM

#

fallow prism # Classify text with small data set Hello, I am looking for ideas and knowledge,...

what about it are you trying to classify?

#

the topic of the sentences? something else?

desert oar Aug 20, 2021, 1:58 PM

#

fallow prism # Classify text with small data set Hello, I am looking for ideas and knowledge,...

1200 isn't a lot. i strongly recommend pursuing dimension reduction, so that your model has fewer parameters to learn

velvet thorn Aug 20, 2021, 1:58 PM

#

hollow falcon how to change the legend when i plot this way

not easily

#

plotting with pandas is convenient but you lose customisability

random prairie Aug 20, 2021, 2:16 PM

#

hello all. i am facing problem related to installation of pandas. please help

desert oar Aug 20, 2021, 2:20 PM

#

@fallow prism some options:

• use word2vec, glove, fasttext, bert, etc. to generate sentence vectors, then logistic regression
• PCA on the count-vectorized sentences, then logistic regression
• Factorization machine on the count-vectorized sentences

Logistic regression with L2/"ridge" regularization and linear SVM are somewhat interchangeable; mathematically they amount to the same model with a different loss function. There's also L1/"lasso" regularization and elastic-net regularization which is a blend of ridge and lasso. The differences should be fairly minor among all of these, although you can efficiently compute the entire "regularization path" for elastic-net and lasso, and you can efficiently compute "generalized cross validation" for ridge. Generally I tend to prefer logistic over linear SVM anyway because you also get a decent probability model out of it.

#

basically all 4 models are the same, minimizing the difference between y and w*x , but with different loss functions
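
A sketch of the "reduce dimensions, then logistic regression" option with scikit-learn. Two things here are substitutions rather than what was suggested: TF-IDF replaces plain counts, and `TruncatedSVD` is swapped in for PCA because it works directly on the sparse matrix. The sentences and the two labels are invented placeholders:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus; the real data would be the 1200 labelled legal sentences.
sentences = [
    "the tenant shall pay rent monthly",
    "the landlord may terminate the lease",
    "rent is due on the first day",
    "the lease term is twelve months",
    "the employee shall receive a salary",
    "the employer may terminate employment",
    "salary is paid every two weeks",
    "the employee is entitled to paid leave",
]
labels = ["lease", "lease", "lease", "lease", "work", "work", "work", "work"]

model = make_pipeline(
    TfidfVectorizer(),                             # sparse bag-of-words features
    TruncatedSVD(n_components=2, random_state=0),  # few dense dimensions to learn from
    LogisticRegression(C=1.0, max_iter=1000),      # L2 ("ridge") penalty by default
)
model.fit(sentences, labels)

probs = model.predict_proba(["the tenant shall pay rent"])
print(model.classes_, probs)
```

On real data `n_components` would be far larger than 2 (often in the low hundreds) and, like `C`, is worth tuning with cross-validation.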

lapis sequoia Aug 20, 2021, 2:52 PM

#

lapis sequoia Is there a written source or any book where I can learn python image processing?...

can anyone help?

serene scaffold Aug 20, 2021, 2:55 PM

#

                     crf    bilstm      bert       crf    bilstm      bert       crf    bilstm      bert
               precision precision precision    recall    recall    recall        f1        f1        f1
micro animal    0.930068  0.890407  0.843404  0.702475  0.781577  0.854810  0.800407  0.832450  0.849069
      dose      0.711624  0.668271  0.636848  0.445432  0.617051  0.763121  0.547908  0.641640  0.694290
      exposure  0.853591  0.809184  0.642202  0.542343  0.695919  0.675735  0.663268  0.748290  0.658542
      endpoint  0.685054  0.705040  0.650032  0.367747  0.512205  0.617647  0.478584  0.593348  0.633426
macro animal    0.930068  0.890407  0.843404  0.702475  0.781577  0.854810  0.800407  0.832450  0.849069
      dose      0.711624  0.668271  0.636848  0.445432  0.617051  0.763121  0.547908  0.641640  0.694290
      exposure  0.853591  0.809184  0.642202  0.542343  0.695919  0.675735  0.663268  0.748290  0.658542
      endpoint  0.685054  0.705040  0.650032  0.367747  0.512205  0.617647  0.478584  0.593348  0.633426

I want the columns to be ordered by (crf, bilstm, bert) and then (precision, recall, f1) within those three groups.

#

calling sortlevel on the MultiIndex for the columns worked.

#

and reindexing from there.
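
An alternative sketch that skips the sort and reindexes the columns straight into the wanted order, assuming the model name is level 0 and the metric is level 1 as in the table above. The values and the two row labels are placeholders:

```python
import numpy as np
import pandas as pd

models = ["crf", "bilstm", "bert"]
metrics = ["precision", "recall", "f1"]

# Columns laid out like the table above: grouped by metric, model varying fastest.
cols = pd.MultiIndex.from_tuples([(m, k) for k in metrics for m in models])
df = pd.DataFrame(np.arange(18).reshape(2, 9), index=["animal", "dose"], columns=cols)

# Desired layout: grouped by model, then precision/recall/f1 within each model.
wanted = pd.MultiIndex.from_product([models, metrics])
df = df.reindex(columns=wanted)

print(df.columns.tolist())
```

`from_product` spells out the order explicitly, which avoids the alphabetical ordering a plain sort would give (bert, bilstm, crf).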

flat hollow Aug 20, 2021, 3:15 PM

#

@desert oar @pine wolf you guys are wizards 😄 I was worried the whole day about coding in polar coords with numpy (havent done it before) and after a long day I come back to what seems like a working model? I will try to download nurses_2 and run both codes tomorrow, I am completely exhausted after today's hike.

#

@desert oar thanks for including links to explanations ❤️

exotic palm Aug 20, 2021, 3:38 PM

#

so in a couple of weeks im supposed to start an ai with python course

#

which i have been invited to

#

and i dont know anything about how those two work together and how to work with them as is

#

I have finished a couple of python courses and somewhat decently know python but i have no idea about its use in ai

#

can i get some help with that?

digital sandal Aug 20, 2021, 3:40 PM

#

Aren't you supposed to learn all that on the course? Or is it like a non-beginner course?

exotic palm Aug 20, 2021, 3:41 PM

#

I just want to know what to look for

#

since this is the place i reached out when i was beginning my initial python course

#

Just give me a little info about it so i can understand more of it when it comes to it

trail horizon Aug 20, 2021, 3:44 PM

#

i hope u all are okay
i'm wondering
does anyone know a platform where i can do data analysis interviews ? like leetcode but for data analysis
i know about kaggle
but what i'm asking for is something like leetcode

shadow gate Aug 20, 2021, 3:47 PM

#

Is this a good channel for asking help with Telegram bots?

serene scaffold Aug 20, 2021, 3:51 PM

#

shadow gate Is this a good channel for asking help with Telegram bots?

what is a telegram bot?

flat hollow Aug 20, 2021, 4:00 PM

#

trail horizon i hope u all are okay i'm wondering does anyone know a platform where i can do dat...

https://www.stratascratch.com/

civic summit Aug 20, 2021, 4:14 PM

#

Need a bit of help with an ordinal regression. Dv= 10 point scale or 0-10 and IV= are a 4 point scale of 1-4. I have two independent variables that are a 2 point scale, yes/no. I am wondering if I can keep these variables because the basis of my analysis assumes that there is 1 order of magnitude between 1-2,2-3,3-4 etc but with binary it's more like 0-100.

#

Is there a book, I can read up a bit on to understand how to best set up an ordinal regression?

pine wolf Aug 20, 2021, 4:19 PM

#

flat hollow @desert oar @pine wolf you guys are wizards 😄 I was worr...

unless you're super interested in terminal graphics, i might recommend using pygame or kivy to render this, i just used something that was convenient for me -- but i did update it with a nice background circle:

trail horizon Aug 20, 2021, 4:20 PM

#

flat hollow https://www.stratascratch.com/

thank bro its just what i was looking for

flat hollow Aug 20, 2021, 4:20 PM

#

very cool, I would also need to keep track of the number of particles in each of the 3 areas and a 2nd barrier... so much to do 😦 but now at least I have an example to work from 🙂

pine wolf Aug 20, 2021, 4:22 PM

#

does the diffusion rate increase as the diameter decreases?

hasty mountain Aug 20, 2021, 4:37 PM

#

Hey guys, I want to create a new column for a dataset, but I'm having a small issue here. I've used the Close column to get the EMA9 and EMA21 columns. However, I've noticed that those EMAs aren't properly aligned (I want the EMA9 for the day 09/19/2014, index 2, to be at index 1, something that can be done in Excel).

I've tried doing this by removing the first row of EMA9 with

data['EMA9'] = data['EMA9'].drop([0])

However, this only makes index 0 in EMA9 become NaN. I've also tried using the argument inplace = True, but this results in the entire column being replaced by NaN.
Can someone lend me a hand?

flat hollow Aug 20, 2021, 4:49 PM

#

pine wolf does the diffusion rate increase as the diameter decreases?

🤷 that's probably one of the outcomes of the simulation? I just need a randomwalk with the barriers with some permeability

pine wolf Aug 20, 2021, 4:50 PM

#

my guess is it must --- as diameter decreases there should be more collisions with it, increasing the odds that a cell passes through

flat hollow Aug 20, 2021, 4:50 PM

#

hm... wouldnt that be solved by adding particle collisions?

#

and perhaps some momentum calculations?

pine wolf Aug 20, 2021, 4:51 PM

#

i don't think it matters

#

i think that's just what happens in simulation or real life, probably

#

probably good, so our fingers don't asphyxiate

flat hollow Aug 20, 2021, 4:51 PM

#

oh sorry, I misunderstood your question, I think yes, for the same number of particles there should be an increase

flat hollow Aug 20, 2021, 4:52 PM

#

hasty mountain Hey guys, I want to create a new column for a dataset, but I'm having a small is...

so you just want to remove the 2nd row in column EMA9?

hasty mountain Aug 20, 2021, 4:54 PM

#

flat hollow so you just want to remove the 2nd row in column EMA9?

Just the 1st index and move everything below it up in EMA9 column.

mortal dove Aug 20, 2021, 4:55 PM

#

data['EMA9'].shift(1)

#

can't remember if it shifts forward or backwards, so shift(1) or shift(-1)
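
A quick check of the direction on a made-up four-row column: a positive period pushes values down to later rows, a negative one pulls them up to earlier rows, and the vacated slot becomes NaN.

```python
import pandas as pd

data = pd.DataFrame({"EMA9": [10.0, 11.0, 12.0, 13.0]})

data["down"] = data["EMA9"].shift(1)   # row i now holds the value from row i-1
data["up"] = data["EMA9"].shift(-1)    # row i now holds the value from row i+1

print(data)
```

So moving everything in EMA9 up by one row, dropping the old index 0 value, is `shift(-1)`. Unlike `drop`, the result keeps the same index as the frame, which is why assigning it back works without misalignment.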

flat hollow Aug 20, 2021, 4:55 PM

#

mortal dove can't remember if it shifts forward or backwards, so shift(1) or shift(-1)

I think they want to keep index 0 intact

hasty mountain Aug 20, 2021, 4:56 PM

#

flat hollow I think they want to keep index 0 intact

For the other columns, yes. But for EMA9 and EMA21 I want the index 0 to be removed.

flat hollow Aug 20, 2021, 4:57 PM

#

ah, then shift should work

hasty mountain Aug 20, 2021, 4:57 PM

#

Yes, it worked. Thank you!

mortal dove Aug 20, 2021, 4:59 PM

#

Keep in mind that if you're working on a price prediction model or any trading model, you're now looking at the next timeframe's data in that same row, so you're basically looking at a future value - so practically you would not be able to use that value in a real scenario to trade

hasty mountain Aug 20, 2021, 5:03 PM

#

mortal dove Keep in mind that if you're working on a price prediction model or any trading m...

Yes, thanks for the advice. I was trying to get my dataset lined up with the charts where I got the data from. But, now that you've mentioned it...I should probably rethink.

#

At least now I know how to modify dataset rows like this. I had to do this once with another dataset and ended up just opening the DataFrame in an excel file to do this.

lapis sequoia Aug 20, 2021, 6:03 PM

#

how to data science?

serene scaffold Aug 20, 2021, 6:14 PM

#

lapis sequoia how to data science?

try reading "data science from scratch"

lapis sequoia Aug 20, 2021, 6:36 PM

#

serene scaffold try reading "data science from scratch"

u mean...scratch pl?

serene scaffold Aug 20, 2021, 6:38 PM

#

lapis sequoia u mean...scratch pl?

No, it's a book

lapis sequoia Aug 20, 2021, 6:42 PM

#

serene scaffold No, it's a book

ok

#

thx

umbral ferry Aug 20, 2021, 7:12 PM

#

maybe a hard question to answer but... does anyone know the major difference between k-modes clustering and multiple correspondence analysis? they both seem to have similar results and methods, but I'm not sure how to interpret them

wheat yew Aug 20, 2021, 7:22 PM

#

can someone help me with numpy and specifically concatenate and how to use it

#

it combines arrays into one but what else can it do

serene scaffold Aug 20, 2021, 7:24 PM

#

wheat yew it combines arrays into one but what else can it do

That's the point of it

wheat yew Aug 20, 2021, 7:24 PM

#

well i got a question thats hard for me and i gotta use concatenate in it

#

and some other things

serene scaffold Aug 20, 2021, 7:25 PM

#

What is it?

wheat yew Aug 20, 2021, 7:25 PM

#

serene scaffold Aug 20, 2021, 7:26 PM

#

Do you know how to use the eye function?

wheat yew Aug 20, 2021, 7:27 PM

#

yea i know what it does

#

i havent used it though

#

yeah cant do this without some help seems a bit too hard
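
The exercise itself is not shown in the log, so this is only a generic sketch of how `np.concatenate` behaves with its `axis` argument, plus one example of gluing an identity matrix from `np.eye` onto an array. The arrays are arbitrary:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=0 stacks along rows: every shape must match except the row count.
stacked_rows = np.concatenate([a, b], axis=0)    # shape (3, 2)

# axis=1 stacks along columns: b is transposed to shape (2, 1) so the row counts agree.
stacked_cols = np.concatenate([a, b.T], axis=1)  # shape (2, 3)

# Appending a 2x2 identity matrix to the right of a.
with_identity = np.concatenate([a, np.eye(2, dtype=int)], axis=1)  # shape (2, 4)

print(stacked_rows)
print(stacked_cols)
print(with_identity)
```

Larger block layouts can be built by concatenating along one axis first and then joining those results along the other.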

desert oar Aug 20, 2021, 7:57 PM

#

hasty mountain Hey guys, I want to create a new column for a dataset, but I'm having a small is...

data['EMA9'] = data['Close'].ewm(span=9).mean().shift(-11)

like this?

hasty mountain Aug 20, 2021, 7:58 PM

#

desert oar ```python data['EMA9'] = data['Close'].ewm(span=9).mean().shift(-11) ``` like th...

Yes, thanks!

desert oar Aug 20, 2021, 7:58 PM

#

out of curiosity why do you want it aligned like that?

hasty mountain Aug 20, 2021, 7:59 PM

#

Just so it matches the chart where I took the data from.

arctic wedgeBOT Aug 20, 2021, 8:06 PM

#

Hey @timid grove!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

mortal dove Aug 20, 2021, 8:07 PM

#

hasty mountain Just so it matches the chart where I took the data from.

oh, if you want it to match without shift use this:

data['EMA9'] = data['Close'].ewm(span=9, adjust=False).mean()

hasty mountain Aug 20, 2021, 8:08 PM

#

Nah, what I wanted was indeed shift.

mortal dove Aug 20, 2021, 8:09 PM

#

with adjust enabled, it does the calculation around the centre value, with it disabled it does it around the last value, as it would in a technical indicator on a chart
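
A note on `adjust`, hedged since nobody in the thread checked it: as far as the pandas docs describe it, `adjust` does not change any centring. With `adjust=False` the EMA is the plain recursion `y[t] = (1 - alpha) * y[t-1] + alpha * x[t]`, and with `adjust=True` (the default) each value is a weighted average of all earlier points, normalised by the sum of the weights. A small sketch with made-up prices:

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0])   # made-up closing prices
span = 9
alpha = 2 / (span + 1)                         # pandas' span -> alpha conversion

recursive = close.ewm(span=span, adjust=False).mean()
adjusted = close.ewm(span=span, adjust=True).mean()

# adjust=False by hand: start at the first price, then apply the recursion
manual = [close.iloc[0]]
for price in close.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * price)

# adjust=True by hand for the second point: weights 1 and (1 - alpha), normalised
second_adjusted = (close.iloc[1] + (1 - alpha) * close.iloc[0]) / (1 + (1 - alpha))
```

The two variants differ most in the first few rows and converge as more history accumulates, which may be why charting tools that use the recursion can look slightly off against the pandas default.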

hasty mountain Aug 20, 2021, 8:09 PM

#

It's just that the EMA9 for day X in the chart was registered as the EMA9 for day X+1

mortal dove Aug 20, 2021, 8:09 PM

#

I'd consider shift a workaround since you'd have to change the value depending on the window you're using

timid grove Aug 20, 2021, 8:09 PM

#

in this i downloaded a dataset of approx 800 images, but this algorithm is giving only 102 images as output; the other images aren't taken into consideration...

hasty mountain Aug 20, 2021, 8:09 PM

#

mortal dove with adjust enabled, it does the calculation around the centre value, with it di...

Hm...I tried using adjust = False, but it didn't work.

mortal dove Aug 20, 2021, 8:10 PM

#

That's interesting, currently slowly working on a technical analysis library and didn't run into that issue for the EMA

timid grove Aug 20, 2021, 8:10 PM

#

this is the error..

#

plz help!!

lapis sequoia Aug 20, 2021, 8:11 PM

#

anonymous

desert oar Aug 20, 2021, 8:11 PM

#

mortal dove I'd consider shift a workaround since you'd have to change the value depending o...

in their case they literally needed a 1-day shift i think, no?

lapis sequoia Aug 20, 2021, 8:12 PM

#

timid grove plz help!!

are you keen on anonymous group and data science?

timid grove Aug 20, 2021, 8:12 PM

#

lapis sequoia are you keen on anonymous group and data science?

no , could you please help me resolve this

mortal dove Aug 20, 2021, 8:13 PM

#

desert oar in their case they literally needed a 1-day shift i think, no?

From original question yes, they confirmed your solution of shifting by -11 though (probably a typo?), so thought it might have meant something different

lapis sequoia Aug 20, 2021, 8:13 PM

#

timid grove no , could you please help me resolve this

i'm a noob, sorry

timid grove Aug 20, 2021, 8:13 PM

#

lapis sequoia i'm a noob, sorry

np!

desert oar Aug 20, 2021, 8:13 PM

#

mortal dove From original question yes, they confirmed your solution of shifting by -11 thou...

oh hah yes, the 11 was a typo

#

it was supposed to be -1

mortal dove Aug 20, 2021, 8:14 PM

#

Yea, that's why it just confused me a bit, thought they might have the centring issue. Weird that the chart they're pulling from has it offset of 1 though

desert oar Aug 20, 2021, 8:17 PM

#

i suspect there's something missing in their explanation, but 🤷‍♂️

#

perhaps they meant

data['EMA9'] = data['Close'].shift(-1).ewm(span=9).mean()
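
The two orderings are not interchangeable, so it's worth checking which one the chart actually uses. A rough sketch with made-up prices (the `data` frame here is hypothetical):

```python
import pandas as pd

data = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})  # made-up prices

ema = data["Close"].ewm(span=9).mean()

# EMA first, then shift: row X holds the EMA that was computed for row X+1
ema_then_shift = ema.shift(-1)

# shift first, then EMA: the EMA itself is computed on the next day's prices
shift_then_ema = data["Close"].shift(-1).ewm(span=9).mean()

print(pd.DataFrame({"ema": ema,
                    "ema_then_shift": ema_then_shift,
                    "shift_then_ema": shift_then_ema}))
```

`shift_then_ema` starts its average from the second price, so the two columns differ throughout, not only at the edges.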

dense sinew Aug 20, 2021, 8:26 PM

#

timid grove in this i downloaded a dataset of approx 800 images , but this algorithm giving ...

help

lapis sequoia Aug 20, 2021, 10:19 PM

#

Hey. Using matplotlib.pyplot as plt how can I change the appearance of the date tick labels in my graph x-axis without re-writing my code in the way demonstrated in the docs: https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html

?

I'm not using fig, ax = plt.subplots() and then using fig or ax to control my graph. I'm just using plt so the ax methods are not available to me in the way demonstrated.

velvet thorn Aug 21, 2021, 12:06 AM

#

lapis sequoia Hey. Using `matplotlib.pyplot as plt` how can I change the appearance of the dat...

ax = plt.gca()

#

then go from there

#

in general, though

#

I would advise against using plt methods

#

it makes it a lot harder to do stuff
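
A minimal sketch of the `plt.gca()` route, assuming a plot that was already drawn with plain `plt` calls; the dates, values and the `"%d %b"` format are made up for illustration:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")                      # headless backend so this runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

days = [dt.date(2021, 8, d) for d in range(1, 11)]
values = list(range(10))

plt.plot(days, values)                     # existing plt-style code stays as it is

ax = plt.gca()                             # grab the Axes that plt has been drawing on
ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
plt.gcf().autofmt_xdate()                  # rotate the labels so they don't overlap
```

From there every `ax.` method shown in the docs example is available without rewriting the rest of the script.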

lapis sequoia Aug 21, 2021, 12:34 AM

#

velvet thorn it makes it a lot harder to do stuff

Thanks, and yes, I'm beginning to see this.

desert oar Aug 21, 2021, 12:46 AM

#

i generally use plt for quick things, and switch/refactor to fig, ax for more involved tasks

#

or use seaborn

vestal agate Aug 21, 2021, 1:10 AM

#

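$$\text{Huber}(x) = \begin{cases} \tfrac{1}{2}x^2 & \text{for } |x| \le \delta, \\ \delta|x| - \tfrac{1}{2}\delta^2 & \text{otherwise.} \end{cases}$$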

what do i need to learn to understand this lol

desert oar Aug 21, 2021, 1:17 AM

#

vestal agate Huber(x) = { ½x² for |x| ≤ δ, δ|x| − ½δ² oth...

math notation

vestal agate Aug 21, 2021, 1:22 AM

#

desert oar math notation

i wouldnt know
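
In case code is easier to read than the notation: the formula is a piecewise function, quadratic for small `|x|` and linear once `|x|` goes past the threshold `δ`. A minimal NumPy sketch, assuming the standard Huber definition with the ½ factors (the `huber` name and the default `delta=1.0` are just illustrative):

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    x = np.asarray(x, dtype=float)
    quadratic = 0.5 * x ** 2                          # used where |x| <= delta
    linear = delta * np.abs(x) - 0.5 * delta ** 2     # used everywhere else
    return np.where(np.abs(x) <= delta, quadratic, linear)

print(huber([-3.0, -0.5, 0.0, 0.5, 3.0]))
```

The two pieces meet at `|x| = δ`, so large errors get penalised less harshly than with a plain squared error.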

tender hawk Aug 21, 2021, 1:41 AM

#

Hello 😄 I have a question about Pandas and I'm trying to figure something out (and I'm having very little luck finding out how).

Question: How do I aggregate data within a CSV (Add 2 cells together from different rows based on a common value)
Example:

Employee ID:, Box Type:, Count Per Order
0001, Large, 4
0001, Small, 2
0001, Large, 2
0001, Small, 3
0020, Small, 1
0020, Small, 2

I want to be able to calculate
Employee 0001 - Large - 6
Employee 0001 - Small - 5
etc

#

How would I go about doing this? or would I use something besides Pandas?

odd meteor Aug 21, 2021, 2:13 AM

#

velvet thorn I would advise against using `plt` methods

😀 Idk why but I also prefer Seaborn to Matplotlib. The syntax is more direct and more 'customer friendly'

random solar Aug 21, 2021, 3:03 AM

#

tender hawk How would I go about doing this? or would I use something besides Pandas?

you can use the groupby method

df.groupby('Box Type').sum()

tender hawk Aug 21, 2021, 3:04 AM

#

random solar you can use the groupby method ``` df.groupby('Box Type').sum()```

Would this keep them separated by employee ID too?

random solar Aug 21, 2021, 3:04 AM

#

tender hawk Would this keep them separated by employee ID too?

if u want that as well do
df.groupby(['Employee ID', 'Box Type']).sum()
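
Here is that grouping run on the sample rows from the question. It's a sketch that assumes the column names have been cleaned to `Employee ID`, `Box Type` and `Count Per Order`, and that the IDs are kept as strings so the leading zeros survive:

```python
import pandas as pd

df = pd.DataFrame({
    "Employee ID": ["0001", "0001", "0001", "0001", "0020", "0020"],
    "Box Type": ["Large", "Small", "Large", "Small", "Small", "Small"],
    "Count Per Order": [4, 2, 2, 3, 1, 2],
})

# One total per (employee, box type) pair
totals = df.groupby(["Employee ID", "Box Type"])["Count Per Order"].sum()
print(totals)

# reset_index() turns the result back into a flat table, handy for writing a CSV
flat = totals.reset_index()
```

When loading from a file, something like `pd.read_csv(path, dtype={"Employee ID": str})` should stop pandas from turning `0001` into `1`.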

lapis sequoia Aug 21, 2021, 3:08 AM

#

Guys I need help.. I want to learn data science and ai I really want to learn but the problem is i don't have anyone around me interested in programming at all.. When I make new project or solve the problem i faced for two weeks there is no one who can celebrate with me.. At least I want someone I can make projects with.. I know it's silly problem I have.. But I just can't do it all by myself..
What do you guys think? Have you ever faced my situation? Any advice?

tender hawk Aug 21, 2021, 3:08 AM

#

random solar if u want that as well do ``` df.groupby(['Employee ID', 'Box Type']).sum()```

Thank you so MUCH! I had gotten "Employee ID" but i didn't even think about adding "Box Type" to the groupby #TrueHero lol thank you!

random solar Aug 21, 2021, 4:16 AM

#

tender hawk Thank you so MUCH! I had gotten "Employee ID" but i didn't even think about add...

lmaoo np

random solar Aug 21, 2021, 4:16 AM

#

lapis sequoia Guys I need help.. I want to learn data science and ai I really want to learn bu...

theres plenty of passionate people in this disc u can celebrate with

lapis sequoia Aug 21, 2021, 4:45 AM

#

i have a question about tree based models like random forest and gradient boosting

#

do any of these methods have the same kind of persistence that ANNs have?

#

like an indefinitely long window for training

vale hedge Aug 21, 2021, 6:37 AM

#

anyone know why tensorflow recommends installing from pip? it seems like a lot of pain to install it. i am wondering why not use a conda install?

ripe forge Aug 21, 2021, 7:09 AM

#

use a conda install if you have it.

serene scaffold Aug 21, 2021, 8:43 AM

#

vale hedge anyone know why tensorflow recommends installing from pip? it seems like a lot ...

Pip is the standard package installer so the default instructions are going to be for installing stuff using it.

serene scaffold Aug 21, 2021, 8:45 AM

#

ripe forge use a conda install if you have it.

Have you used conda? I never have, but I don't really see the advantage. It might be because virtual environments have existed the entire time I've been using python.

ripe forge Aug 21, 2021, 8:58 AM

#

serene scaffold Have you used conda? I never have, but I don't really see the advantage. It migh...

i use conda extensively. it's been absolutely fantastic, however at the same time, it's not "necessary" to use. You can easily do "most" of what you need without conda as well.

#

Where conda truly shines however, is two things: 1. it's not just a python package installer. it's a binaries installer. for any binary that's built for it. That means that conda makes trivial some installs that would be absolutely miserable without it. (actually, if im being completely honest, conda has a branding issue. this tool is so insanely good at what it does, but doesn't get enough emphasis on this aspect). the 2: closely related - because conda is a binaries installer, it can control python installations itself. This means your environments made in conda encompass multiple python versions too, and make it trivial to have different versions of python in different environments with zero friction.

#

Ofcourse, this is all on top of also supporting pip installs. So there's genuinely zero downsides in that sense that i can think of.

#

to keep it simple though, if someone is already using conda, they should use conda installs before pip installs.

vale hedge Aug 21, 2021, 9:02 AM

#

Like what darr said conda is able to install some lower level binaries that support some of python packages. I dont think i have needed to install manually although i dont know if this is optimal for your system. I would assume if you take the time and find the most optimal libraries for your architecture you could get better performance

serene scaffold Aug 21, 2021, 9:08 AM

#

ripe forge Where conda truly shines however, is two things: 1. it's not just a python packa...

I see. Thanks!

novel elbow Aug 21, 2021, 9:21 AM

#

specially when working with different versions of cuda & cudnn

mortal dove Aug 21, 2021, 9:48 AM

#

lapis sequoia Guys I need help.. I want to learn data science and ai I really want to learn bu...

I just babble about stuff with my family, even if they don't care or understand.

flat hollow Aug 21, 2021, 9:52 AM

#

lapis sequoia Guys I need help.. I want to learn data science and ai I really want to learn bu...

I second Guitar's response, even if people around you normally don't care about your field, the ability to explain what you're doing to a layman audience and get them excited about it is a great skill to have. If you talk to them about the project as you're slowly building it, they will also be ready to understand the more involved things you needed to have done.

novel elbow Aug 21, 2021, 9:57 AM

#

lapis sequoia Guys I need help.. I want to learn data science and ai I really want to learn bu...

if you want to do team projects, it's not necessary that the other person knows python. Try to exploit the domain knowledge of the other person and as a bonus you end up learning a lot from them.

onyx drum Aug 21, 2021, 10:20 AM

#

Does anyone know how I can scale the x-axis labels with matplotlib.pyplot.hist()? I want the histogram to look exactly the way it does now, except that I want to display my x-axis ticks in % rather than fraction, so I would need to scale by a factor of 100

#

Got it. Since it seems common enough, for anyone else:
scale_x = 100
ticks_x = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x*scale_x))
ax.xaxis.set_major_formatter(ticks_x)
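
Another option, if the values are fractions of 1, is matplotlib's built-in `PercentFormatter`, which scales the tick and appends the `%` sign in one step. A sketch with random data standing in for the real histogram input:

```python
import matplotlib
matplotlib.use("Agg")                      # headless backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

rng = np.random.default_rng(0)
fractions = rng.uniform(0, 1, size=200)    # made-up data in the range 0..1

fig, ax = plt.subplots()
ax.hist(fractions, bins=10)

# xmax=1 means "a value of 1.0 is 100%"; decimals=0 drops the fractional part
fmt = ticker.PercentFormatter(xmax=1, decimals=0)
ax.xaxis.set_major_formatter(fmt)
```

Only the labels change; the bars and bin edges stay exactly where they were.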

opal loom Aug 21, 2021, 2:35 PM

#

can someone experienced in machine learning help me in #help-bagel pls, I'm very new to machine learning

shadow gate Aug 21, 2021, 5:40 PM

#

Hey, I made a Telegram bot with Python on VSC, but now I have only one question: does it work 24h/7?

surreal elm Aug 21, 2021, 5:54 PM

#

https://www.youtube.com/watch?v=rW12GgjAY18

YouTube

Jacob Merrill

Smart Project Shrink wrap - Surface normal version

I made this math up from scratch

#

😄

austere swift Aug 21, 2021, 5:56 PM

#

shadow gate Hey, I made a Telegram bot with Python on VSC, but now I have only one question:...

if you host it yes

#

the bot will work as long as the program is running, so just set it to run on a system that's on 24/7

#

you can use a VPS, a raspberry pi, or any system that you're willing to always leave on

dawn crown Aug 21, 2021, 9:51 PM

#

can someone link me pandas vectorisation with numpy docs or something like that

serene scaffold Aug 21, 2021, 10:08 PM

#

dawn crown can someone link me pandas vectorisation with numpy docs or something like that

I'm not sure I understand the dilemma. Each DataFrame is a layer on top of a numpy array, so pretty much all of its operations are vectorised.

dawn crown Aug 21, 2021, 10:08 PM

#

serene scaffold I'm not sure I understand the dilemma. Each DataFrame is a layer on top of a num...

no I am asking like the numpy methods which we use to vectorise data

serene scaffold Aug 21, 2021, 10:09 PM

#

dawn crown no I am asking like the numpy methods which we use to vectorise data

can you give me an example of some data and what transformation you're thinking of?

#

are you sure you're not talking about encoding?

dawn crown Aug 21, 2021, 10:11 PM

#

serene scaffold can you give me an example of some data and what transformation you're thinking ...

like imagine if i have a.csv

string
aaaa
aaaabbbb
cccccdddd
dddddeeee

now say i have some substrings like ddee, aa and i want to get an output like this

ddee       aa
dddddeeee  aaaa
NaN        aaaabbbb

like how will i vectorise this type of data

#

the first row is the column name

serene scaffold Aug 21, 2021, 10:12 PM

#

I don't see the connection between these two.

dawn crown Aug 21, 2021, 10:13 PM

#

serene scaffold I don't see the connection between these two.

alright here what i would do to approach this problem is basically iterate through the columns and check if the condition matches, but as i have read iterating is bad practice and we should use vectorisation

serene scaffold Aug 21, 2021, 10:14 PM

#

dawn crown alright here what i would do to approach this problem is basically iterate throu...

What is the condition you are trying to check? I can't tell how you got from the first example to the second example. What is ddee and why is it the first value?

dawn crown Aug 21, 2021, 10:15 PM

#

serene scaffold What is the condition you are trying to check? I can't tell you how got from the...

it is the column name

#

like you see

name
bob
john

#

here name is the column name

mortal dove Aug 21, 2021, 10:15 PM

#

I believe they want to specify column names, and then rows in the column has to have the column name as a substring?

serene scaffold Aug 21, 2021, 10:16 PM

#

can you explain what is happening here? what do these two have to do with each other?

#

this looks meaningless to me.

dawn crown Aug 21, 2021, 10:18 PM

#

serene scaffold can you explain what is happening here? what do these two have to do with each o...

ddee     |  aa                   -> these are column names
---------|-----
dddddeeee| aaaa
NaN      | aaaabbbb

serene scaffold Aug 21, 2021, 10:18 PM

#

dawn crown ```py ddee | aa -> these are column names ---------|-----...

what is it that happens that you go from

aaaa
aaaabbbb
cccccdddd
dddddeeee

to

dddddeeee   aaaa
NaN         aaaabbbb

#

why are you going from four rows to two? what does any of this mean?

#

why is one of them NaN?

dawn crown Aug 21, 2021, 10:23 PM

#

serene scaffold why are you going from four rows to two? what does any of this mean?

Imagine I have two sub-strings ddee,aa right and I have this csv

now I want a output like this where the sub-string are the column names and the strings are in the column and then

so we will get two match for aa but only one for ddee
so we will populate the ddee column with a NaN value

#

serene scaffold Aug 21, 2021, 10:24 PM

#

Alright, give me a moment

dawn crown Aug 21, 2021, 10:25 PM

#

serene scaffold Alright, give me a moment

no i dont want the code, like can just you explain how we will approach this with numpy

serene scaffold Aug 21, 2021, 10:25 PM

#

dawn crown no i dont want the code, like can just you explain how we will approach this wit...

numpy is an implementation detail.

#

you don't have to think about how numpy is involved.

dawn crown Aug 21, 2021, 10:26 PM

#

serene scaffold you don't have to think about how numpy is involved.

?

serene scaffold Aug 21, 2021, 10:26 PM

#

what did you read that made you think to ask this question? I think something is confusing you.

dawn crown Aug 21, 2021, 10:28 PM

#

serene scaffold what did you read that made you think to ask this question? I think something is...

serene scaffold Aug 21, 2021, 10:28 PM

#

In either case, when you're working with numpy and pandas, you should avoid using them with for loops or calling methods like .apply and .map as much as possible, as the other methods are optimized and do all the looping internally.

serene scaffold Aug 21, 2021, 10:28 PM

#

dawn crown

Yes, that is right. You should avoid iterating over rows as much as you possibly can.

#

You just have to look in the docs for what method does what you are trying to do.

dawn crown Aug 21, 2021, 10:29 PM

#

serene scaffold Aug 21, 2021, 10:30 PM

#

The methods that you'd be calling are usually vectorised. you don't have to do extra work to make them vectorised.

#

those things they list: arithmetic, comparisons, reductions, etc. Those are already implemented in pandas. You just have to use them.

mortal dove Aug 21, 2021, 10:30 PM

#

Have always been able to use .apply, .map, and to a smaller extent, .rolling instead of ever having to iterate over rows

serene scaffold Aug 21, 2021, 10:31 PM

#

mortal dove Have always been able to use `.apply`, `.map`, and to a smaller extent, `.rollin...

.apply and .map are the same as iterating over them as far as optimization is concerned. Those are not vectorised.

dawn crown Aug 21, 2021, 10:31 PM

#

serene scaffold those things they list: arithmetic, comparisons, reductions, etc. Those are alre...

so how i will approach that question that i asked without like iterating

serene scaffold Aug 21, 2021, 10:32 PM

#

dawn crown so how i will approach that question that i asked without like iterating

Is there a real example of something you're trying to do? The substring thing that you mentioned seems very obscure.

dawn crown Aug 21, 2021, 10:32 PM

#

serene scaffold Is there a real example of something you're trying to do? The substring thing th...

umm not really just trying to learn this

serene scaffold Aug 21, 2021, 10:33 PM

#

In [45]: s
Out[45]: 
0         aaaa
1     aaaabbbb
2    cccccdddd
3    dddddeeee
dtype: object

In [46]: s.str.contains('aaaa')
Out[46]: 
0     True
1     True
2    False
3    False
dtype: bool

In [47]: s[s.str.contains('aaaa')]
Out[47]: 
0        aaaa
1    aaaabbbb
dtype: object

#

see how s.str.contains('aaaa') gives you True or False for each row as one operation. No looping required.

#

s[s.str.contains('aaaa')] then selects only those rows for which the condition is True.
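
Applied to the substring question above, one possible approach (a sketch, not the only way) is to build one filtered Series per substring and let pandas align them into a DataFrame; the shorter column gets padded with NaN. The loop only runs over the handful of substrings, while each match over the rows is still a single vectorised call:

```python
import pandas as pd

s = pd.Series(["aaaa", "aaaabbbb", "cccccdddd", "dddddeeee"])
substrings = ["ddee", "aa"]

# For each substring keep the matching rows, then renumber them from 0 so the
# columns line up from the top when pandas aligns them by index.
columns = {
    sub: s[s.str.contains(sub, regex=False)].reset_index(drop=True)
    for sub in substrings
}
result = pd.DataFrame(columns)
print(result)
```

`regex=False` treats each substring literally, which matters if the substrings ever contain characters like `.` or `*`.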

dawn crown Aug 21, 2021, 10:34 PM

#

ooh

serene scaffold Aug 21, 2021, 10:35 PM

#

dawn crown ooh

so, "vectorised" isn't something you have to do. it's a design concept where a given operation is applied to all the data.

#

Here's a similar concept with arrays

#

In [52]: a
Out[52]: 
array([[3, 2, 0],
       [4, 0, 2]])

In [53]: b
Out[53]: 
array([[3, 3, 1],
       [4, 3, 2]])

In [54]: a + b
Out[54]: 
array([[6, 5, 1],
       [8, 3, 4]])

In [55]: a * b
Out[55]: 
array([[ 9,  6,  0],
       [16,  0,  4]])

#

@dawn crown the different operations are applied element-wise, but syntactically, it looks like you're just adding two things. This is also vectorised.

dawn crown Aug 21, 2021, 10:38 PM

#

alright

serene scaffold Aug 21, 2021, 10:38 PM

#

Works with regular numbers, too.

In [56]: a / 2
Out[56]: 
array([[1.5, 1. , 0. ],
       [2. , 0. , 1. ]])

In [57]: 2 / a
Out[57]: 
array([[0.66666667, 1.        ,        inf],
       [0.5       ,        inf, 1.        ]])

dawn crown Aug 21, 2021, 10:40 PM

#

thanks i will try some pandas general problems

serene scaffold Aug 21, 2021, 10:44 PM

#

dawn crown thanks i will try some pandas general problems

I recommend this: https://www.kaggle.com/learn/pandas

Learn Pandas Tutorials

Solve short hands-on challenges to perfect your data manipulation skills.

lapis sequoia Aug 22, 2021, 12:17 AM

#

Anyone familiar with R and blogdown? Not necessarily a datasci question at this point.

serene scaffold Aug 22, 2021, 12:58 AM

#

lapis sequoia Anyone familiar with R and blogdown? Not necessarily a datasci question at this ...

R questions are out of scope for this server.

lapis sequoia Aug 22, 2021, 1:17 AM

#

Hi guys, I understand that in SVM the regularization term C controls how complex a model is. For example a low C will tolerate more misclassified data points

#

But how does this apply to support vector regression? For example the epsilon tube controls the width of the tube. As such, a wider tube will fit more data points and minimize the slack variables. But what about C here? How does it balance this, given that we are now trying to fit data points inside the tube

#

I.e regression

desert oar Aug 22, 2021, 2:16 AM

#

lapis sequoia But how does this apply to support vector regression? For example the epsilon tu...

in SV regression, C can be interpreted as controlling the number of points that can fall outside of a pre-defined ±ε error bound, instead of the number of points that can be misclassified. see http://www2.cs.uh.edu/~ceick/ML/SVM-Regression.pdf

in case the link dies, the citation is:

"A Tutorial on Support Vector Regression"
Alex J. Smola and Bernhard Schölkopf
September 30, 2003
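
For reference, the primal problem makes the role of C explicit. This is written from memory of the standard ε-SVR formulation, so check the notation against the tutorial:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{\ell}\left(\xi_i + \xi_i^*\right) \\
\text{subject to}\quad & y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i, \\
& \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0 .
\end{aligned}
```

Points inside the ε-tube have zero slack and cost nothing. C sets how heavily the distance outside the tube is penalised relative to keeping w small: a large C pushes the fit to keep points inside the tube, a small C accepts more deviation in exchange for a flatter function.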

lyric ermine Aug 22, 2021, 2:31 AM

#

hey guys, got a small problem with pandas

fifa["Weight"] = fifa["Weight"].astype(str).apply(lambda x: x.replace("lbs", "")).astype(float)

light = fifa.loc[fifa["Weight"] < 140].count()[0]
light_medium = fifa.loc[fifa["Weight"] >= 140] & fifa.loc[fifa["Weight"] >= 155]

error code:

unsupported operand type(s) for &: 'float' and 'float'

anyone know how to solve this please? 😦

serene scaffold Aug 22, 2021, 2:44 AM

#

@lyric ermine you only want one call to loc in that last part

#

You shouldn't be using the ampersand in between two calls to loc

#

Though I'm not really sure what the intended logic is

lyric ermine Aug 22, 2021, 2:46 AM

#

can i pm you?

#

i wanna get values of weight between 140 and 155

serene scaffold Aug 22, 2021, 2:46 AM

#

I'm going to bed soon, but even if I weren't, it's better to put your question where everyone can get to it

#

This channel is specifically for this kind of question

lyric ermine Aug 22, 2021, 2:47 AM

#

light_medium = fifa.loc[fifa["Weight"] >= 140] & fifa["Weight"] >= 155

#

you mean like this?

serene scaffold Aug 22, 2021, 2:47 AM

#

Yes. Can you state in English what this is intended to do?

lyric ermine Aug 22, 2021, 2:48 AM

#

i have a series of values with weights

i wanna get the amount of weights between 140 and 155

serene scaffold Aug 22, 2021, 2:49 AM

#

So both of those comparisons need to be inside the call to loc

#

Look where you have your closing ] for loc

#

Also one of the comparison operators is wrong. I think you can figure out which one is wrong.
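
For later readers, a sketch of what the hints point at, with a made-up `fifa` frame. Each comparison gets its own parentheses because `&` binds more tightly than `>=` and `<`, and both conditions go into one mask. Whether 155 itself belongs in this bucket is an assumption here:

```python
import pandas as pd

fifa = pd.DataFrame({"Weight": ["135lbs", "140lbs", "150lbs", "155lbs", "170lbs"]})

# Vectorised string cleanup instead of .apply with a lambda
fifa["Weight"] = fifa["Weight"].str.replace("lbs", "", regex=False).astype(float)

light = int((fifa["Weight"] < 140).sum())

# Both conditions inside one boolean mask, each wrapped in parentheses
mask = (fifa["Weight"] >= 140) & (fifa["Weight"] < 155)
light_medium = int(mask.sum())          # number of matching rows
rows = fifa.loc[mask]                   # the matching rows themselves
```

Summing a boolean mask counts the `True` values, which avoids the `.count()[0]` step.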

lyric ermine Aug 22, 2021, 2:50 AM

#

yeah

#

okay ill try some more

#

ty for tips 🙂

serene scaffold Aug 22, 2021, 2:50 AM

#

lemon_hyperpleased farnsworth

lyric ermine Aug 22, 2021, 2:51 AM

#

i never ran into this error before so iam kinda confused haha

#

have a good night

gentle epoch Aug 22, 2021, 3:23 AM

#

trying this

#

from numpy import arange

b = round(-5/2,1)
c = round(5/2,1)

a = list(arange(b,c,0.1))
print(a)

#

outputting this

#

[-2.5, -2.4, -2.3, -2.1999999999999997, -2.0999999999999996, -1.9999999999999996, -1.8999999999999995, -1.7999999999999994, -1.6999999999999993, -1.5999999999999992, -1.4999999999999991, -1.399999999999999, -1.299999999999999, -1.1999999999999988, -1.0999999999999988, -0.9999999999999987, -0.8999999999999986, -0.7999999999999985, -0.6999999999999984, -0.5999999999999983, -0.4999999999999982, -0.39999999999999813, -0.29999999999999805, -0.19999999999999796, -0.09999999999999787, 2.220446049250313e-15, 0.10000000000000231, 0.2000000000000024, 0.3000000000000025, 0.4000000000000026, 0.5000000000000027, 0.6000000000000028, 0.7000000000000028, 0.8000000000000029, 0.900000000000003, 1.000000000000003, 1.1000000000000032, 1.2000000000000033, 1.3000000000000034, 1.4000000000000035, 1.5000000000000036, 1.6000000000000032, 1.7000000000000037, 1.8000000000000043, 1.900000000000004, 2.0000000000000036, 2.100000000000004, 2.2000000000000046, 2.3000000000000043, 2.400000000000004]

#

what do I do

royal crest Aug 22, 2021, 3:24 AM

#

floating point error

gentle epoch Aug 22, 2021, 3:24 AM

#

even after rounding?

royal crest Aug 22, 2021, 3:26 AM

#

the problem isn't the rounding

#

it's the arange

gentle epoch Aug 22, 2021, 3:27 AM

#

anything I can do?

royal crest Aug 22, 2021, 3:27 AM

#

since 0.1 isn't exactly 0.1000000000000000000

#

i was going to suggest rounding the arange to whatever decimal points you want
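
Two common ways around this, sketched under the assumption that one decimal place is all that's needed: round the result of `arange`, or build the grid from integers and divide once so the error can't build up along the grid.

```python
import numpy as np

# Option 1: round the float grid after the fact
rounded = np.round(np.arange(-2.5, 2.5, 0.1), 1)

# Option 2: count in integers, then scale; each value comes from a single division
scaled = np.arange(-25, 25) / 10

a = scaled.tolist()
print(a[:5])
```

The values are still binary floats, so they are the closest representable numbers to -2.2, -2.1 and so on rather than exact decimals, but they print cleanly and compare equal to the literals.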

gentle epoch Aug 22, 2021, 3:28 AM

#

I can't figure out that module

gentle epoch Aug 22, 2021, 3:28 AM

#

royal crest i was going to suggest rounding the arange to whatever decimal points you want

like this? a = list(round(arange(b,c,0.1))),1)?

#

right

#

how to use it

#

so, basically, every mathematical operation I want to do with those numbers, I'll have to use decimal?

velvet thorn Aug 22, 2021, 3:51 AM

#

gentle epoch so, basically, every mathematical operation I want to do with those numbers, I'l...

what are you doing?

#

is it an operation that really requires precision?

#

if not, I wouldn't bother

velvet thorn Aug 22, 2021, 3:52 AM

#

gentle epoch ```cmd [-2.5, -2.4, -2.3, -2.1999999999999997, -2.0999999999999996, -1.999999999...

this amount of imprecision is like...eyeballing it, 10^-15 or something?

glossy moth Aug 22, 2021, 7:35 AM

#

Hi all, I've created an sns map of my p-values and all works as intended. I am trying to modify the heatmap coloring to be based around my alpha, as right now 1 is being colored the most, and 0 the least, whereas I really want to highlight significance. Any suggestions on a more significance based coloring method?

shadow gate Aug 22, 2021, 8:33 AM

#

austere swift if you host it yes

Could I host it by myself?

pearl heart Aug 22, 2021, 8:50 AM

#

Hello

shadow gate Aug 22, 2021, 9:48 AM

#

pearl heart Hello

Hi!

mortal dove Aug 22, 2021, 11:16 AM

#

glossy moth Hi all, I've created an sns map of my p-values and all works as intended. I am ...

You can append _r to the end of any colormap to reverse the colors
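
Reversing the colormap flips which end gets which colour. If the colours should also step at the significance thresholds rather than run smoothly from 0 to 1, one option is matplotlib's `BoundaryNorm`. This is a sketch with plain matplotlib and random p-values; the thresholds are assumptions, and the same `cmap` name can be passed to `sns.heatmap(..., cmap=...)`.

```python
import matplotlib
matplotlib.use("Agg")                        # headless backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm

rng = np.random.default_rng(0)
pvals = rng.uniform(0, 1, size=(5, 5))       # stand-in for the real p-value matrix

cmap = plt.get_cmap("viridis_r")             # "_r" gives the reversed viridis
bounds = [0.0, 0.001, 0.01, 0.05, 1.0]       # assumed thresholds; 0.05 plays the role of alpha
norm = BoundaryNorm(bounds, ncolors=cmap.N)  # one colour band per interval

fig, ax = plt.subplots()
image = ax.imshow(pvals, cmap=cmap, norm=norm)
fig.colorbar(image, ax=ax, ticks=bounds)
```

With this norm everything above 0.05 shares a single colour, so the visible variation is all on the significant side.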

stuck karma Aug 22, 2021, 11:36 AM

#

hello!
I have an error index 13 is out of bounds for axis 1 with size 1
and i dont understand because i already runned it before and i did work. I removed my new lines and restored the old version

### PLSR ####
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import (train_test_split, cross_validate,
                                     cross_val_score, validation_curve)

# OPEN THE CSV
data=pd.read_csv(r'C:\path\donnees_grece.csv')
datalist_x=data.values.tolist()
data=np.array(datalist_x)
print(data.shape)
data=np.random.permutation(data) # shuffle the rows
print(data.shape)

data_x= data[1:,65: ].astype(float)
data_y=data[1:, 13 ].astype(float)      # 13: target column (clay)

# DEFINE FEATURES X AND TARGET y
X=data_x
y= data_y

# SPLIT INTO test set AND train set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# CHOOSE THE PLSR MODEL AND SET ITS PARAMETERS
pls = PLSRegression(n_components=15,  max_iter=500)


# CROSS VALIDATION
scores = cross_validate(pls, X_train, y_train, cv=5, scoring="r2", return_train_score=True)
print(scores)  
print (scores["train_score"].mean())       # mean train score across the folds

# per-fold results on the held-out folds
scores1=cross_val_score(pls, X_train, y_train, cv=5, scoring='r2')
# r² of each CV fold
print (scores1)


# TUNE THE HYPERPARAMETER: NUMBER OF LATENT VARIABLES (components?)
i= np.arange(1,20)
train_score, val_score = validation_curve(pls, X_train, y_train,
                                          param_name='n_components', param_range=i, cv=5)


print(val_score.mean())         # mean CV score over every n_components value tried

plt.plot(i, val_score.mean(axis=1), label='validation')
plt.plot(i, train_score.mean(axis=1), label='train')
plt.ylabel('score')
plt.xlabel('n_components')
plt.legend()

devout zodiac Aug 22, 2021, 11:42 AM

#

hello, could anyone refer to me a pytorch specific discord server?

neat cedar Aug 22, 2021, 11:43 AM

#

Hey everyone, I hope this is the right channel to ask this data visualization question:
I'm looking for a library that can produce a graph similar to this Flourish line graph race: https://app.flourish.studio/@flourish/horserace/8
Ideally, I'd like to make it interactive in my react ts frontend, alternatively I'd like to display it as a gif or video.
So far I've only found bar graph races, for example made with plotly express or matplotlib (like those: https://pypi.org/project/bar-chart-race/, https://towardsdatascience.com/bar-chart-race-in-python-with-matplotlib-8e687a5c8a41)
I'm very thankful for any advice and hints, also let me know if I should move this question elsewhere! 🙂

dawn crown Aug 22, 2021, 1:53 PM

#

serene scaffold I recommend this: https://www.kaggle.com/learn/pandas

thanks

rigid fable Aug 22, 2021, 3:37 PM

#

hey guys

#

i want to know if the track of data science in python at datacamp is worth it or not

indigo skiff Aug 22, 2021, 3:39 PM

#

hey guys just a general question about hardware for text generation task. So if i have a basic LSTM with attention layers and beam search algorithm which i want to train and evaluate on multiple datasets ranging between size 500mb to 4gb size (before pre-processing) whats the hardware i would need? For example within cloud how much ram and what kind of gpus i would need for quick training (ideally within 4/5 hours)
2) For fine tuning GPT Models (124M layer) on a 2.6gb dataset (before pre-processing) what kind of gpu + ram i would need. Goal is to finetune and evaluate within 10 hrs.
3) GPT NEO Model. For this how much ram and computational power i would need considering dataset size is 8GB. Goal is to fine tune within 24 hrs

serene scaffold Aug 22, 2021, 3:49 PM

#

indigo skiff hey guys just a general question about hardware for text generation task. So if ...

am I to understand, then that you'll have up to 4gb worth of tensors in memory?

indigo skiff Aug 22, 2021, 3:52 PM

#

yes for 1) LSTM's

serene scaffold Aug 22, 2021, 3:53 PM

#

I'll defer to someone else as I don't want to lead you astray.

indigo skiff Aug 22, 2021, 3:58 PM

#

no worries. Thanks tho

lapis sequoia Aug 22, 2021, 4:12 PM

#

Question about grouping data by datetime64[ns].. I want to know whether I can group by day/hour/minute from it? I have been looking for an example but haven't made any progress

#

*using pandas

mortal dove Aug 22, 2021, 4:18 PM

#

!d pandas.DataFrame.resample

arctic wedgeBOT Aug 22, 2021, 4:18 PM

#

pandas.DataFrame.resample


DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)
Resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the `on`/`level` keyword parameter.

mortal dove Aug 22, 2021, 4:19 PM

#

Would this be what you're looking for?
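
A small sketch of both routes, `resample` and `groupby`, on a made-up frame with a `ts` datetime column and a `value` column (both names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2021-08-22 10:05", "2021-08-22 10:45",
                          "2021-08-22 11:10", "2021-08-23 09:00"]),
    "value": [1, 2, 3, 4],
})

# One row per calendar day; resample needs a datetime index or on=<datetime column>
per_day = df.resample("D", on="ts")["value"].sum()

# groupby on a derived key, here the hour of day pooled across all dates
per_hour_of_day = df.groupby(df["ts"].dt.hour)["value"].sum()

print(per_day)
print(per_hour_of_day)
```

`resample` keeps the buckets on the timeline (and fills in empty ones), while grouping by `.dt.hour`, `.dt.minute` or `.dt.date` pools rows that share that component.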

lapis sequoia Aug 22, 2021, 4:26 PM

#

that might be it, thanks. I am trying groupby, will look into the above as well

vestal agate Aug 22, 2021, 5:36 PM

#

alpha = 0.01
b0, b1, b2 = 0, 0, 0
x = [4, 7, 5, 7, 27]
y = [4, 3, 7, 7, 4]
z = [8, 10.5, 17.5, 24.5, 54]
error = []
for i in range(50000):
    idx = i % 5
    p = b0 + (b1 * x[idx]) + (b2 * y[idx])
    err = p - z[idx]
    b0 = b0 - alpha * err
    b1 = b1 - alpha * err * x[idx]
    b2 = b2 - alpha * err * y[idx]
    error.append(err)
error = list(map(abs, error))
error.sort()
print(error[:1])
test = float(input())
test1 = float(input())
pred = b0 + (b1 * test) + (b2 * test1)
print(pred)

I'm trying to use machine learning to predict this problem:
it tries to calculate the area of a triangle without the equation

#

what do i do

#

please ping me

serene scaffold Aug 22, 2021, 5:47 PM

#

@vestal agate if I understand correctly, you're trying to make a model that can predict the area of a triangle given the length of its sides?

#

You shouldn't use machine learning for things when there's a simple, known solution. But if this is just for practice, I guess you could do it with regression.

vestal agate Aug 22, 2021, 5:58 PM

#

serene scaffold @vestal agate if I understand correctly, you're trying to make a model ...

yes

#

this is linear regression and gradient descent

#

but maybe I'm doing more than 2 variables wrong

#

i only really know how x and y works with b0 and b1

#

not up to b2 or infinity

serene scaffold Aug 22, 2021, 5:59 PM

#

vestal agate this is linear regression and gradient descent

I would put your x, y, and z data into a matrix (ie a numpy array) and look into the regression tools in sklearn

vestal agate Aug 22, 2021, 5:59 PM

#

serene scaffold I would put your x, y, and z data into a matrix (ie a numpy array) and look into...

yeah but it's practice

#

I want to write the equation myself

serene scaffold Aug 22, 2021, 6:00 PM

#

in that case I'd use numpy but not sklearn.
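
A rough sketch of what "numpy but not sklearn" could look like for two input variables, reusing the x, y, z values posted above. The feature standardization, the learning rate of 0.1, and the iteration count are assumptions added here, not something from the original code. A linear model can only approximate a triangle's area, so treat the predictions as an exercise rather than a formula.

```python
import numpy as np

# Inputs from the earlier snippet: columns are x and y, target is z.
X = np.array([[4, 4], [7, 3], [5, 7], [7, 7], [27, 4]], dtype=float)
z = np.array([8, 10.5, 17.5, 24.5, 54])

# Assumption: standardize the features so one learning rate suits every weight.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma

# Prepend a column of ones so w[0] plays the role of b0.
A = np.column_stack([np.ones(len(Xs)), Xs])

w = np.zeros(A.shape[1])  # [b0, b1, b2]
alpha = 0.1               # assumed learning rate
for _ in range(5000):
    grad = A.T @ (A @ w - z) / len(z)  # gradient of half the mean squared error
    w -= alpha * grad


def predict(x, y):
    """Predict z for raw (unscaled) x and y using the fitted weights."""
    feats = (np.array([x, y], dtype=float) - mu) / sigma
    return w[0] + feats @ w[1:]


print(w)
print(predict(7, 7))
```

Adding a third or fourth variable only means adding columns to `X`; the update rule stays the same because the matrix product handles every weight at once.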

grim orbit Aug 22, 2021, 6:16 PM

#

anyone here know how to create histograms?

serene scaffold Aug 22, 2021, 6:17 PM

#

grim orbit anyone here know how to create histograms?

yes; is the data in a DataFrame or an array or what?

grim orbit Aug 22, 2021, 6:17 PM

#

yes

serene scaffold Aug 22, 2021, 6:17 PM

#

which one?

grim orbit Aug 22, 2021, 6:17 PM

#

lemme send u the code

serene scaffold Aug 22, 2021, 6:17 PM

#

Post it in this channel as text

#

!code

arctic wedgeBOT Aug 22, 2021, 6:18 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

grim orbit Aug 22, 2021, 6:19 PM

#

import pandas as pd
import seaborn as sns
import scipy.stats
import matplotlib.pyplot as plt
url = "https://covid19.who.int/WHO-COVID-19-global-data.csv"
df = pd.read_csv(url)
filt = (df['Date_reported'] == '2021-08-20')
df1 = df[filt] 
filt = (df1['Cumulative_cases'] >= 2000000)
df2 = df1[filt] 
df2

serene scaffold Aug 22, 2021, 6:19 PM

#

Great, now do print(df.head().to_csv()) and paste that text into this chat the same way.

grim orbit Aug 22, 2021, 6:20 PM

#

,Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
5363,2021-08-20,AR,Argentina,AMRO,9764,5106207,247,109652
17283,2021-08-20,BR,Brazil,AMRO,41714,20457897,1064,571662
26223,2021-08-20,CO,Colombia,AMRO,3154,4877323,93,123781
43507,2021-08-20,FR,France,EURO,23102,6384773,105,111839
47083,2021-08-20,DE,Germany,EURO,9280,3853055,13,91956

serene scaffold Aug 22, 2021, 6:20 PM

#

Great, this makes sense. What are you trying to convey with your histogram?

grim orbit Aug 22, 2021, 6:21 PM

#

countries over 2m cases, the new and recovered per country

serene scaffold Aug 22, 2021, 6:21 PM

#

these are two histograms, then?

grim orbit Aug 22, 2021, 6:22 PM

#

i was thinking can do in one but wouldnt look good so yes 2 would be the better approach i guess

serene scaffold Aug 22, 2021, 6:27 PM

#

grim orbit i was thinking can do in one but wouldnt look good so yes 2 would be the better ...

I did this

df.loc[df['Cumulative_cases'] >= 2_000_000, ['Country', 'New_cases', 'Date_reported']].plot.hist('New_cases', 10)

#

And I got this

#

you can probably mess with it from there

#

!docs pandas.DataFrame.plot.hist

arctic wedgeBOT Aug 22, 2021, 6:28 PM

#

pandas.DataFrame.plot.hist


DataFrame.plot.hist(by=None, bins=10, **kwargs)
Draw one histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data. This function groups the values of all given Series in the DataFrame into bins and draws all bins in one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/axes_api.html#matplotlib.axes.Axes "(in Matplotlib v3.4.3)"). This is useful when the DataFrame’s Series are in a similar scale.

grim orbit Aug 22, 2021, 6:28 PM

#

Bildschirmfoto_2021-08-15_um_23.09.49.png

#

i wanted it like this

serene scaffold Aug 22, 2021, 6:29 PM

#

ahh, let me think

#

@grim orbit df.loc[df['Cumulative_cases'] >= 2_000_000].plot.barh('Country', 'New_cases')

#

grim orbit Aug 22, 2021, 6:33 PM

#

I see

serene scaffold Aug 22, 2021, 6:35 PM

#

!docs pandas.DataFrame.plot.barh

arctic wedgeBOT Aug 22, 2021, 6:35 PM

#

pandas.DataFrame.plot.barh


DataFrame.plot.barh(x=None, y=None, **kwargs)
Make a horizontal bar plot.

A horizontal bar plot is a plot that presents quantitative data with rectangular bars with lengths proportional to the values that they represent. A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.

serene scaffold Aug 22, 2021, 6:35 PM

#

I'm not sure how you'd stack the two types of cases or change the colors.

grim orbit Aug 22, 2021, 6:38 PM

#

what package did u use?

#

just pandas?

serene scaffold Aug 22, 2021, 6:38 PM

#

yes, but it's calling matplotlib under the hood

grim orbit Aug 22, 2021, 6:38 PM

#

wym?

serene scaffold Aug 22, 2021, 6:39 PM

#

these are just dataframe methods, but those methods are calling matplotlib functions

#

you don't have to import matplotlib but you do have to have it installed.

grim orbit Aug 22, 2021, 6:40 PM

#

I understand

#

Bildschirmfoto_2021-08-22_um_20.40.47.png

#

why does it look like this

vestal agate Aug 22, 2021, 6:42 PM

#

is there a way to stop nan and inf from showing in pycharm

#

and limit the numbers shown

grim orbit Aug 22, 2021, 6:43 PM

#

u can filter them out?

#

=!

serene scaffold Aug 22, 2021, 6:54 PM

#

grim orbit =!

comparison operations don't work that way for NaN
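
A small sketch of why `!=` cannot filter out NaN, and what to use instead. The Series values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values containing both NaN and inf.
s = pd.Series([1.0, np.nan, np.inf, 4.0])

# NaN never compares equal to anything, including itself,
# so "!=" is True for every element and filters nothing.
not_equal_nan = s != np.nan

# isna() finds the NaN entries; np.isfinite() rejects NaN and +/-inf together.
is_missing = s.isna()
finite_only = s[np.isfinite(s)]

print(not_equal_nan.tolist())
print(is_missing.tolist())
print(finite_only.tolist())
```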

grim orbit Aug 22, 2021, 6:54 PM

#

i see

#

for my example

#

filt = (df2['Country'] != 'France, Spain, United States of America, England,');

#

how do I write this properly? I see no difference

serene scaffold Aug 22, 2021, 6:57 PM

#

grim orbit filt = (df2['Country'] != 'France, Spain, United States of America, Englan...

You would do ~df2['Country'].isin(['France', 'Spain', 'United States of America', 'England'])

grim orbit Aug 22, 2021, 7:00 PM

#

serene scaffold You would do `~df2['Country'].isin(['France', 'Spain', 'United States of America...

where would it be inserted?

#

when i define?

serene scaffold Aug 22, 2021, 7:01 PM

#

grim orbit where would it be inserted?

the expression I gave you evaluates to a boolean series, you can use it as an indexer in loc

grim orbit Aug 22, 2021, 7:03 PM

#

 df2.loc[df['Cumulative_cases'] >= 2_000_000].plot.barh('Country', 'Cumulative_cases',df2['Country'].isin(['France', 'Spain', 'United States of American', 'England'])))

#

not like this

serene scaffold Aug 22, 2021, 7:07 PM

#

grim orbit not like this

selected = df2.loc[(df2['Cumulative_cases'] >= 2_000_000) & ~df2['Country'].isin(['France', 'Spain', 'United States of America', 'England'])]

selected.plot.barh('Country', 'Cumulative_cases')

You also deleted the ~, which was important.

#

~ is a negator. You can read (df2['Cumulative_cases'] >= 2_000_000) & ~df2['Country'].isin(['France', 'Spain', 'United States of America', 'England']) as "where Cumulative_cases is >= 2 million and Country is not in France, Spain, etc."

grim orbit Aug 22, 2021, 7:09 PM

#

I see

#

what does the ~ do?

serene scaffold Aug 22, 2021, 7:09 PM

#

It flips true and false values.
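
A tiny sketch of the flip, using a made-up four-row frame (the case counts are placeholders, not WHO figures):

```python
import pandas as pd

# Hypothetical mock data with the same column names as the thread.
df = pd.DataFrame({
    "Country": ["France", "Brazil", "Spain", "India"],
    "Cumulative_cases": [6_000_000, 20_000_000, 1_500_000, 32_000_000],
})

excluded = ["France", "Spain"]

in_list = df["Country"].isin(excluded)  # True where the country is listed
not_in_list = ~in_list                  # ~ flips every True/False

# Combine two boolean Series with &; each side needs its own parentheses.
mask = (df["Cumulative_cases"] >= 2_000_000) & not_in_list
selected = df.loc[mask]

print(in_list.tolist())
print(not_in_list.tolist())
print(selected)
```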

grim orbit Aug 22, 2021, 7:13 PM

#

I see thanks

forest arrow Aug 22, 2021, 8:31 PM

#

is it possible to retrieve info from a enjin forum? (without having to emulate a web browser)

glossy moth Aug 22, 2021, 8:51 PM

#

Hi! I have a dataframe with three columns (df1):
column1: unique Identifier, column 2: 0 or 1, column 3: 0 or 1. I have a second dataframe (df2) with the same three columns, but with 1s and 0s in different rows. I want to join the two, so that if df1 has a 0 for a unique ID in the relevant column where df2 has a 1, df1 gets updated to be a 1. But if df2 has a 0, df1 stays as 1 for that ID, and nothing is done to df1 at all.

Importantly, the lengths of the two dataframes are not the same, and the IDs are not in the same rows, though df2 will always be a subset of IDs in df1.

In reality, my actual databases are around 30 columns as opposed to the 3 in the above example though.

lapis sequoia Aug 22, 2021, 9:07 PM

#

does anyone have a code for the face recognition (opencv)

quiet vault Aug 22, 2021, 9:16 PM

#

lapis sequoia does anyone have a code for the face recognition (opencv)

https://machinelearningmastery.com/how-to-develop-a-face-recognition-system-using-facenet-in-keras-and-an-svm-classifier/

Machine Learning Mastery

How to Develop a Face Recognition System Using FaceNet in Keras

Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face. […]

#

This isn’t open cv but it is still a really high level with keras

serene scaffold Aug 22, 2021, 9:47 PM

#

@glossy moth it would be easier to follow what you're trying to do if you provided a minimal example (potentially with mock data), but it sounds like you need to merge

#

Also if the "IDs are not in the same rows" then you need to make sure you've set an appropriate index for each frame.

vestal agate Aug 22, 2021, 10:05 PM

#

what library visualizes data like this

serene scaffold Aug 22, 2021, 10:10 PM

#

@vestal agate matplotlib

arctic wedgeBOT Aug 22, 2021, 10:17 PM

#

Hey @near aspen!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

mortal dove Aug 22, 2021, 10:20 PM

#

glossy moth Hi! I have a dataframe with three columns (df1): column1: unique Identifier, co...

Different lengths but df2 indexes are a subset of df1, so if I'm understanding your problem correctly, you want to update column1 to the maximum value of the ID from either df1 or df2 for column1.

idx = df2.index.intersection(df1.index)
df1.loc[idx, 'column1'] = np.where(df2['column1'] > df1.loc[idx, 'column1'], df2['column1'], df1.loc[idx, 'column1'])

#

I doubt it's the best way, but that's what I likely would have done

#

As Stelercus mentioned, you'll want to make sure that you set your index as the ID column

glossy moth Aug 22, 2021, 10:28 PM

#

serene scaffold @glossy moth it would be easier to follow what you're trying to do if y...

Thank you for the help! Sure, see mock example below:

glossy moth Aug 22, 2021, 10:29 PM

#

mortal dove Different lengths but df2 indexes are a subset of df1, so if I'm understanding y...

So in this case, I would want Identifier 3, Group 1 and 2 in the first table to be updated to 1. For identifier 1, I would want nothing to be changed in the first table at all

mortal dove Aug 22, 2021, 10:31 PM

#

Still not sure, do you want the column to change to the max of the two columns, or does it update based on something else? If it's something else could you explain when it should be updated and when it should not be updated?

glossy moth Aug 22, 2021, 10:32 PM

#

Sure sorry. All columns contain binary values of either 1 or 0. If the df2 which is a subset of df1 has a 1 for an ID where df1 has a 0, I want df1 for that ID to change to 1. If df1 has a 1 and df2 has a 0, I want it to do nothing

#

So I want to update df1 just in cases where the ID in df2 has a value of 1 and the ID in df1 has a value of 0, in all other cases, do nothing

mortal dove Aug 22, 2021, 10:33 PM

#

Yea, then the code snippet I provided should work

glossy moth Aug 22, 2021, 10:36 PM

#

mortal dove Yea, then the code snippet I provided should work

Thank you! Just so I understand your code:
idx = df2.index.intersection(df1.index)
df1.loc[idx, 'column1'] = np.where(df2['column1'] > df1.loc[idx, 'column1'], df2['column1'], df1.loc[idx, 'column1'])

line 1 sets idx as the Identifier values that are shared between the two sets. Then line 2 looks at those IDs in column 1 where df2 is greater than those IDs in column 1, and if greater, sets df1 to that value, otherwise, leaves as is?

vestal agate Aug 22, 2021, 10:36 PM

#

hey can someone teach me how to code multi variable linear regression

#

no tutorials that make sense

#

100% know how linear regression and gradient descent works

#

but without multiple variables

mortal dove Aug 22, 2021, 10:37 PM

#

glossy moth Thank you! Just so I understand your code: idx = df2.index.intersection(df1.ind...

Yea, that's correct

glossy moth Aug 22, 2021, 10:37 PM

#

mortal dove Yea, that's correct

Thank you! I'll give it a shot

glossy moth Aug 22, 2021, 10:38 PM

#

mortal dove Yea, that's correct

So in my real situation, I have ~30 columns I need to do this for. Can I just loop through the snippet you provided updating the column as I go, or will this apply all at once to every column?

#

Same number of columns in df1 and df2 and they are named the same and everything if that matters

mortal dove Aug 22, 2021, 10:41 PM

#

Snippet I gave needs to loop over the columns, so

for i in range(1, 31):  # Group1 .. Group30; range() excludes its upper bound
    col = f'Group{i}'
    idx = df2.index.intersection(df1.index)
    df1.loc[idx, col] = np.where(df2[col] > df1.loc[idx, col], df2[col], df1.loc[idx, col])

#

I think it should be possible to change it to just apply once, but my 1am brain has stopped functioning at 100%

glossy moth Aug 22, 2021, 10:44 PM

#

Thank you! 🙂

serene scaffold Aug 22, 2021, 10:47 PM

#

@mortal dove it looks like you're recomputing idx every time for no particular reason

mortal dove Aug 22, 2021, 10:48 PM

#

Oh yea, can probably take that out of the loop

serene scaffold Aug 22, 2021, 10:48 PM

#

Though I wonder if this can be accomplished without any loops

mortal dove Aug 22, 2021, 10:48 PM

#

As I said, 1am brain is running on its last fumes

#

I'll probably have a look tomorrow

glossy moth Aug 22, 2021, 10:49 PM

#

serene scaffold Though I wonder if this can be accomplished without any loops

Thank you both! If you know of a way to do it without loops, I'd definitely be super curious to hear! 🙂

serene scaffold Aug 22, 2021, 10:49 PM

#

For your reference, @glossy moth, you want to avoid loops and apply as much as possible in the context of numpy and pandas.

mortal dove Aug 22, 2021, 10:50 PM

#

vestal agate no tutorials that make sense

If they don't make sense you probably don't understand the fundamentals. Have a look at Introduction to Statistical Learning(available for free), Section 3.2 p 71

#

Goes much more in depth than any tutorial will, but it explains the math/stats behind Multiple Regression, not how to code it

serene scaffold Aug 22, 2021, 10:52 PM

#

glossy moth So I want to update df1 just in cases where the ID in df2 has a value of 1 and t...

I think this can be done with one statement if both dataframes have overlapping indices.
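
One possible loop-free sketch, assuming both frames are indexed by the identifier and hold only 0/1 columns. The column names `Group1`/`Group2` and the mock values are illustrative. The idea is to reindex df2 onto df1's rows (filling missing IDs with 0) and take the element-wise maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data; the index is the unique identifier.
df1 = pd.DataFrame(
    {"Group1": [1, 0, 0, 1], "Group2": [0, 0, 0, 1]},
    index=pd.Index([1, 2, 3, 4], name="Identifier"),
)
df2 = pd.DataFrame(
    {"Group1": [1, 0], "Group2": [1, 0]},
    index=pd.Index([3, 1], name="Identifier"),  # subset of df1, different order
)

# Line df2 up with df1 row by row; IDs missing from df2 contribute 0.
aligned = df2.reindex(index=df1.index, columns=df1.columns, fill_value=0)

# Element-wise max: a 1 in either frame wins, a 0 never overwrites a 1.
updated = pd.DataFrame(
    np.maximum(df1.to_numpy(), aligned.to_numpy()),
    index=df1.index,
    columns=df1.columns,
)

print(updated)
```

Because the reindex handles row order and missing IDs up front, the same two statements cover 3 columns or 30.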

lapis sequoia Aug 22, 2021, 10:52 PM

#

Hey guys, how can I turn this final model so that I get cross-validation score of MAE and RMSE with scikit-learns cross_val_score

svr = SVR(kernel = 'rbf',C=100, epsilon=0.1, gamma = 100)
svr.fit(X_train, y_train)

y_pred_train = svr.predict(X_train)

y_pred_test = svr.predict(X_test)

#Metrics - if squared = True returns MSE value, if squared = False returns RMSE value.

#Performance on training set
mae_train = mean_absolute_error(y_train,y_pred_train)
rmse_train = mean_squared_error(y_train,y_pred_train, squared = False)

#Performance on testing set 
mae_test = mean_absolute_error(y_test, y_pred_test)
rmse_test = mean_squared_error(y_test, y_pred_test, squared = False)

print(f'SVR completed in : {round((time.time() - start_time), 2)} seconds...')

glossy moth Aug 22, 2021, 10:52 PM

#

serene scaffold I think this can be done with one statement if both dataframes have overlapping ...

Yeah the identifiers for df2 are a subset of the df1 identifiers, though order is not the same

#

if there is a way to .apply() and .where() to do this, that would be awesome to learn

#

thanks again for the help!

serene scaffold Aug 22, 2021, 10:53 PM

#

glossy moth Yeah the identifiers for df2 are a subset of the df1 identifiers, though order i...

The order doesn't matter, the identity of the index does.

glossy moth Aug 22, 2021, 10:53 PM

#

yes all identities of df2 are found in df1 somewhere

serene scaffold Aug 22, 2021, 10:53 PM

#

Problem is I'm on mobile so I can't experiment

serene scaffold Aug 22, 2021, 10:54 PM

#

glossy moth if there is a way to .apply() and .where() to do this, that would be awesome to ...

You don't want to use apply if you can get away with it. And you usually can.

glossy moth Aug 22, 2021, 10:55 PM

#

serene scaffold You don't want to use apply if you can get away with it. And you usually can.

Ah ok. I'm just not sure how to get this to apply to every column vs index:
idx = df2.index.intersection(df1.index)
df1.loc[idx, col] = np.where(df2[col] > df1.loc[idx, col], df2[col], df1.loc[idx, col])

velvet thorn Aug 22, 2021, 10:56 PM

#

glossy moth Ah ok. I'm just not sure how to get this to apply to every column vs index: ...

doesn’t really seem like the right approach to me

#

I feel like you should do a join

#

on the index

#

that’s my gut feel anyway

serene scaffold Aug 22, 2021, 10:57 PM

#

@velvet thorn they just need a boolean series from the second dataframe to index the first, but there are indices missing in the second dataframe

#

Beyond that it's just df.loc[...] = 1

velvet thorn Aug 22, 2021, 10:58 PM

#

serene scaffold @velvet thorn they just need a boolean series from the second dataframe ...

yeah, so Series.where over the columns on the joined dataframe

#

no?

#

🥴 I read a bit up there

#

but I could be wrong I just woke up

serene scaffold Aug 22, 2021, 10:58 PM

#

I've never used Series.where tbh

velvet thorn Aug 22, 2021, 10:58 PM

#

is the same as the numpy version

#

with a default left

mortal dove Aug 22, 2021, 10:58 PM

#

!d pandas.Series.where

arctic wedgeBOT Aug 22, 2021, 10:58 PM

#

pandas.Series.where


Series.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=<no_default>)
Replace values where the condition is False.

glossy moth Aug 22, 2021, 11:00 PM

#

velvet thorn is the same as the `numpy` version

Thanks for helping as well! It sounds like where will indiscriminately take the two dataframes I've joined and replace 0 values with 1s? I only want a replacement when the value at that index in df2 is 1 and the value in df1 is 0, and at no other time

#

Sorry super new to python in general, I'm sure im misunderstanding!

velvet thorn Aug 22, 2021, 11:00 PM

#

glossy moth Thanks for helping as well! It sounds like where will indiscriminately take the...

nope

#

that’s the point of the condition

#

what it does is basically

#

if condition is true, take from left, otherwise take from right

#

here left is df1 and right is df2

glossy moth Aug 22, 2021, 11:01 PM

#

But haven't I already joined them at this point?

velvet thorn Aug 22, 2021, 11:01 PM

#

yes, but join means

#

both columns are present

#

so more accurately

#

left is the relevant column from df1

#

and right from df2

#

okay it’s a bit hard to visualise

#

if your current solution works

#

just go with it

desert oar Aug 22, 2021, 11:02 PM

#

lapis sequoia Hey guys, how can I turn this final model so that I get cross-validation score o...

Can you clarify the question

#

Same as stelercus im not at a computer but maybe i can answer later

velvet thorn Aug 22, 2021, 11:05 PM

#

you know what really sucks

#

backticks on a phone

#

😔

lapis sequoia Aug 22, 2021, 11:06 PM

#

desert oar Can you clarify the question

So I have done gridsearch to find the best hyperparameters for my SVR but I want a cross-validated score of MAE and RMSE because right now the model is overfitting The complete thing looks like this:

'epsilon': [0.001,0.01,0.05,0.1,1,10],
'gamma': [0.01, 0.1, 1, 10, 100]},cv=5, verbose=0, n_jobs=-1)

gsc = gsc.fit(X_train, y_train)

```
svr = SVR(kernel = 'rbf',C=100, epsilon=0.1, gamma = 100)
svr.fit(X_train, y_train)

y_pred_train = svr.predict(X_train)

y_pred_test = svr.predict(X_test)

#Metrics - if squared = True returns MSE value, if squared = False returns RMSE value.

#Performance on training set
mae_train = mean_absolute_error(y_train,y_pred_train)
rmse_train = mean_squared_error(y_train,y_pred_train, squared = False)

#Performance on testing set 
mae_test = mean_absolute_error(y_test, y_pred_test)
rmse_test = mean_squared_error(y_test, y_pred_test, squared = False)```

desert oar Aug 22, 2021, 11:06 PM

#

@velvet thorn you'd be surprised at how many code blocks I've written on my phone hah

glossy moth Aug 22, 2021, 11:06 PM

#

velvet thorn yes, but join means

Ok let me see if I understand this correctly:
So basically I'm joining them based on df1, which I've stated df2 is a subset of. So I'll now get a dataframe that goes from 30 columns to one with 60 columns, with a lot of the new 30 columns having a bunch of NaN rows. Then I do series.where() and say if columndf1 < corresponding columndf2, replace with columndf2 value, otherwise leave alone?

velvet thorn Aug 22, 2021, 11:07 PM

#

glossy moth Ok let me see if I understand this correctly: So basically I'm joining them base...

kind of

#

this gives you a bunch of Series

#

then you pd.concat them

velvet thorn Aug 22, 2021, 11:07 PM

#

desert oar @velvet thorn you'd be surprised at how many code blocks I've written o...

I occasionally help here on my phone

desert oar Aug 22, 2021, 11:07 PM

#

@lapis sequoia look up "nested cross validation" perhaps, sklearn has it. Although the grid search should already be using cross val

velvet thorn Aug 22, 2021, 11:07 PM

#

it's a bit 🥴

desert oar Aug 22, 2021, 11:07 PM

#

velvet thorn it's a bit 🥴

I definitely don't have patience for it anymore

lapis sequoia Aug 22, 2021, 11:08 PM

#

desert oar @lapis sequoia look up "nested cross validation" perhaps, sklearn has it....

Yes but I need the score in MAE or RMSE, cross validation score does not say anything about the errors between actual and predicted value.

desert oar Aug 22, 2021, 11:08 PM

#

You can tell it which scorers to use

#

You can even tell it to compute multiple different scores (although it will only use one for selection)

#

The cross validation score is whatever score you request

#

The default I think is the estimator's own score method (R² for regression and accuracy for classification) but you can change it

#

This is described in the reference docs for the various CV classes

lapis sequoia Aug 22, 2021, 11:09 PM

#

desert oar You can even tell it to compute multiple different scores (although it will only...

Yes but I don't understand how to implement it in the code block

desert oar Aug 22, 2021, 11:10 PM

#

I think the parameter is scoring= or scorer=, something like that

velvet thorn Aug 22, 2021, 11:12 PM

#

lapis sequoia Yes but I don't understand how to implement it in the code block

you should read the docs again

#

I have generally found sklearn docs quite clear + comprehensive

lapis sequoia Aug 22, 2021, 11:14 PM

#

velvet thorn you should read the docs again

I have done that. I get: 'mean_absolute_error' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options. Even though they say that is what to use in scoring

velvet thorn Aug 22, 2021, 11:16 PM

#

lapis sequoia I have done that. 'mean_absolute_error' is not a valid scoring value. Use sor...

where do "they" say that?

#

because...

#

if you want the MAE scorer...

#

you need to use 'neg_mean_absolute_error'

#

and that is because

#

higher error is worse.

#

this information is available here

#

https://scikit-learn.org/stable/modules/model_evaluation.html

#

so my question is

#

did you read that?
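
For reference, a minimal sketch of cross-validated MAE and RMSE with the `neg_` scorer names. It uses synthetic data from `make_regression` as a stand-in for the real X_train/y_train, and it leaves `gamma` at its default, which is an assumption rather than the tuned value from the thread.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): replace with the real training set.
X, y = make_regression(n_samples=120, n_features=4, noise=10.0, random_state=0)

svr = SVR(kernel="rbf", C=100, epsilon=0.1)

# cross_validate accepts several scorer names at once.
scores = cross_validate(
    svr, X, y, cv=5,
    scoring=["neg_mean_absolute_error", "neg_root_mean_squared_error"],
)

# The scorers return negated errors (so that higher is better); flip the sign.
mae = -scores["test_neg_mean_absolute_error"].mean()
rmse = -scores["test_neg_root_mean_squared_error"].mean()

print(f"CV MAE:  {mae:.3f}")
print(f"CV RMSE: {rmse:.3f}")
```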

velvet thorn Aug 22, 2021, 11:18 PM

#

lapis sequoia I have done that. 'mean_absolute_error' is not a valid scoring value. Use sor...

and where was this said?

#

because if it was, then that's a documentation bug, so I would like to see it

stuck karma Aug 22, 2021, 11:19 PM

#

hello guys ~ can you help me with using grid search from scikit-learn on a PLS regression? here is my code:

#

from sklearn.metrics import *
from sklearn.model_selection import GridSearchCV
from sklearn.cross_decomposition import PLSRegression

pls = PLSRegression(n_components=15,  max_iter=500)

n_components= np.arange(1, 100) 
max_iter=[500]   
# grid search: create a dictionary with the hyperparameters
param_grid = {'n_components':n_components, 'max_iter':max_iter ,
              'metric': ['r2', 'neg_root_mean_squared_error']}
grid = GridSearchCV(PLSRegression(), param_grid, cv=5)

#train the grid
grid.fit(X_train, y_train)

#

it keeps saying Invalid parameter metric for estimator PLSRegression(). Check the list of available parameters with `estimator.get_params().keys()`.

lapis sequoia Aug 22, 2021, 11:21 PM

#

velvet thorn did you read that?

Yes, please have a look, nan values

desert oar Aug 22, 2021, 11:34 PM

#

stuck karma it keeps saying Invalid parameter metric for estimator PLSRegression(). Check...

The grid parameters only apply to the model "inside", not the grid search process itself. It also makes no sense to "search" over different metrics. Are you trying to calculate multiple metrics for each step of the search?

#

There is a separate parameter for that, not part of the parameter grid

#

As with the other person, this sounds like a case of needing to read the user guide and reference documentation more carefully

stuck karma Aug 22, 2021, 11:35 PM

#

mh, i already saw code with different metrics

#

desert oar Aug 22, 2021, 11:35 PM

#

lapis sequoia Yes, please have a look, nan values

Are you sure those are the right arguments to cross_val_score?

stuck karma Aug 22, 2021, 11:35 PM

#

in this tuto for example

#

which line? cross_val_score comes before

desert oar Aug 22, 2021, 11:36 PM

#

stuck karma in this tuto for example

That's the distance metric for the KNN model that I assume is being searched over, there is no "metric" parameter for PLS Regression as per the error and as per the docs

#

Unrelated to the scoring metric

stuck karma Aug 22, 2021, 11:36 PM

#

i see, but in my case im interested by two metrics, does it mean i should apply grid search twice?

#

rmse and r²

#

in the doc you can see the different metrics for pls

grave frost Aug 22, 2021, 11:40 PM

#

why is doing cp on individual files sooo slow 😦

stuck karma Aug 22, 2021, 11:41 PM

#

i dont really get the problem . is the line before metric correct tho?

desert oar Aug 22, 2021, 11:42 PM

#

stuck karma i dont really get the problem . is the line before metric correct tho?

Do you understand what the "parameter grid" is for?

desert oar Aug 22, 2021, 11:42 PM

#

stuck karma i see, but in my case im interested by two metrics, does it mean i should apply ...

You can do that, but I am pretty sure there is an option to add additional scorers or use a list of several scorers

stuck karma Aug 22, 2021, 11:42 PM

#

yes its to select the best parameters to get the best score

desert oar Aug 22, 2021, 11:42 PM

#

Check the GridSearchCV docs

stuck karma Aug 22, 2021, 11:43 PM

#

i think i got the idea?

desert oar Aug 22, 2021, 11:43 PM

#

stuck karma yes its to select the best parameters to get the best score

So do you understand why it doesn't make sense to try to use it to pass multiple parameters to GridSearchCV itself?

stuck karma Aug 22, 2021, 11:44 PM

#

I know that PLS has a few parameters: among them are the number of iterations the model needs to learn, and the number of components

desert oar Aug 22, 2021, 11:44 PM

#

grave frost why is doing `cp` on individual files sooo slow 😦

Because it's literally copying the data byte for byte, it only goes as fast as it goes. Use rsync maybe?

stuck karma Aug 22, 2021, 11:44 PM

#

by parameter you mean score?

desert oar Aug 22, 2021, 11:44 PM

#

stuck karma by parameter you mean score?

No, I mean parameter. As in, parameters that control how the model behaves. The score/loss function is unrelated

stuck karma Aug 22, 2021, 11:45 PM

#

oh.. so i can use multiple metrics but only one parameter?

grave frost Aug 22, 2021, 11:45 PM

#

desert oar Because it's literally copying the data byte for byte, it only goes as fast as i...

I guess that might be due to other latencies ¯_(ツ)_/¯

#

for row in tqdm(train_df['Image_ID']):
    tgt_img = row + '.jpg'
    !cp ../input/dataset/Train_Images/Train_Images/$tgt_img ./FiftyOneDataset/data/
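
An alternative sketch that copies the files from Python instead of running a shell command per file. Whether this is faster depends on where the time goes; the per-command process overhead is only a guess, not something measured in this thread. `copy_images` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_images(image_ids, src_dir, dst_dir, suffix=".jpg"):
    """Copy <id><suffix> files from src_dir to dst_dir; return how many were copied."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for image_id in image_ids:
        src = src_dir / f"{image_id}{suffix}"
        if src.is_file():  # skip IDs with no matching file
            shutil.copy2(src, dst_dir / src.name)
            copied += 1
    return copied
```

With the paths from the loop above, the call would look like `copy_images(train_df['Image_ID'], '../input/dataset/Train_Images/Train_Images', './FiftyOneDataset/data')`.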

stuck karma Aug 22, 2021, 11:46 PM

#

stuck karma oh.. so i can use multiple metrics but only one parameter?

if that is so, it doesn't bother me to run 2 grid searches, because I honestly think it is interesting to know for both parameters

#

or maybe i just plot a validation curve

btw it doesnt change, i have the same error Invalid parameter metric for estimator PLSRegression(). Check the list of available parameters with `estimator.get_params().keys()`.
the error is somewhere else

param_grid = {'n_components':n_components,
              'metric': ['metrics.r2_score', 
                         'metrics.mean_squared_error']}

grave frost Aug 22, 2021, 11:48 PM

#

@desert oar no difference with rsync 😦

desert oar Aug 23, 2021, 12:05 AM

#

grave frost @desert oar no difference with `rsync` 😦

On first run there won't be, but subsequent runs should be much faster

#

I recommend using DVC + rsync for this kind of thing

#

Or a Makefile with a wildcard

#

./bar/%: ../input/foo/%
    cp $< $@

I used 4 spaces instead of a tab because mobile, but that's the idea

desert oar Aug 23, 2021, 12:07 AM

#

desert oar ./bar/%: ../input/foo/% cp $< $@ I used 4 spaces instead of a ta...

./bar/%: ../input/foo/%
    dvc run -d $< -o $@ cp $< $@
    git add $@.dvc $(dir $@)/.gitignore

#

Rsync I think does more intelligent file diffing or deduplicating or something

desert oar Aug 23, 2021, 12:09 AM

#

stuck karma or maybe i just plot a validation curve btw it doesnt change, i have the same e...

Because you're still misusing it in the same exact way

stuck karma Aug 23, 2021, 12:10 AM

#

desert oar Because you're still misusing it in the same exact way

mh...

desert oar Aug 23, 2021, 12:10 AM

#

stuck karma oh.. so i can use multiple metrics but only one parameter?

Grid search lets you search over a grid of parameters for the model using one or more scorers. Choosing which scorers to use is not part of the "parameter grid", it's a completely different setting

#

The "metric" in that one example screenshot was unrelated to the scoring

#

In that very specific example, the model happened to have a parameter called "metric"

stuck karma Aug 23, 2021, 12:11 AM

#

nooo i know i want both score, its not which score?

#

you said using one or more scores?

desert oar Aug 23, 2021, 12:11 AM

#

stuck karma nooo i know i want both score, its not which score?

This is a different parameter in GridSearchCV, do not use the parameter grid for this

#

Please read the docs and the user guide

stuck karma Aug 23, 2021, 12:11 AM

#

okay.. maybe you give me a simple example maybe?

desert oar Aug 23, 2021, 12:11 AM

#

Stop guessing based on examples

stuck karma Aug 23, 2021, 12:11 AM

#

i already did

#

im not fluent in english so i read but sometimes it tooks time to understand

vestal agate Aug 23, 2021, 12:12 AM

#

why yall mad

stuck karma Aug 23, 2021, 12:12 AM

#

lmao

#

its 2 am im soooo tired

desert oar Aug 23, 2021, 12:12 AM

#

stuck karma im not fluent in english so i read but sometimes it takes time to understand

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html read the "scoring" parameter doc

stuck karma Aug 23, 2021, 12:12 AM

#

okay

desert oar Aug 23, 2021, 12:12 AM

#

But go to sleep first

stuck karma Aug 23, 2021, 12:13 AM

#

no i have a meeting tomorrow

#

im stuck. I take too much time to correct my errors

#

lemme read

desert oar Aug 23, 2021, 12:13 AM

#

I see. I understand it's probably not easy to read these docs if you don't read English well

stuck karma Aug 23, 2021, 12:14 AM

#

yes, also im sorry if i looked mad lmao

#

i'm happy you answered tbh

desert oar Aug 23, 2021, 12:14 AM

#

I apologize for sounding annoyed. We get a lot of people in here who don't want to learn and just want people to do their homework for them

stuck karma Aug 23, 2021, 12:15 AM

#

yes i know, i really wanna learn. i think im improving my english skills progressively

desert oar Aug 23, 2021, 12:16 AM

#

Yes, unfortunately programming tends to be very English-centered

stuck karma Aug 23, 2021, 12:17 AM

#

to give you an idea of how i read the documentation: my eyes go to the key words because i know them, but sometimes the verbs or the syntax confuse me

#

yes but its a good training !

#

it just needs time

#

okay this part is about multiple scores i guess? "If scoring represents multiple scores, one can use:"

#

isnt what i tried to do? with 'metrics ':[...]

#

okay so i tried to read but i dont think i got more than what i previously thought :x

glossy moth Aug 23, 2021, 12:29 AM

#

velvet thorn then you `pd.concat` them

Hi, I was asking a bit about this earlier but I realized some info that simplifies things:
I have two dataframes of equal size. All cells other than an identifier column contain 0 or 1. The two differ only in which cells hold 0 and which hold 1. I want to compare equivalent cells in the two, and wherever one df has a 0 and the other has a 1, replace the 0 with a 1. If both have 0, leave 0. If both have 1, leave 1. Is there a quick way to do this without going column by column?
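
#

One possible approach, as a sketch: what is described here is an element-wise logical OR, so both frames can be indexed by the identifier column and OR-ed in one step. The column names (`id`, `x`, `y`) and values are invented for illustration, and the sketch assumes both frames contain the same identifiers.

```python
import pandas as pd

# Toy frames: "id" is the identifier column, every other cell is 0 or 1.
df_a = pd.DataFrame({"id": ["a", "b", "c"], "x": [0, 1, 0], "y": [1, 0, 0]})
df_b = pd.DataFrame({"id": ["a", "b", "c"], "x": [1, 1, 0], "y": [0, 0, 0]})

# Index by the identifier so rows are matched by id rather than by position.
a = df_a.set_index("id").astype(bool)
b = df_b.set_index("id").astype(bool)

# Element-wise OR: 1 wherever either frame has a 1, 0 only where both have 0.
combined = (a | b).astype(int).reset_index()
print(combined)
```

This handles every column at once, so there is no need to loop column by column.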

near aspen Aug 23, 2021, 12:32 AM

#

anyone here familiar with pandas?

#

what method is used for something like that

fossil idol Aug 23, 2021, 12:35 AM

#

Please I need help

#

Why do I keep getting this error

#

This is jupyter notebook

#

I keep getting this error "AxesSubplot:

rough mountain Aug 23, 2021, 12:42 AM

#

fossil idol

https://stackoverflow.com/questions/41898485/pandas-dataframe-error-matplotlib-axes-subplots-axessubplot

Stack Overflow

Pandas dataframe error: matplotlib.axes._subplots.AxesSubplot

import pandas as pd
import matplotlib.pyplot as plt

file = 'd:\a\pandas\test.xlsx'
data = pd.ExcelFile(file)
df1 = data.parse('Link')
df2 = df1[['dataFor', 'total']]
df2
returns:
print (type...

#

this might be useful

fossil idol Aug 23, 2021, 12:43 AM

#

I imported them already @rough mountain

#

@rough mountain it worked. I typed %matplotlib inline and the graph showed. Thanks
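
#

For context, a small sketch of why that helps: the "AxesSubplot" text is usually not an error but the printed form of the object that the plot call returns. In a notebook, `%matplotlib inline` makes the figure render under the cell. The frame below is made up, and the `Agg` backend is used only so the sketch also runs outside a notebook.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, only needed outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"total": [1, 3, 2]})  # hypothetical data

ax = df.plot()   # returns a matplotlib Axes object; it does not raise an error
print(type(ax))  # the notebook echoes this object when it is the last line of a cell

# Render the figure explicitly, here into an in-memory PNG.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
plt.close(ax.figure)
```

Ending the cell with `plt.show()` or a trailing semicolon is another common way to avoid the echoed text.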

rough mountain Aug 23, 2021, 12:48 AM

#

welcome

#

I'm just going to use this as an image processing channel as it's the closest thing.

#

When I floodfill this image with cv2

serene scaffold Aug 23, 2021, 12:59 AM

#

kitty!!!!!!!!

rough mountain Aug 23, 2021, 12:59 AM

#

I get this strange result