#data-science-and-ml


safe elk
#

Yes, Linear Algebra is used extensively in graphics libraries (esp. 3D), and the graphics accelerator libraries in turn help ML (CUDA)

cobalt hearth
#

here is input

#

so you see it's not a basic page with all 4 edges

#

because its a book

#

so you can't simply use the biggest contour

#

my approach was to first use canny, but canny resulted in lines with BIG gaps

deft pollen
cobalt hearth
#

so i used contours with convex hull

cobalt hearth
#

and got this

deft pollen
#

wowza

#

looks cool, but near useless as is 😛

cobalt hearth
#

now what i want to do is use just the outer layer of it as a contour so i can get 4 points so i can warp it up

deft pollen
#

Hieroglyph-creator xD

cobalt hearth
#

but how to do that
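
One possible way to get 4 points out of the outer hull, as a rough sketch only: pick the extreme corners using the sum and difference of the coordinates. Plain (x, y) tuples stand in for the hull points here (in OpenCV they would come from something like `cv2.convexHull`), and `four_corners` is a hypothetical helper name, not an OpenCV function.

```python
def four_corners(points):
    """Pick top-left, top-right, bottom-right, bottom-left from (x, y) points.

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = max(points, key=lambda p: p[0] - p[1])     # largest x - y
    bottom_left = min(points, key=lambda p: p[0] - p[1])   # smallest x - y
    return [top_left, top_right, bottom_right, bottom_left]


# Made-up hull points for illustration
hull = [(12, 30), (200, 18), (215, 290), (25, 310), (110, 20), (208, 150)]
corners = four_corners(hull)
```

The four points could then go to `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the warp. The heuristic assumes the page is not rotated too far; on a strongly tilted page the sum/difference trick can pick the wrong corners.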

cobalt hearth
deft pollen
#

One sec, I saw a similar tut on Murtaza's Workshop, might help.

cobalt hearth
#

my phone may shut down in any second btw

cobalt hearth
deft pollen
#

In this video we are going to learn how to create an Augmented reality application using opencv. We will use feature detection to find our Target image and then overlay a video on top of it. We will write the code from scratch going step by step so it is easy to follow.

Code and Text Version:
https://www.murtazahassan.com/courses/opencv-proje...

#

Maybe something from this project can help do what you're doing, not sure tho. Cropping squares on angles n such

cobalt hearth
#

yeah may help me thanks

deft pollen
#

thats part 1 of 3

deft pollen
# cobalt hearth yeah may help me thanks

What if you could detect the book contour, then replace the background with the same majority color, transparify the outer edge, and re-contour the paragraphs?

#

But I see the page you showed has red bleeding to the page's edge.

inland zephyr
#

G'day all
i would like to ask if anyone in here has proper reading material about embedding analysis using t-SNE

#

and how to do proper visualization with the result

inland zephyr
#

for example, let's say i have 100 classes/users where each user has n embedded data points. Should i set the perplexity to the number of users or to something else?

mossy stratus
#

How do I fix this pytorch error?

```
 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  row_ids = torch.arange(past_length, input_shape[-1] + past_length,
Failed with forward() missing 1 required positional argument: 'has_cache'
```
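
The warning text above distinguishes rounding toward zero ('trunc') from rounding toward negative infinity ('floor'). A plain-Python illustration of that difference, using the standard library rather than PyTorch:

```python
import math

a, b = -7, 2

truncated = math.trunc(a / b)  # rounds toward zero, like rounding_mode='trunc'
floored = a // b               # rounds toward negative infinity, like rounding_mode='floor'

print(truncated, floored)  # prints: -3 -4
```

For non-negative operands the two agree, so the warning only changes results when negative values are involved. The last line of the log, the missing `has_cache` argument, looks like a separate failure from the rounding warning.
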
visual spear
#

I keep getting this error, I can't figure out why:

```
  row_ids = torch.arange(past_length, input_shape[-1] + past_length,
Failed with forward() missing 1 required positional argument: 'has_cache'
```

The code is in this: https://colab.research.google.com/drive/15vFLeepkSTr1qd4xs31g9kMEiwkWP0sh
How do I get rid of that problem so that it will actually fine tune the AI?
final field
#

'/usr/lib/libtensorflow_io.so' (no such file) in python3.9 m1 mac

fleet wedge
#

Hi, I'm starting out in AI. Do you know any tutorials or books for me to read?

last echo
#

What is the difference between categorical cross entropy and sparse categorical cross entropy? Is it about the way the images in each class are labeled? Like 1,2,3,... or corn1,corn2...?
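
As the loss names are normally used (e.g. in Keras), the two compute the same quantity and differ only in label format: categorical expects one-hot vectors, sparse expects integer class indices. A small plain-Python sketch of that; the function names are illustrative and not the Keras API:

```python
import math


def categorical_ce(one_hot, probs):
    """Cross entropy when the label is a one-hot vector, e.g. [0, 1, 0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_ce(class_index, probs):
    """Cross entropy when the label is an integer class index, e.g. 1."""
    return -math.log(probs[class_index])


probs = [0.1, 0.7, 0.2]                      # predicted class probabilities
loss_a = categorical_ce([0, 1, 0], probs)    # one-hot label for class 1
loss_b = sparse_categorical_ce(1, probs)     # integer label for class 1
```

Both give the same loss value. String labels such as "corn1" would have to be mapped to integers (or one-hot vectors) first in either case.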

lapis sequoia
fleet wedge
lapis sequoia
iron basalt
#

Straight from the source, " Sutton is considered one of the founders of modern computational reinforcement learning,[1] having several significant contributions to the field, including temporal difference learning and policy gradient methods. "

cobalt hearth
#

anyways good idea, im gonna try more than one approach and see which is the fastest and most reliable

last echo
cobalt hearth
#

@last echo im here, ill try to help

cobalt hearth
last echo
cobalt hearth
#

ok

#

you just need your cnn to be able to categorize a picture of a food right?

cobalt hearth
#

i have used something very good before one second

last echo
#

the prediction ends up so bad, I do not even know how to improve

cobalt hearth
#

ok nvm what you built, what i used was a python module that does all the job of the cnn for you, you just need to work on the data

#

but first of all how many training and testing images you have?

lapis sequoia
#

im borrrrrrreeeeeeed so give me stuff that i can make, any suggestions?????

last echo
cobalt hearth
#

this is it

#

you dont have enough data.

#

here is what i used

last echo
#

I split the data as I do not know how to use the test folder without class

cobalt hearth
#

i was building an automobile classifier and it worked really well

#

so you need to scrape images from google

#

resize them all to one size

last echo
#

the dataset source is only what kaggle provided, sir.

cobalt hearth
#

minimum is 2000 training images for each class and 1000 testing for each class

#

images need to be in all rotations and angles of the foods you have

hollow gust
#

Hey guys, I am relatively new to coding and i wanna get into machine learning

#

where should i start

#

i have been doing a python course recently

last echo
upper spindle
#

why cant i iterate over a dataframe?

cobalt hearth
#

do you mind me editing some of your code on the google colab?

last echo
cobalt hearth
#

no ill just code it and you run the tests

#

check your dms

hollow sentinel
#

lmao attribute errors make sm more sense when you understand oop

#

omg i remember groupby from sql

young dock
#

I'm having trouble with updating conda

#

```
$ conda install -n base conda=4.11.0

UnsatisfiableError: The following specifications were found to be in conflict:
  - conda=4.11.0
  - nbpresent
Use "conda info <package>" to see the dependencies for each package.
```
#

```
$ conda update -n base -c defaults conda

==> WARNING: A newer version of conda exists. <==
  current version: 4.5.11
  latest version: 4.11.0

Please update conda by running

    $ conda update -n base -c defaults conda

# All requested packages already installed.
```
#

it's telling me to update conda with the command i just ran...

serene scaffold
#

@young dock are you sure that you have to use anaconda? I don't recommend it to newcomers to Python, even though a lot of data science resources treat it as the default assumption.

young dock
#

i don't have to, no

serene scaffold
#

then I would just use "regular python"

young dock
#

i was using it since i do like jupyter

serene scaffold
#

you can use jupyter without conda

young dock
#

oh

serene scaffold
#

the point of conda is that you can install pre-built binaries, but the need for that is increasingly rare.

#

I work for a research company, and we're actually moving away from it as much as we can.

young dock
#

Conda (the old version) + jupyter was working fine for me for a while, it's just there was an error when I tried installing pygraphviz and stackoverflow said to update conda.

I will try jupyter without conda, thanks.

serene scaffold
#

you can just install jupyter with pip, start it from the terminal, and open it in your browser

#

I'm not sure if that's different from the workflow to start it with conda, though

young dock
#

i think the only difference is you start it from the anaconda terminal

acoustic forge
#

Hey guys - So I am trying to develop an automatic way of doing meta text generation. I am scraping a couple of websites, that I have gotten permission to, but the issue is in my data fetching process. Either I:

  1. Get a ton of text, a lot of it useless like "Previous Page" or "Related Products" because I target all text on a webpage.
  2. Get very little text, because I only target h1 and h2 tags.

I was wondering if there's a model that can tell whether a specific sentence is informative or not. Does my problem make sense? Otherwise I'll try to explain better

cobalt hearth
#

is there a way i can find out how many vertices there are in a contour?

hollow sentinel
#

oh by the way, i was doing some googling and found a linear algebra course for ML on youtube. should i link it in here?

#

i found two that seemed decent to me

#

i also found one for multivariate calculus (calc 3), but unfortunately i'm still at calc 1 and learning as we speak

safe elk
serene scaffold
grave frost
#

sir, every matrix is a linear mapping. Which is what a tensor is. And it is a vector in CS - not physics.

desert oar
civic stone
#

Good Afternoon Everyone,

I have one more question please, as i am using K-Means for Clustering Documents, i would like to evaluate my cluster results,
my question is which evaluation measure i should use to know the accuracy of my cluster results?

Thanks for your support

desert oar
#

maybe there are some deep learning approaches to identifying "interesting" sections of html, but that seems like a much harder problem and probably requires a huge labeled dataset + clever feature engineering

acoustic forge
desert oar
#

i wonder if there is a way to encode the html hierarchy in a clever way such that you can compute some kind of tf-idf score on it, and thereby automatically filter out chunks of the page that are high-frequency/low-importance
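
A rough sketch of the frequency idea, leaving out the HTML hierarchy entirely: treat each page as a list of text chunks and drop chunks that appear on most pages, using only the inverse-document-frequency half of tf-idf. The function name, the threshold, and the sample pages are all made up for illustration.

```python
import math
from collections import Counter


def informative_chunks(pages, min_idf=0.5):
    """Keep text chunks that are rare across pages.

    pages: list of pages, each a list of text chunks (e.g. one per HTML tag).
    A chunk that shows up on most pages ("Previous Page") gets a low IDF score.
    """
    n_pages = len(pages)
    # Count on how many pages each distinct chunk occurs
    doc_freq = Counter(chunk for page in pages for chunk in set(page))
    idf = {chunk: math.log(n_pages / count) for chunk, count in doc_freq.items()}
    return [[chunk for chunk in page if idf[chunk] >= min_idf] for page in pages]


pages = [
    ["Previous Page", "Red kettle, 1.7 litres", "Related Products"],
    ["Previous Page", "Blue toaster with four slots", "Related Products"],
    ["Previous Page", "Steel pan, oven safe", "Related Products"],
]
kept = informative_chunks(pages)
```

With only exact-match chunks this catches navigation text that repeats verbatim; near-duplicates and the tag hierarchy would need more work, as discussed above.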

#

humans are good at that kind of thing... machines, maybe not yet

acoustic forge
desert oar
#

maybe a graph neural network might be useful for this

#

maybe if you smash all the text into a single "document", BERT could recognize that sentences like "Next page" are meaningless and learn to ignore them

acoustic forge
lapis sequoia
#

hello everyone, i wonder if there is anyone here with knowledge of deepfake videos

young dock
#

So I installed pygraphviz by first downloading and running the exe for graphviz and then by running the pip command they said to run in the pygraphviz docs (https://pygraphviz.github.io/documentation/stable/install.html#windows-install), in a venv.

Then I made a jupyter kernel for that venv and tried importing pygraphviz in a notebook with that new kernel, which didn't work.

Then I tried installing pygraphviz through jupyter directly using sys.executable as jakevdp suggested in his blog:

In [7]: !{sys.executable} -m pip install --global-option=build_ext --global-option="-IC:\Program Files\Graphviz\include" --global-option="-LC:\Program Files\Graphviz\lib" pygraphviz

then I tried importing pygraphviz, and got this error

```
     11 # Import the low-level C/C++ module
     12 if __package__ or "." in __name__:
---> 13     from . import _graphviz
     14 else:
     15     import _graphviz

ImportError: DLL load failed while importing _graphviz: The specified module could not be found.
```

getting this error trying to import pygraphviz even though both pygraphviz and graphviz appear to be installed
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @gloomy holly until <t:1643044011:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

desert oar
#

did you get any errors during the pip install? normally you would have to run that command in a "developer prompt", which is part of visual studio

lapis sequoia
#

I just finished this

#

I'm not sure where to go next

#

This guy takes over 5 months for a single episode

#

and I'm not gonna wait

#

What free resources can I learn neural networks from scratch with

desert oar
lapis sequoia
#

I do not want to pay 30 pounds for a book, I would but I can't

lapis sequoia
desert oar
#

Also the 100 Page Machine Learning Book by Burkov I believe is "pay what you want"

lapis sequoia
#

I just don't know the rest of the code

#

the coding stuff

desert oar
lapis sequoia
desert oar
#

I recommend getting familiar with reading software documentation, and then spend time just doing projects

lapis sequoia
#

like weather prediction

desert oar
#

Pytorch has decent tutorials in their docs

acoustic forge
#

It's defined in that meta tag

lapis sequoia
#

on back propagation and all the neural network stuff

#

I just need a free resource for neural networks 😩

desert oar
lapis sequoia
#

That costs money tho

#

Need something for free

desert oar
desert oar
lapis sequoia
#

I'm 13, no job, can't get one

desert oar
#

Also the pytorch docs are free

#

As is Towards Data Science as long as you open a private browsing tab

lapis sequoia
young dock
acoustic forge
young dock
desert oar
young dock
#

No, just the normal one

acoustic forge
desert oar
desert oar
#

but in all seriousness, you should definitely get a sense from your advisor of what they think your capabilities are

#

If your advisor is an active machine learning researcher, it might be doable. If not, you might struggle for lack of good resources and help

#

Because you are kind of inventing something new, even if it means you are composing pre-existing tools together

#

I think it's a great project idea and it does sound like it should be doable with off-the-shelf machine learning stuff, but if you don't have good support for the project then I would consider scaling down your ambitions

#

Not because you aren't capable, but because doing things for the first time is always difficult, and having somebody experienced looking "over your shoulder" can help a lot

#

Consider also that it will take a lot of time to put together the HTML processing heuristics

acoustic forge
desert oar
#

And if you haven't scraped all the webpages that need to be scraped yet, that might take several hours, it might need debugging, etc

desert oar
#

I don't know what the state of the art is in "supervised" caption generation

#

That's where a good actively practicing technical supervisor will really help, they can point you in the right direction and steer you away from dead ends

lapis sequoia
#

@desert oar can u send me a link to ML by burkov

desert oar
#

maybe it was free when it first came out

lapis sequoia
#

im gonna kill myself

#

what about the other one

#

the data science one

acoustic forge
lapis sequoia
#

🍞

desert oar
#

@lapis sequoia check out OpenIntro textbooks on Statistics, Probability, etc. they don't cover neural networks, but you will benefit from studying stats and related topics

lapis sequoia
#

i need neural nets tho

#

that's what im most interested in

hollow sentinel
#

try the o'reilly book

#

machine learning

lapis sequoia
#

no point of a benefit if you cant use it

#

does it cover neural nets

hollow sentinel
#

i believe so yes

#

do you know the basic algos

#

if you don't it's not beneficial to really jump into neural nets

#

i wouldn't recommend it

acoustic forge
stone marlin
# grave frost sir, every matrix *is* a linear mapping. Which is what a tensor is. And it *is* ...

Yes, every matrix is a linear mapping. A tensor (of full rank, if you want to be specific) is a multilinear mapping. Even in CS, a vector is a tensor of rank one[1] --- a trivial tensor, which should not be representative of the entire set of tensors. My point was/is: if you're saying that it's just a list of scalars, you're throwing out most of the important structure.

[1] https://www.tensorflow.org/guide/tensor

lapis sequoia
#

I just followed the series from the start and i understand it

lapis sequoia
acoustic forge
lapis sequoia
#

im doing a lin alg course

#

just started it

#

but do i actually need to know it before i can make some progress

stone marlin
#

You can make "progress", but you'll probably be bound by doing simple things which are using pre-existing (or common) models, unless you know the theory around NNs.

lapis sequoia
#

as for the second one i do but i just dont know back propagation and what to actually do with the model

stone marlin
#

Actually, I'm not sure why I'm noting this --- this is true in ML in general, but I'm not a NN expert, so someone else should weigh in on that.

lapis sequoia
#

yall can judge the series as you want but seems to me that it covers the basics

#

im taking edX course

#

on lin alg

stone marlin
#

That sounds good to me. As long as you're learnin' towards a goal, sounds good.

lapis sequoia
#

issue is with resources for the ML and NN

#

i dont have those

#

literally have nothing and i really want to avoid paying

stone marlin
#

The thing with ML (idk as much as others here about NN) is that it divides pretty hard into two sets of classes: those which are very, very math-heavy (Andrew Ng) and those which are much more light-weight and focus on just doing the code and not worrying too much about the math.

#

Either one is fine, but I'm assuming the second one is better for you right now, at least?

lapis sequoia
#

Yeah but I want to go into math heavy

#

stuff

cobalt hearth
lapis sequoia
#

and study lin alg

stone marlin
#

I agree, I think Linalg is really important to pick up. I think a good thing might be to take a "lightweight" DS class now, just to see how the things look.

lapis sequoia
#

and then convince my dad to buy the 30 pound neural network book which I need

cobalt hearth
#

first things first, you should be at least an 11th grader so you dont struggle with going math heavy

lapis sequoia
#

Because he obviously will buy programming resources for me (he got me into coding nearly 2 years ago) but he doesnt think im ready for the maths

stone marlin
#

Yeah, I'd say at least calculus is required. Linalg will 100% help, and stats should be there as well.

lapis sequoia
#

lmfao

#

I'm 13

#

I'm aware that lin alg is 🥶

stone marlin
#

I don't remember what grade that is --- it's been a while since I was 13.

lapis sequoia
#

and not easy but I got an edX course on lin alg which my dad can help me with

cobalt hearth
#

may God be with you. you will have to learn about a lot of things alone, but then when you go to 9th grade or 11th you will find everything easier because you already grinded alone and learned it

lapis sequoia
#

The maths isn't easy but I'll manage it

lapis sequoia
stone marlin
#

Cool. That's a great start if you're interested in this stuff. I think the math in the NN book might be a bit much, but it's prob not bad to start it up, and having help will be useful.

lapis sequoia
#

defo

cobalt hearth
#

i think a good start is text neural networks, since they include less math or easier math

lapis sequoia
#

simply wanted something new so chose ML and nn

cobalt hearth
#

if you go with images or audio you will need to learn things you will only need in uni

#

but why not right?

cobalt hearth
#

yeah audio is a little easier than images

#

not like i have done audio before but thats what i think

stone marlin
#

Despite workin' in the field for a while, I'm quite new at NNs --- I've used autoencoders for work, but that's really all I've needed --- and I was surprised at how effective they were for some things. I'm messing around with the LSTM models for text and I'm both excited for the results and disappointed that all of it is pretty much a black box.

LIME has not been kind to me so far, I'm going to be checkin' out SHAP either today or tomorrow.

odd meteor
# lapis sequoia literally have nothing and i really want to avoid paying

You don't wanna pay now or you don't even have the intention of ever paying for a course? 😀

Well, say no more. Check this thread I made on twitter
https://twitter.com/itsDonmonc/status/1483388714344714245?t=Hd70vHV_oKV_JgCqqp-13g&s=19

For the ♥ of open source, I'm excited to share some of the best & most recent Machine Learning courses available on YouTube.

No doubt, learning ML on YouTube could trigger Information Fatigue. However, it's my hope that this curated playlist solves that

https://t.co/6XawsNntee

stone marlin
#

Dang, this makes me wanna start one of'em up!

cobalt hearth
#

you should work in an Ad Agency @odd meteor

#

or a Marketing Agency you can nail it

#

that was smooth 😂😂

odd meteor
lapis sequoia
#

thanks

#

I want to pay but eh sometimes you cant

#

rn if i could id get neural networks from scratch book, data science in general book

odd meteor
lapis sequoia
#

ty

#

@odd meteor

#

is this good?

#

its pretty long 😵‍💫

#

43 hours

#

this is math heavy right

acoustic forge
#

Cause the rest is the same content from previous years

lapis sequoia
#

first 12 are good for learning right

#

they will teach me how to use back propagation and optimizers and how to train a model

acoustic forge
lapis sequoia
#

But I'm pretty good at other coding

#

language development

#

good general knowledge of python

desert oar
#

i still think "youtube" is not a good goal for learning. videos are an adjunct to reading and practice problems

lapis sequoia
#

if i learn lin alg then my dad could buy me the book i need

#

but until then this will do

acoustic forge
lapis sequoia
#

Im trying lin alg lol

desert oar
#

fair enough. also keep in mind that you are very young and have literally years of time before you are expected to have even started thinking about these topics

lapis sequoia
#

as long as a machine learns

desert oar
#

if you are wanting to learn about the code, i still think you should look at the pytorch documentation

acoustic forge
desert oar
#

also the MIT Deep Learning course publishes its course materials, not just the video lectures

hollow sentinel
#

look up a linear regression model on kaggle and play around with it

#

if you haven’t already

#

look at logistic regression as well

#

learn the basics of how a linear regression model works with stat quest

#

learn how a linear regression model is limited and cannot handle something like e-mail spam

#

before everything i just stated get the basics of supervised and unsupervised learning down

#

you gotta know the alphabet and other stuff before you start reading and that really applies to learning ML… fundamentals are very important

#

not only that but also you should build a basic understanding of pandas and numpy … corey schafer has excellent vids on that

cobalt hearth
hollow sentinel
#

i like to read the documentation after i learn a thing from the video series

#

and then write down the key points

#

that’s how it makes sense to me

#

but medium articles are good

cobalt hearth
#

i just read those articles, they usually contain information on how every function works under the hood, and if not wikipedia usually explains it for me

hollow sentinel
#

true

cobalt hearth
#

also sometimes if i still cant get it i take some example code and play around with it and check each time what the difference in the output is

hollow sentinel
#

that is good too

#

as long as you can understand what you've written and you are able to explain it

sweet prism
#

Hi you all!

#

I would like to show the most common elements across groups. Can anyone help?

#

So I have a time series which I grouped by year and month. In this time series there is an id column and I want to verify which are the ids that "repeat" the most across the groups
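
One reading of "repeat the most across the groups" is: in how many (year, month) groups does each id appear at least once? A standard-library sketch under that assumption, with made-up rows standing in for the real time series:

```python
from collections import Counter

# (year, month, id) rows standing in for the grouped time series
rows = [
    (2021, 1, "a"), (2021, 1, "b"), (2021, 1, "a"),
    (2021, 2, "a"), (2021, 2, "c"),
    (2021, 3, "a"), (2021, 3, "b"),
]

# Collect the distinct ids seen in each (year, month) group
groups = {}
for year, month, id_ in rows:
    groups.setdefault((year, month), set()).add(id_)

# Count in how many groups each id appears, most frequent first
group_counts = Counter(id_ for ids in groups.values() for id_ in ids)
most_common = group_counts.most_common()
```

Using a set per group means an id that occurs several times within one month is still counted once for that month. In pandas the same idea would probably be a `groupby` on the id column followed by `nunique` over a year-month column, but that depends on how the frame is laid out.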

desert oar
#

but yes i agree, video lectures are good for conceptual understanding mostly

grave frost
cobalt hearth
#

something as simple as a numpy array can be considered as a tensor as far as i know

stone marlin
#

Yes. A tensor can represent a single linear mapping, as I noted. It is a rank-two in this case. It can also be a rank-one and represent a vector. It can also be rank-zero and represent (technically not a scalar, but a 1-d vector).

grave frost
#

@stone marlin look at this way, torch.tensor([1]) is a scalar, no?

#

but it is inside a tensor

grave frost
stone marlin
#

So, you're literally saying that you don't like the pictorial representation of what a vector tensor signifies in the mathematics because, when you put it into a programming language, it "looks like" an array.

grave frost
#

no...because it is an array in CS

#

its only in science (physics/math) where it diverges significantly, evolving their own systems of computations

stone marlin
#

It is the same interpretation if you are using it in CS vs. mathematics --- especially if you're doing work with tensors in NNs.

grave frost
#

perhaps it might be - I am not educated in Riemann Tensors and stuff, which is a different branch of mathematics. but in CS, its kept simple

stone marlin
#

I'm not trying to be pedantic here about things "technically not being scalars because they need to be in a vector space" and other pure-math nonsense, I'm saying that to use a tensor in the sense that tensorflow does, is the same as using a tensor in terms of multilinear mappings in linear algebra.

grave frost
#

then a tensor is something that holds values, some kind of table or array

stone marlin
#

Yes, this is how it is implemented.

cobalt hearth
#

i always thought that a scalar is a 1D array

grave frost
stone marlin
#

A scalar is a 0d array.

cobalt hearth
#

ok i might be confusing things

#

i always thought that the following is a scalar

grave frost
cobalt hearth
#

[[128], [1827]]

stone marlin
#

I strongly disagree that I am confusing concepts here, but we can agree to disagree if you'd rather.

grave frost
#

linear, or even non-linear mappings do not have to be vectors per-se. They are written as matrices, (multidimensional vectors, set of vectors, choose your pick) but they can be represented in other forms.

Tensors arose out of simply having a framework-agnostic array which we can backprop through. So in the CS realm, they aren't inspired by physics at all. They're just called that to differentiate from arrays, and vectors/matrices was already taken by many

#

tensors are mappings, but they don't have to be. They can store any arbitrary information that even though can be translated to vector space, doesn't matter for our use (unless it can help us in some way, like PCA)

#

The reason tensors are different is because TF attaches a computation graph to those (which can be detached in pytorch) to keep track of ops for autograd to backpropagate. Which is why you have to convert to numpy for any other use outside its built-in methods

cobalt hearth
#

sorry for interrupting but can anyone explain to me what a 2D kernel filter is?

cobalt hearth
stone marlin
#

For the sake of everyone else here, I will say that we are on two different pages. I "get" what you are noting here, but in the specific case of ML and NN, as was above, tensors are made to be multilinear maps. If they were not, auto-differentiation would not make sense. I'm not sure where you get that the CS realm, as a whole, wasn't inspired by the (same structure) tensors as in the mathematics, but this is fine.

desert oar
#

yes, a tensor must be a multilinear map by definition

stone marlin
#

I don't think it is, I think that I'm saying that in this case we interpret them as multilinear maps --- I think they note that they are "actually" just arrays of arrays.

desert oar
#

it's like how every matrix is a linear mapping + a coordinate basis

#

just because you didn't have a linear map + basis in mind when you wrote the matrix, doesn't mean it doesn't imply the existence of one

stone marlin
#

Right. I am simply saying: there exists more structure, and to neglect that structure puts one at a disadvantage for understanding the process vectorally.

grave frost
stone marlin
#

If, for example, you had a "tensor" which was not a valid multilinear mapping (let's say [[0, 1], [1]]), does this still qualify as a tensor?

#

It is an array of arrays.

grave frost
#

because essentially, there is no linear mapping of [4] @desert oar but it still qualifies as a tensor

desert oar
#

sure it is

#

it's a linear map right?

grave frost
#

is it?

desert oar
#

[4] * [x] = [y]?

stone marlin
#

Yeah, that is a linear mapping. Trivial, but linear.

grave frost
#

yes, its a scalar then

hollow sentinel
#

dumb question but

desert oar
#

no, it's a 1x1 matrix

hollow sentinel
#

what does trivial mean in linear algebra

desert oar
#

4 is a scalar

stone marlin
#

It's not exactly a scalar.

hollow sentinel
#

just curious

cobalt hearth
desert oar
hollow sentinel
#

i see

stone marlin
#

Trivial just means "the base case", usually like, "0-" or "1-dimensional" versions of something. It's something mathematicians say a lot that's frustrating to everyone else. I do it a lot, ugh.

desert oar
#

"degenerate" might be a better use for something like 0-dimensional

hollow sentinel
#

ok bc i heard it in like a lot of strang’s lectures

#

i had a feeling it meant like simple or something

desert oar
#

meanwhile a "trivial" proof is one that is immediately obvious

grave frost
#

ok mb

stone marlin
#

It also may seem like we're using tensors in a slightly different way: in my mind, a tensor is a full-rank tensor (though I should qualify this!) and in the other case, it could be m-rank tensors, for m < n.

#

So, I think when I was saying tensor it may have been confusing, which I'll take the blame for.

grave frost
#

what I meant is that tensors are just numpy arrays with a computation graph attached to it. That's final. Period.

stone marlin
#

Is np.array([[1], [2, 3], [4, 5, 6]]) a tensor?

grave frost
desert oar
#

@stone marlin it's not a tensor, but it's convertible to a tf.Tensor

grave frost
#

you have to understand in the context of DL, tensors are just framework specific numpy arrays

desert oar
stone marlin
#

If you're familiar with NumPy, tensors are (kind of) like np.arrays. -- tensorflow docs.

grave frost
#

because its much more convenient internally to convert everything to one type for autograd, rather than sorting through the mess

grave frost
desert oar
#

but that's more general than what i said

hollow sentinel
#

man people are screwed in the lin alg spring sem course i’m gonna take

#

they are going all the way to markov processes and trying to apply lin alg to game theory

grave frost
#

Well, I don't know how tensors are used in other places 🤷‍♂️

stone marlin
#

They are used in nearly exactly the same way.

cobalt hearth
stone marlin
#

Except they must be well-defined as multilinear mappings.

hollow sentinel
#

good question

#

i don’t know the answer 😀

#

apparently there is a direct application

cobalt hearth
#

trying to prove his theories with an algorithm?

desert oar
#

if you can represent the solution to a problem as a system of linear equations, you can solve it with linear algebra 🙂
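
A tiny worked instance of that statement, as a sketch: two equations in two unknowns solved by Gauss-Jordan elimination with exact fractions. The `solve` helper is written here for illustration and assumes a square, non-singular system; in practice a library routine such as `numpy.linalg.solve` would normally be used.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Assumes a is square and non-singular.
    """
    n = len(a)
    # Build the augmented matrix [a | b]
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in a row with a non-zero pivot
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


# 2x + y = 5 and x + 3y = 10
solution = solve([[2, 1], [1, 3]], [5, 10])
```

Here `solution` comes out as x = 1, y = 3, which can be checked by substituting back into both equations.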

hollow sentinel
#

yep

#

same w linear programming

#

lmao we linked the same one

#

🤣

desert oar
#

hah yes

#

i wouldn't even call linear programming an application

hollow sentinel
#

oh

#

really?

#

i don’t know what linear programming is

cobalt hearth
#

oh this makes sense but i thought you meant the youtuber game theory

desert oar
#

well, yeah i guess it is. but linear algebra more or less exists purely because we often need to solve systems of linear equations

hollow sentinel
#

right

#

row ops and stuff

desert oar
hollow sentinel
#

uhhh

#

i am not there yet lmao

#

i will be

stone marlin
#

Wait, can you actually do tf.Tensor(np.array([[1], [2,3], [4, 5, 6]])) and have it be valid? That would be really wild to me.

desert oar
#

imagine you have a light that randomly changes color between red, green, and blue. you can set up a matrix with the probability of transitioning from any state to any other state
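
The light example can be sketched in a few lines. The transition probabilities below are invented for illustration; the only real requirement is that each row sums to 1.

```python
states = ["red", "green", "blue"]

# transition[i][j] = probability of moving from states[i] to states[j].
# The numbers are made up for illustration; each row sums to 1.
transition = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]


def step(distribution, matrix):
    """Advance a probability distribution over states by one transition."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


start = [1.0, 0.0, 0.0]                   # the light is red for certain
after_one = step(start, transition)       # [0.5, 0.25, 0.25]
after_two = step(after_one, transition)   # [0.375, 0.3125, 0.3125]
```

Each step is just a vector-matrix product, which is where the linear algebra comes in: repeating it shows the distribution drifting toward an even split between the three colours for this particular matrix.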

cobalt hearth
#

i noticed something: schools in general tend to keep teaching the same thing again and again, then at about 8th grade they start flooding the student with new things

stone marlin
#

Markov is awesome, it's super legit.

desert oar
hollow sentinel
#

this class played right into my hands i mean i wanted to use lin alg for ml anyways

desert oar
stone marlin
#

Yeah, I'm not sure if these can actually be "tensors" qua tensors, as we were noting.

#

Which makes sense. I keep getting shape errors.
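
To make the shape discussion concrete, here is a small pure-Python sketch that computes the shape of a nested list and rejects ragged nesting. It only mirrors the "must be rectangular" requirement; `nested_shape` is a made-up helper and not how NumPy or TensorFlow implement the check.

```python
def nested_shape(value):
    """Return the shape of a nested list, or raise ValueError if it is ragged."""
    if not isinstance(value, list):
        return ()                      # a bare number has rank 0
    shapes = [nested_shape(item) for item in value]
    if any(s != shapes[0] for s in shapes):
        raise ValueError("ragged nesting: rows have different shapes")
    return (len(value),) + (shapes[0] if shapes else ())


scalar_shape = nested_shape(4)                    # ()
vector_shape = nested_shape([4])                  # (1,)
column_shape = nested_shape([[128], [1827]])      # (2, 1)

try:
    nested_shape([[1], [2, 3], [4, 5, 6]])
    ragged_ok = True
except ValueError:
    ragged_ok = False                             # rows of length 1, 2 and 3
```

So `4` has rank 0, `[4]` has rank 1, `[[128], [1827]]` has rank 2, and the ragged list has no well-defined shape at all, which is presumably the source of the shape errors mentioned above.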

hollow sentinel
#

yep ^ to what salt rock lamp said

#

i’m up to A = LU

#

and i did like the inverse of a matrix product too

stone marlin
#

That's a good decomp method.

hollow sentinel
#

i think i’m in a much better spot than a lot of the other kids in the class

#

if all things go right

desert oar
stone marlin
#

I think the one that I use frequently is SVD. That one is pretty cool.

hollow sentinel
#

singular value decomposition?

#

strang mentioned it

desert oar
hollow sentinel
#

oh yeah i saw qr decomposition when i was looking up supplemental videos

desert oar
stone marlin
#

Oh, I meant that I don't think they can even be used as tensors in the tensorflow sense. Seems like you need to pad'em. Hm. Welp, either way.

#

I don't think I know Cholesky.

desert oar
hollow sentinel
#

ah eigenvectors and eigendecomposition

#

not there yet unfortunately but i will be soon

stone marlin
#

Yeah, I'm not great with tensorflow, so I just wanted to mess around a bit with it. Who knows.

#

Huh, Cholesky is pretty neat. What's that usually used for?

#

Oh, Kalman filters, okay, cool.

desert oar
#

i used it in my bayesian stats class to help mcmc converge

#

it turns out that you can reparameterize the gaussian distribution to use the cholesky decomposition of the covariance matrix instead of the original covariance matrix, which has better numerical properties... or something like that
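
A minimal sketch of that reparameterization, assuming a made-up mean and covariance. Instead of sampling from the covariance matrix directly, you draw standard normal noise `z` and transform it with the Cholesky factor `L`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters, not taken from any real model
mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])

# Lower-triangular factor with L @ L.T == cov
L = np.linalg.cholesky(cov)

# z ~ N(0, I), then x = mu + L z has mean mu and covariance cov
z = rng.standard_normal((100_000, 2))
samples = mu + z @ L.T

print(np.cov(samples, rowvar=False))
```

The sampler then only has to work with independent unit-variance `z` values, which is the part that tends to be friendlier numerically.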

hollow sentinel
#

bayesian stats

#

i should defo look at that

stone marlin
#

Huh, never heard of that. That's cool. I don't think I've ever checked convergence "in the math way" for MCMC models.

hollow sentinel
#

i did finish the stats course i have been watching for a while

desert oar
# hollow sentinel i should defo look at that

check out Statistical Rethinking. course lectures from 2017 and 2019 are on youtube, and the 2022 course is going on right now. the book is great too and worth buying, reading, and keeping on your bookshelf

hollow sentinel
#

so i’m revising stuff

desert oar
#

the book is Statistical Rethinking by McElreath

desert oar
# stone marlin Huh, never heard of that. That's cool. I don't think I've ever checked converg...
grave frost
#

except that's a ragged tensor I suppose

stone marlin
#

Interesting. I'll check those both out.

desert oar
#

oh yeah, another interesting thing we used in bayesian stats @stone marlin https://jakejing.github.io/posts/2021/08/blog-post-2/ this "LKJ distribution" apparently has good properties as a prior distribution for covariance matrices

stone marlin
#

Dang, definitely haven't heard of this one. Though, to be fair, my bayesian stats stuff is fairly weak relative to my other stuff.

desert oar
#

yeah i took a bayesian stats course w/ some people who were active early stan contributors

stone marlin
#

I feel like at every place I work there's always one person REALLY into bayesian stuff, and then everyone else is like "oh... neat."

desert oar
#

so i got exposed to a bunch of cool stuff that almost kind of went over my head

#

@hollow sentinel @stone marlin actually the stats server i am in just started a book club for Statistical Rethinking

#

i think you both should be able to DM me, i will reply with a link to join if you are interested

lapis sequoia
#

in england thats true

stone marlin
#

Unfortunately, I prob won't have time now. :'[ This is something I will be interested in in the future, though. I've gott'a focus on some devops + timeseries stuff.

desert oar
#

fair enough. the book club is pretty casual, you are welcome to listen in

warm raven
#

Hello everyone quick question, can I be guided to the best channel to ask questions regarding pandas?

warm raven
#

perfect

#

Okay do you mind if I send an image and ask my question

cobalt hearth
#

Perfection

cobalt hearth
stone marlin
#

Alright, let me send you a message then, salt rock, no harm in listening in.

desert oar
stone marlin
#

I think you'll have to add me as a friend or accept messages, Salt. But I'm in.

warm raven
#

Okay so i’m trying to identify recurring vs non recurring income. I want to basically add 1s and 0s to the pipeline data frame which will be used in another step in my processing

desert oar
#

screenshots are very difficult to help with

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

desert oar
#

@warm raven ☝️

warm raven
#

ahh my bad

cobalt hearth
#

this is the best channel to talk in tbh its easy to keep up on one context and nobody is doing 1k messages per minute and its actually active

warm raven
stone marlin
#

Can you copy your code, so that we can debug it? Using the things salt noted above?

warm raven
#

yeah one second

#

,,, pipeline['rec'] = pipeline.apply(lambda x: '1' if (gfs[(gfs['prod_code_name'] == x['prod_code_name'])
& (gfs['PRODUCT_ID_MAP'] == x['PRODUCT_ID_MAP'])
& (gfs['Sector'] == x['Sector'])
&(((gfs['Product_Code'] == 'USF33') |
(gfs['Product_Code'] == 'USF34') |
(gfs['Product_Code'] == 'US756') |
(gfs['Product_Code'] == 'USF37') |
(gfs['Product_Code'] == 'USF40') |
(gfs['Product_Code'] == 'USF29') |
(gfs['Company_Code'] == 'US05') |
(gfs['Company_Code'] == 'US1B') |
(gfs['Company_Code'] == 'USM6')).any())
]).bool() else '0', axis=1)
,,,

cobalt hearth
#
pipeline['rec'] = pipeline.apply(lambda x: '1' if (gfs[(gfs['prod_code_name'] == x['prod_code_name'])
                                                        & (gfs['PRODUCT_ID_MAP'] == x['PRODUCT_ID_MAP']) 
                                                        & (gfs['Sector'] == x['Sector'])
                                                        &(((gfs['Product_Code'] == 'USF33') |
                                                          (gfs['Product_Code'] == 'USF34') |
                                                          (gfs['Product_Code'] == 'US756') |
                                                          (gfs['Product_Code'] == 'USF37') |
                                                          (gfs['Product_Code'] == 'USF40') |
                                                          (gfs['Product_Code'] == 'USF29') |
                                                          (gfs['Company_Code'] == 'US05') |
                                                          (gfs['Company_Code'] == 'US1B') |
                                                          (gfs['Company_Code'] == 'USM6')).any())
                                                        ]).bool() else '0', axis=1)
stone marlin
#

Close, the mark isn't a comma it's a backtick: it should be by the ~ key on your keyboard. But this is fine for now.

warm raven
#

thank you

#

sorry everyone lol

stone marlin
#

It's all good, we're all learnin'!

#

Is gfs a dataframe?

warm raven
#

Yes they both are

stone marlin
#

Okay, I'm gonna clean this up a bit too. Gimme a sec.

#

It's hard to test without the dfs, but you can simplify a lot of this by using something like:

relevant_product_codes = ["USF33", "USF34", "US756", "USF37", "USF40", "USF29", "US05", "US1B", "USM6"]
product_code_mask = gfs["Product_Code"].isin(relevant_product_codes).any()
#

There's a lot to keep track of here, so the more you can debug separately the better.

warm raven
#

Sure i understand what you’re saying

#

I can tell you for sure that I know there’s no problem with the product code name, ID or sector comparisons

#

I think the error is created by the series of booleans returned in the larger or statements used

stone marlin
#

Yeah, I think that's why I'd prefer the thing I posted above, since that'll give back a single boolean.

warm raven
#

the issue is, some of these are coming from the Product_Code column, and some of them are in the Company_Code column

stone marlin
#

Oh, I missed that.

#

Either way, similar deal:

#
relevant_product_codes = ["USF33", "USF34", "US756", "USF37", "USF40", "USF29"]
product_code_mask = gfs["Product_Code"].isin(relevant_product_codes).any()

relevant_company_codes = ["US05", "US1B", "USM6"]
company_code_mask = gfs["Company_Code"].isin(relevant_company_codes).any()

product_or_company_mask = product_code_mask | company_code_mask
warm raven
#

Okay one moment

cobalt hearth
#

im imagining the devs of opencv before they released it and they are working in computer vision for a company for example

#

company: "we want you to build a face recognition app for us"

#

the dev: "ok sure"

desert oar
#

it's possible that it was a research project first

#

a lot of libraries start as academic projects

warm raven
#

so @stone marlin

cobalt hearth
#

the company: "what are you goin to use?"

warm raven
#

I guess i’m a bit confused how to apply this to my pipeline dataframe

cobalt hearth
#

the dev: "of course not your crappy framework im gonna use mine"

cobalt hearth
warm raven
# warm raven I guess i’m a bit confused how to apply this to my pipeline dataframe

the pipeline dataframe has products, IDs and Sector that will have recurring revenue in the future. the gfs dataframe also has those products, IDs and sector. The gfs dataframe has the company and product codes which help identify that. You can go ahead and assume that besides those factors the dataframes are quite different. So my overall goal is just to be able to identify which products do have recurring revenue at the product-sector-id level, using the codes found in the gfs data, so that when I iterate through the pipeline data I can easily identify what would be recurring.

stone marlin
#

Okay, without going through the whole problem, the way I'd write this to be able to debug it a bit easier is:

def get_rec_value(x):
    pcn_mask = gfs["prod_code_name"] == x["prod_code_name"]
    pidmap_mask = gfs["PRODUCT_ID_MAP"] == x["PRODUCT_ID_MAP"]
    sector_mask = gfs["Sector"] == x["Sector"]
    
    relevant_product_codes = ["USF33", "USF34", "US756", "USF37", "USF40", "USF29"]
    product_code_mask = gfs["Product_Code"].isin(relevant_product_codes).any()

    relevant_company_codes = ["US05", "US1B", "USM6"]
    company_code_mask = gfs["Company_Code"].isin(relevant_company_codes).any()

    all_masks = (
        pcn_mask & pidmap_mask & sector_mask & (product_code_mask | company_code_mask)
    )

    # Wouldn't this just be `all_masks`?
    if gfs[all_masks].bool():
        return 1
    return 0

pipeline["rec"] = pipeline.apply(
    lambda x: get_rec_value(x),
    axis=1,
)
#

This is sort'a what I meant. Transferring this to a function means you can put print statements in and see what's the goings on, and it allows you to add stuff / take out stuff easier.

#

I'm not sure that this'll be perfect, you might still get an error, but you'll be able to see exactly what gets returned for each thing.

#

For that error, it's usually something is producing either a trivial series or a weird series we didn't expect. So the job is to find that.
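
For reference, here is one way to sidestep the row-by-row `apply` entirely. This is only a sketch on toy data (the rows below are invented; the column names and code lists come from the snippet above). Two things worth knowing: `.any()` on a mask collapses it to one boolean for the whole frame, and `.bool()` raises unless the object holds exactly one value, so both are easy places for this kind of error to come from. The version below keeps the masks as per-row Series and uses a merge to flag matches:

```python
import pandas as pd

keys = ["prod_code_name", "PRODUCT_ID_MAP", "Sector"]
product_codes = ["USF33", "USF34", "US756", "USF37", "USF40", "USF29"]
company_codes = ["US05", "US1B", "USM6"]

# Toy stand-ins for the real dataframes
gfs = pd.DataFrame({
    "prod_code_name": ["A", "A", "B", "C"],
    "PRODUCT_ID_MAP": [1, 1, 2, 3],
    "Sector": ["X", "X", "Y", "Z"],
    "Product_Code": ["USF33", "ZZZ01", "ZZZ02", "ZZZ03"],
    "Company_Code": ["US99", "US99", "US99", "US05"],
})
pipeline = pd.DataFrame({
    "prod_code_name": ["A", "B", "C", "D"],
    "PRODUCT_ID_MAP": [1, 2, 3, 4],
    "Sector": ["X", "Y", "Z", "W"],
})

# Per-row mask: a gfs row counts if EITHER code column matches
is_recurring = (
    gfs["Product_Code"].isin(product_codes)
    | gfs["Company_Code"].isin(company_codes)
)

# Unique product/id/sector combinations that have recurring revenue
recurring_keys = gfs.loc[is_recurring, keys].drop_duplicates()

# Left merge keeps every pipeline row; "_merge" says whether a match was found
merged = pipeline.merge(recurring_keys, on=keys, how="left", indicator=True)
pipeline["rec"] = (merged["_merge"] == "both").astype(int).to_numpy()

print(pipeline)
```

If the key columns contain missing values, it is worth checking how you want those treated before trusting the flag.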

warm raven
#

Okay see my understanding of the bitwise operators convinced me that it was just a series of trues and falses that the a.any() couldn’t handle

#

thank you

stone marlin
#

It's possible! I mess this stuff up constantly, which is why I write it all in this verbose format. Good luck!

serene crystal
#

hello everyone, I am an amateur programmer and I want to get into data a bit more, and I figure I could do that by analyzing my gameplay in a game called Brawlhalla. The game saves files that allow you to view previous matches, and I basically want to make a program that pulls some simple info from those matches (mainly the gamemode, the legend I was playing, the legend the other person was playing, and whether I won or loss, and maybe down the road things like when I took the most damage and from what). anyone have any idea on how I should get started on that? I'm not quite sure how to approach the problem

warm raven
stone marlin
#

Try printing out all of the "masks" in the function, and see what they return (for a small subset of your pipeline). If they're returning a series, that's bad news.

#

Pollo: This is a cool idea. First, it might be good to go to a general room and figure out how to pull in your data from the files programmatically. Once you have the data, you should come back here and tell us what you think you wanna do with it, and we can help you analyze it!

warm raven
#

I already tried dropping the index of my shortened pipeline dataframe

#

Okay actually i needed to sort_index not reset_index

stone marlin
#

Yeah, reset_index weirdly just makes the index a column, haha.

warm raven
#

now when putting it back in the function i’m seeing that there must be a nonetype value in one of the dfs

#

likely gfs, i’ll check

stone marlin
#

Nice, that's a good find.

fervent zenith
#

i've compiled these questions to perform EDA

#

what else should i add?

deft pollen
#

Which topical help chat does face detection and trig fall under?

fervent zenith
#

topical?

deft pollen
#

Trying to use the green triangles from the center to each face to determine the rotations of the face

deft pollen
#

The displayed Pitch and Yaw aren't correct.

fervent zenith
#

i think ML AI

deft pollen
#

I don't see a chat with Machine Learning in the name. I meant from the list to the left <<

#

"TOPICAL CHAT/HELP"

stone marlin
#

This is prob the chat for it, but I know nothing about CV, so I'll let someone else go at it. :']

cobalt hearth
#

i do can you # it for me?

deft pollen
#

Why do these topical channels phase in and out of existence?

#

I swear to God discord is quantum software. It refuses to operate within human expectation, and behaves in extradimensional ways to do whatever it likes. Kinda like my ex gf.

deft pollen
cobalt hearth
deft pollen
#

gimme a min

cobalt hearth
#

thnx

deft pollen
cobalt hearth
#

looks funny for some reason, thanks

deft pollen
#

np

#

probably a different frame. its a clip

cobalt hearth
#

btw when you werent here i had a question for you

deft pollen
#

fire away or dm it

cobalt hearth
#

aight ill dm it

deft pollen
#

k

warm raven
#

it took me awhile to figure that out, but it’s because that was the last thing I expected, it doesn’t really make sense to me at least

deft pollen
#

For anyone interested in my face project, this is my end goal ish
https://www.youtube.com/watch?v=20fbXJ8fayM

This is an augmented reality Photoshop plugin I made. I connected it to a Unity app with Vuforia so I could pull images out of the screen. The plugin uses Adobe Generator Core and Javascript to send each layer of the document to a Unity app with WebSockets (C#). Then we can display the images in AR by combining Vuforia's image tracking with ARki...

hollow sentinel
#

man the math for log regression is making my head spin

deft pollen
#

[excuse the U-word.]

hollow sentinel
#

unity

#

even after watching this entire thing all i can discern is

#

lin reg wouldn't work w binary classification

#

sigmoid transformation is the best way to go

#

ooh nvm i figured it out

#

i basically have it down

#

so cross entropy is the loss used for logistic regression?

#

it basically combines a linear combination of variables with the bernoulli dist: the linear combination is the logit (log-odds) of p, and the inverse of that is the sigmoid, so the predicted probability stays between 0 and 1

#

so the equation is like p = 1 / (1 + e^-(b0 + b1x1 + ... + bnxn))
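
A small sketch of those pieces in plain Python, assuming made-up coefficients `b0` and `b1` and a single feature `x1`. It shows that the sigmoid and the logit undo each other, and how cross entropy scores a predicted probability:

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability; the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

def cross_entropy(y_true, p):
    """Loss for one example with label y_true in {0, 1}."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

b0, b1 = -1.0, 2.0      # made-up coefficients
x1 = 1.5                # made-up feature value
z = b0 + b1 * x1        # the linear combination (the logit)
p = sigmoid(z)          # predicted probability of class 1

print(p, logit(p), cross_entropy(1, p), cross_entropy(0, p))
```

The loss is small when the predicted probability agrees with the label and grows quickly when it confidently disagrees.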

deft pollen
#

Wait, you got it to work?

hollow sentinel
#

no no

#

just understanding the math behind it

#

wanted a general mathematical intuition behind how exactly it worked

novel acorn
#

Hello!

May someone help me with a problem?

I have a DataFrame of all 50 states of US and its abbreviations, and another DataFrame where I have a column called "Origin State" where some rows have the full name of the state and others have the abbreviation. So in order to better visualize this column I have to fix this. I want to replace the abbreviations with the full name of the state and I wanted to use the dataframe containing the names and abbreviations of the states using an if-else statement but I haven't been able to do it.

This is what I have

test["State"] = np.where(test["Origin State"] == states.abbreviation, states.state, test["Origin State"])

The image is an example of how the data looks

#

This is the dataframe of the states with its abbreviations

#

Is there a better way to do it?

#

I thought of comparing each row to the abbreviations and if it matched, to replace it for the full name of the state.

But I haven't been able to put that in code

prime hearth
#

hello, im trying to do feature selection using the pearson correlation technique, but just to understand: do i want to remove features with high correlations that are not related to the label or target? Or do i want to keep features that have high correlation with the target or label? this is for linear regression

#

thanks

novel acorn
#

lmao, it ended up being really easy, used this:

test["Origin State"].replace(list(states.abbreviation), list(states.state))
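
A runnable sketch of the same idea on toy data (the three states below are just placeholders for the real 50-row table). Note that `replace` returns a new Series, so the result has to be assigned somewhere. Passing a dict instead of two lists keeps each abbreviation tied to its full name:

```python
import pandas as pd

# Toy versions of the two dataframes
states = pd.DataFrame({
    "state": ["California", "Texas", "New York"],
    "abbreviation": ["CA", "TX", "NY"],
})
test = pd.DataFrame({"Origin State": ["CA", "Texas", "NY", "California"]})

# Lookup table: abbreviation -> full name
abbr_to_name = dict(zip(states["abbreviation"], states["state"]))

# Only exact matches are replaced; full names pass through unchanged
test["State"] = test["Origin State"].replace(abbr_to_name)

print(test)
```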
novel acorn
#

Hey, so I have a question.

I kind of know python and been working on it for a while, but now that I'm into ML I'd like to know how important is knowing about OOP, because I literally don't even know what a class is. Is it very important or just something to have a little knowledge about?
Obviously related to ML and DL

warped turtle
#

how can I do the equiv of df["a"] - df.loc[0, "a"] but with df.groupby("s")? groupby doesn't have .loc so unsure what to do here
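
One possible approach, sketched on a made-up frame: `groupby(...).transform("first")` broadcasts each group's first value back to every row of that group, so the subtraction lines up row by row:

```python
import pandas as pd

df = pd.DataFrame({
    "s": ["x", "x", "x", "y", "y"],
    "a": [10, 12, 15, 100, 90],
})

# First value of "a" within each group of "s", repeated for every row
first_in_group = df.groupby("s")["a"].transform("first")

df["a_rel"] = df["a"] - first_in_group
print(df)
```

This assumes the rows are already in the order you care about within each group; sort first if "first" should mean something like earliest timestamp.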

lilac garden
#

you have to know well oop classes inheritance etc

grave moat
#

Hey can someone tell me whats the difference and similarities between ML algorithms and pattern recognition?

#

I am having these subjects and have to decide which one to pick.

#

I tried doing google search but didn't really understood it.

final field
#

anyone with m1 chip? and got successful with object detection?

hazy panther
#

Hello guys,
I am kind of new to ML, and I was wondering if maybe someone can help me with an issue, and maybe suggest some tools.
The situation: I have an EDF file where I am trying to identify the R wave -range-. I have the peak point value.
I was wondering:
How do you approach this situation?
I'll try to explain what i had in mind and I will be happy to get some suggestions, tools, recomendations on new tech (consider me a total beginner):

  • The first developer instinct of mine was to go and compare points - write an algorithm that might work for most situations (but there are a whole bunch of edge cases for that, for instance, when the EDF file is a bit fuzzy but enough to work on. when R waves aren't that bold. etc)
  • The second thought was "hey there must be an already trained model to identify peaks including ranges in an almost accurate way".
  • I tried googling some keywords but I got a lot of useless information and results. How do you usually approach this?
  • The last thing I thought of is to run all my EDF data through a model and get it to figure it out itself. Yet it's not my comfort zone.
lapis sequoia
#

https://youtu.be/iyxqcS1u5go
Is this video good for getting idea of the maths used in machine learning ?

This video on Mathematics for Machine Learning will give you the foundation to understand the working of machine learning algorithms. You will learn linear algebra, statistics, probability, and calculus with hands-on demonstrations in Python.

🔥Free Machine Learning Course: https://www.simplilearn.com/learn-machine-learning-basics-skillup?utm_c...

nova pollen
#

you can get an overview but you're not going to understand it unless you study the content properly

#

video looks good though, but again, to get a proper understanding you need to study the content

lapis sequoia
lapis sequoia
#

I'm personally learning lin alg from a course and will do the same for calculus

#

I'm reading a book which teaches those concepts aswell

lapis sequoia
#

Does anybody know any (relatively new) good tutorial how to setup a real time object detection with custom images? There are so many of them, but many are outdated and just dont work for me

light scroll
#

i saw a latest video about that on youtube

#

let me grab the link for you

#

@lapis sequoia

lapis sequoia
#

that would be cool

light scroll
#

it helped me, hope it helps you too @lapis sequoia

lapis sequoia
#

thanks, so everything still works? i know this video and i'm on line 1h30 and not sure if i will get another error and won't know what to do

frozen elbow
#

Hi guys, I was wondering if there's anyone who would be willing to give me an opinion or an advice on ML use cases on a little project I'm working on to try develop my skills. It's connected to Food delivery, order data, user data, restaurant data > to be used with python and machine learning. 🙂 Thanks in advance

unique flame
#

I wanted to create an object detection algorithm in python for a company to help with their monitoring duties. I spent a bunch of times preparing everything (I am not a computer science student so a lot of self learning) and how it would look like and they suddenly pulled out of the project. Pretty devasted...

wide helm
#

Machine Learning Related Question. I have a Generator which is supposed to output a 24x24 output. Is there a way I can arrange the parameters to do so?

deft pollen
#

Facial Orientation detection, anyone? 😛

#

I do need a body/pose detector capable of multiple bodies, though.

#

I think this one can only do a single person

errant mango
#

Hi guys, we're making a web scraping project where we take the names and stacks from companies, but the data comes in different forms from the source, e.g.:
node.js, nodejs.
what do you guys usually do for that?
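
One common pattern is to normalize every name to a canonical key before comparing. A minimal sketch, where the `ALIASES` table and the `normalize_tech` helper are both hypothetical and would need to be filled in from your own data:

```python
import re

# Hypothetical alias table: normalized spelling -> canonical name
ALIASES = {"node": "nodejs", "reactjs": "react", "golang": "go"}

def normalize_tech(name: str) -> str:
    """Lowercase, drop separators like '.', '-' and spaces, then apply aliases."""
    # Keep '+' and '#' so that "C++" and "C#" stay distinct from "c"
    key = re.sub(r"[^a-z0-9+#]", "", name.strip().lower())
    return ALIASES.get(key, key)

for raw in ["node.js", "NodeJS", "Node", "React.js", "C++", "C#"]:
    print(raw, "->", normalize_tech(raw))
```

Anything the rules miss can be added to the alias table as you find it; for messier data, fuzzy matching is another option but needs more care.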

velvet ginkgo
#

does anyone know if there is a nlp focused discord group

dusk tide
#

Can anyone suggest good projects related to Deep learning for college? But it should not be too common or too hard as I am still a beginner in DL.

velvet ginkgo
#

hey yall wanted to know if anyone knew how to use regular expressions for a hashtag word in between non hashtag words, for example:

What I am trying to do is keep the word learning but get rid of the other hashtags at the end

glacial talon
#

hello Anyone well with pandas here need a help

serene scaffold
desert oar
#

start there

#

then you basically want to wrap it in a match for <anything that isn't a hashtag> <whitespace> <hashtag> <whitespace> <anything that isn't a hashtag>

#

!e ```python
import re
text = "I love #learning. It's so fun #selflearner #selfstarter #learn"
pattern = r"(?:^|\s+)(?P<hashtag>#\S+)\b"
print(re.search(pattern, text)["hashtag"])
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

#learning
desert oar
#

this is getting a bit beyond what you'd normally want to do with regex btw

serene scaffold
#

you should use it for xml parsing instead

desert oar
#

i'd consider tokenizing first, then you can loop over 3-grams and look for (not-hashtag, hashtag, not-hashtag) triples

#

heh

serene scaffold
desert oar
#

!e ```python
import re
text = "I love #learning. It's so fun #selflearner #selfstarter #learn"
pattern = r"[^#\s]+\s+(?P<hashtagBetweenWords>#\S+)\s+[^#\s]+"
print(re.search(pattern, text)["hashtagBetweenWords"])
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

#learning.
desert oar
#

have to make some adjustments for punctuation though

#

[^#\s]+\s+(?P<hashtag>#\S+?)[^#\w]*\s+[^#\s]+ this is starting to get both messy and fragile

velvet ginkgo
#

ya it got pretty ugly for me as well

desert oar
#

!e compare to tokenizing:

import re
text = "I love #learning. It's so fun #selflearner #selfstarter #learn"
punctuation = re.compile(r"[.']")
whitespace = re.compile(r"\s+")
tokens = whitespace.split(punctuation.sub("", text))
for t1, t2, t3 in zip(tokens, tokens[1:], tokens[2:]):
    if not t1.startswith("#") and t2.startswith("#") and not t3.startswith("#"):
        print(t1, t2, t3)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

love #learning Its
desert oar
#

of course you might also want to handle sentence boundaries etc
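
If the goal is specifically "keep the inline hashtag word, drop the run of hashtags at the end", a simpler two-step sketch might be enough. The `clean_hashtags` helper is hypothetical and assumes hashtags are made of word characters only:

```python
import re

def clean_hashtags(text):
    # Step 1: drop a trailing run of hashtags (one or more, at the very end)
    without_trailing = re.sub(r"(?:\s+#\w+)+\s*$", "", text)
    # Step 2: for hashtags left in the middle, keep the word and drop the '#'
    return re.sub(r"#(\w+)", r"\1", without_trailing)

text = "I love #learning. It's so fun #selflearner #selfstarter #learn"
print(clean_hashtags(text))
```

A hashtag that is the last word of the sentence but meant as part of it would also be stripped by step 1, so it is worth checking this against real examples.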

desert oar
#

obviously don't copy and paste without considering what it does and adapting it to your needs 🙂

#

also @velvet ginkgo i recommend https://regex101.com for experimenting with, testing, and debugging regex

regex101

Regular expression tester with syntax highlighting, explanation, cheat sheet for PHP/PCRE, Python, GO, JavaScript, Java. Features a regex quiz & library.

#

don't forget to select "python" mode

velvet ginkgo
velvet ginkgo
hollow sentinel
#

so bias is basically assumptions a model makes about data to make the target function easier to create

#

variance is how much the target function changes depending on the training data?

#

meaning if there is high variance... that means a model performs poorly given new data?

#

i'm trying to develop the terminology needed for support vector machine

#

it seems like a support vector machine is a combination of soft margins and cross validation

#

looks like R^3

#

if you used something like mass, height, age, and BP then it would be R^4

#

woah this is so cool i know these lin alg terms

#

so further than r^4, it's a hyperplane or a "flat affine subspace"

#

yeah uh i can see why people would be screwed if they tried to do this without basic lin alg knowledge

velvet ginkgo
velvet ginkgo
hollow sentinel
#

just watching that for now, will delve into the math behind the model after

velvet ginkgo
#

so the total expected error = bias^2 + variance + irreducible noise

desert oar
# hollow sentinel variance is how much the target function changes depending on the training data?

not quite. "variance" in this context is how much the model loss and/or predictions change in response to changes in the training data. "bias" in this context is the amount of wrong-ness that is embedded in the model.

for example, adding L2 regularization adds some bias by producing a deliberately suboptimal (underfitted) result, but tends to reduce the sensitivity of the model training process to random variation in how the training set is sampled or constructed.

desert oar
desert oar
#

it's important to realize that higher-dimensional space is "bigger" than 2d or 3d space, in that distances between points are disproportionately larger.
pro: it's easier to separate points when you have higher dimensions
con: it's harder to actually learn a function in higher dimensions, because there's more "empty space" between points, so you need more data points to fill in those gaps
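
A quick way to see the "bigger space" effect, sketched with random points in a unit cube. The point count and dimensions are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pairwise_distance(dim, n_points=200):
    """Average Euclidean distance between random points in the unit cube."""
    points = rng.random((n_points, dim))
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Use each distinct pair once (upper triangle, excluding the diagonal)
    upper = np.triu_indices(n_points, k=1)
    return dists[upper].mean()

results = {dim: mean_pairwise_distance(dim) for dim in (2, 10, 100)}
for dim, dist in results.items():
    print(f"{dim:>3} dimensions: mean distance {dist:.2f}")
```

The same 200 points end up much farther apart as the dimension grows, which is the "empty space" that more data has to fill.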

desert oar
#

the two models are equivalent, but i think all the stuff about hard margin & introducing slack variables isn't much more than historical trivia nowadays

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @kind robin until <t:1643148728:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

hollow sentinel
#

oh actually

#

hinge loss is not hard to understand
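
For anyone following along, hinge loss fits in one line. A minimal sketch, assuming labels are -1 or +1 and `score` is the raw model output:

```python
def hinge_loss(y, score):
    """Zero when the point is on the correct side with margin >= 1, linear otherwise."""
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.0))   # correct side, outside the margin
print(hinge_loss(+1, 0.5))   # correct side, but inside the margin
print(hinge_loss(-1, 0.5))   # wrong side
```

Points that are classified correctly with room to spare cost nothing, which is why only the points near the boundary end up mattering.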

odd meteor
lapis sequoia
#

Hello, I'll be participating in a data warehouse project shortly that consists of grabbing data from different sources and storing it in a hadoop cluster. I've been looking into the technologies that I'll probably use and I have a doubt related to the data processing. I wonder if I should use pyspark, which is much faster than map reduce, so that I can take advantage of cool features from hadoop like resource management (yarn) and hdfs, combined with a more efficient way of processing my data with pyspark

stone marlin
#

Someone just asked me for this, so I figured I'd share it too with minimal context. This picture (well, one very similar to it) was what made the Kernel Trick click for me.

#

Possibly relevant to the above discussion of adding dimensionality to allow for "nonlinear" splitting.
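
A small sketch of the dimension-lifting idea on toy data: two concentric rings cannot be split by a straight line in 2D, but adding a third coordinate z = x^2 + y^2 lets a flat plane do it. This uses an explicit feature map rather than an actual kernel function, so it illustrates the geometry, not the trick of avoiding the lift:

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 50, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])

inner = 1.0 * ring   # class 0: radius 1
outer = 3.0 * ring   # class 1: radius 3

def lift(points):
    """Append z = x^2 + y^2 as a third coordinate."""
    z = (points ** 2).sum(axis=1)
    return np.column_stack([points, z])

inner_3d, outer_3d = lift(inner), lift(outer)

# In 3D the plane z = 5 sits between the two classes
threshold = 5.0
print((inner_3d[:, 2] < threshold).all(), (outer_3d[:, 2] > threshold).all())
```

A kernel SVM gets the same effect by computing inner products in the lifted space without ever building the extra coordinates.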

iron basalt
stone marlin
#

I never really think about something being "more likely" to be able to be cut with a hyperplane, though, so that prob wouldn't click in my head. Like, even if there was one more dimension but it was extremely small, we could still cut stuff up.

#

One example of the blessing of dimensionality phenomenon is linear separability of a random point from a large finite random set with high probability even if this set is exponentially large: the number of elements in this random set can grow exponentially with dimension. I think the relevant part of the "Blessing of Dimensionality."

#

I dunno if this would make dimension-lifting and stuff click for me, but it's def important to know and think about.

iron basalt
# stone marlin ```One example of the blessing of dimensionality phenomenon is linear separabili...

Also "common-sense heuristics based on the most straightforward methods "can yield results which are almost surely optimal" for high-dimensional problems." is very important. Because it means that simple solutions can work and simple solutions are often nice in that they can often be scaled up (also run fast), are more interpretable, and can often have other easily understood and desirable properties.

stone marlin
#

True, for dimensionality-lifting, this is also a good point. For the kernel trick, I think I really needed to see the "picture" of the kernel before I understood what adding that dimensionality "meant".

deft pollen
velvet ginkgo
azure orchid
#

Heyya I'm working on an Ai voice assistant

somber prism
#

guys i have a doubt , does keras image generator automatically uses prefetch(autotune) and cache or we have to separately make a function to do so ???

white saffron
#

what does this range_iterator instance means??

viral pier
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @glass dagger until <t:1643191750:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia
odd meteor
acoustic forge
#

Is there a rule of thumb of how big of a dataset you'd need to fine-tune a BERT(BART) model?

hollow sentinel
#
data.replace("NA", np.nan)
#
data.head(5)
#

i tried slipping in an "inplace = True" kwarg

#

but it won't let me

#
TypeError                                 Traceback (most recent call last)
<ipython-input-14-88eddfccfdb7> in <module>
----> 1 data.replace("NA", np.nan, inplace= True)

TypeError: replace() got an unexpected keyword argument 'inplace'
#

idk how to fix this

#

i've read the documentation for this as well

lapis sequoia
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |   b
002 | 0  
lapis sequoia
#

works in mine, I'm not sure but are you sure its a dataframe?

#

Series.str.replace does not have inplace, my assumption is that your data is actually a Series

hollow sentinel
#

unless i have to swap it with pd.read_csv

#

in order for it to work

#

or dask just reads things as series

#

no no that should work

lapis sequoia
#

thats dask.dataframe.replace then(what is dask?)

hollow sentinel
#

scalable machine learning

#

i found that pandas was taking too long

lapis sequoia
hollow sentinel
#

to read the csv

#

even with modifications of chunksize

#

another kwarg from the pandas library used to break the amt of dataframes a csv is split into

hollow sentinel
#

shit

#

i got doc'd

#

ok, so what is the workaround?

lapis sequoia
#

I'm not sure what they mean by that, i mean why is it in doc if dask doesnt support.

hollow sentinel
#

bc dask is dumb

#

and lacks a lot of the functionality pandas has

#

which means i'm dumb

#

for using dask in the first place

lapis sequoia
hollow sentinel
#

anywayssss

hollow sentinel
lapis sequoia
#

hm, well you can do something like above
df = df.replace(..) is equivalent in terms of output to inplace.
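
A minimal sketch of the reassignment pattern suggested above. It uses plain pandas so it runs without any dask setup, and the frame and its `phone` column are made up for illustration. The same pattern is what the message above proposes for dask, though that part is untested here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the string "NA" stands in for a missing value.
data = pd.DataFrame({"id": [1, 2, 3],
                     "phone": ["555-0100", "NA", "555-0199"]})

# replace() returns a new object, so assigning it back to the same name
# gives the same end result that inplace=True would.
data = data.replace("NA", np.nan)

n_missing = int(data["phone"].isna().sum())
print(n_missing)
```

Because nothing here relies on an `inplace` keyword, it sidesteps the TypeError shown earlier.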

hollow sentinel
#

definitely, yes

#

but i can't do that unless it is read into a dataframe

#

correct?

lapis sequoia
#

so why not?

hollow sentinel
#

yeah but dask is not liking replace

#

at all

#

actually

#

shit pandas works

lapis sequoia
#

uhm well 😄

hollow sentinel
#

whoopsie

#

nope

#

pandas is reading it as a "Series"

#

no it is not

#

ok i feel dumb

lapis sequoia
#

its alright.

#

i'm heading for study, let me know if you need help in this later.

hollow sentinel
#

right i will be here asking for help anyways

#

so thank you sm

lapis sequoia
#

😄

hollow sentinel
#

there must be something off

#

syntactically

#

oh shit i think i figured it out

#

who knew basic pandas knowledge makes this stuff sm easier 🤣

desert oar
hollow sentinel
#

i am getting better at reading docs every day ngl

#

bc i read it when i learn new attributes and things

#

helps the problem solving tons

desert oar
#

and now you see why i tell people that reading docs is a skill that requires practice!

hollow sentinel
#

i think i'm really maturing as a programmer

#

and i think it's about time

desert oar
#

it sounds like it! i really like seeing this kind of progress

#

and it's great to acknowledge your own progress too. that way you don't burn out and feel defeated all the time

hollow sentinel
#

yeah i keep the vibes positive and i make sure to take breaks

#

walk away from the computer and exercise, play video games, whatnot

#

i think since summer bc of all the big changes i made in how i do things i made progress

#

and i'm just glad to see it pay off

#

ok, so i am trying to replace some NaN values in my dataset

#

this could be an option... i could use it where in the dataframe the value is NaN

lapis sequoia
#

so you want to replace NAs with empty strings?

hollow sentinel
#

i'm gonna try explaining it with the first 3 rows

lapis sequoia
#

alr.

hollow sentinel
#

i see that there is a NaN value in that third row

lapis sequoia
#

hm yep

hollow sentinel
#

i want to replace it with the string "Missing"

#

sorry can't spell

lapis sequoia
#

okay

#

so you mean it's not treating NaN as NA?

hollow sentinel
#

it may just be a string value

lapis sequoia
#

!e

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))
df = df.fillna('missing')
print(df)
hollow sentinel
#

and not actually an np.nan

#

oh so you can use .fillna

#

dang i thought wrong ig

arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |          A        B        C        D
002 | 0  missing      2.0  missing      0.0
003 | 1      3.0      4.0  missing      1.0
004 | 2  missing  missing  missing  missing
005 | 3  missing      3.0  missing      4.0
lapis sequoia
#

hm good.

hollow sentinel
#

ok, so i applied it to my code with

lapis sequoia
hollow sentinel
#

oh smart

#

just check string equality

#

yeah, this may not be working simply because it is not an actual NAN value

lapis sequoia
#

!e

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 2, 'yo', 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))
df.B[df.B == 'yo'] = 'how'
print(df)
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 | <string>:8: SettingWithCopyWarning: 
002 | A value is trying to be set on a copy of a slice from a DataFrame
003 | 
004 | See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
005 |      A    B    C    D
006 | 0  NaN  2.0   yo  0.0
007 | 1  3.0  4.0  NaN  1.0
008 | 2  NaN  NaN  NaN  NaN
009 | 3  NaN  3.0  NaN  4.0
lapis sequoia
#

hm.

#

iloc may be needed.

hollow sentinel
#

iloc? or loc

lapis sequoia
#

!e

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 2, 'yo', 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))
df[df.B == 'yo'].B = 'how'
print(df)
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |      A    B    C    D
002 | 0  NaN  2.0   yo  0.0
003 | 1  3.0  4.0  NaN  1.0
004 | 2  NaN  NaN  NaN  NaN
005 | 3  NaN  3.0  NaN  4.0
hollow sentinel
#

loc would be for cols, no?

lapis sequoia
#

shit.

#

gimmi a sec.

hollow sentinel
#

maybe it's something like

#
data.loc[:, "phone_num_forwarded_from"] = data["phone_num_forwarded_from"].fillna("Missing")
#

it could be getting screwed up because i am trying to apply it to the entire dataframe instead of that one specific col with .loc

#

i am also heavily debating if i even need to clean the NaNs as i don't see how it will be particularly useful

#

but i do have an idea that can somehow incorporate the values in this specific col and in order to do that i'd definitely need a clean col

lapis sequoia
#

!e

import numpy as np
import pandas as pd
df = pd.DataFrame([[np.nan, 2, 'yo', 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))
df.loc[df.C == 'yo', 'C'] = 'hi'
print(df)
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |      A    B    C    D
002 | 0  NaN  2.0   hi  0.0
003 | 1  3.0  4.0  NaN  1.0
004 | 2  NaN  NaN  NaN  NaN
005 | 3  NaN  3.0  NaN  4.0
hollow sentinel
#

df.loc[df.specific_col_name == "string_to_replace"] = "string_to_replace_w_"

#

will try rn

lapis sequoia
#
df.loc[condition, "col_name_where_new_value"] = "string_to_replacew"
#

moreover Series.replace will work too.

hollow sentinel
#

that was my next idea

lapis sequoia
#

it's good to note that loc lets you put the condition on any column, independent of the column you assign to.
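
A small sketch of that point, with a made-up frame: the row condition and the target column are separate arguments to `.loc`, so they do not have to refer to the same column. The chained form tried earlier (`df[mask].B = ...`) assigns into a temporary copy, which is why that eval left `df` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [2.0, 4.0, np.nan, 3.0],
                   "C": ["yo", None, None, "ok"]})

# Condition on column C, write into column C (same idea as the eval above).
df.loc[df["C"] == "yo", "C"] = "hi"

# Condition on column B, write into a different column.
# Rows where B is NaN compare as False, so they are left alone.
df.loc[df["B"] > 2.5, "C"] = "big"

print(df)
```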

hollow sentinel
#

right

lapis sequoia
#

for weirder tasks apply is always my go-to.

hollow sentinel
#

yeah i am looking at the doc for .apply

#

"KeyError: 'cannot use a single bool to index into setitem'"

lapis sequoia
#

it's a good tool but it can make code long when things can be done simply.

hollow sentinel
#
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-38-d81d9dfe8788> in <module>
----> 1 data.loc["phone_num_forwarded_from" == "Nan"] = "Missing"
      2 
      3 data.head(5)
      4 
      5 #df.fillna("Missing")

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    668             key = com.apply_if_callable(key, self.obj)
    669         indexer = self._get_setitem_indexer(key)
--> 670         self._setitem_with_indexer(indexer, value)
    671 
    672     def _validate_key(self, key, axis: int):

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    869         else:
    870 
--> 871             indexer, missing = convert_missing_indexer(indexer)
    872 
    873             if missing:

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in convert_missing_indexer(indexer)
   2339 
   2340         if isinstance(indexer, bool):
-> 2341             raise KeyError("cannot use a single bool to index into setitem")
   2342         return indexer, True
   2343 

KeyError: 'cannot use a single bool to index into setitem'
lapis sequoia
#

also your condition is....uhm..

#

its like
1==2
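
To spell out the `1==2` comparison: in the traceback above, the expression inside `.loc[...]` compares two plain strings, so Python reduces it to a single `False` before pandas ever sees it. A sketch with a made-up three-row frame:

```python
import pandas as pd

data = pd.DataFrame({"phone_num_forwarded_from": ["555-0100", "Nan", "555-0199"]})
col_name = "phone_num_forwarded_from"

# String compared with string: one bool for the whole expression.
single_bool = (col_name == "Nan")
print(single_bool)

# Column compared with string: a boolean Series, one entry per row,
# which is what .loc needs as a row condition.
mask = data[col_name] == "Nan"
print(mask.tolist())
```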

hollow sentinel
#

oh

#

my bad sorry

#

brain got overwhelmed for a second

#

data.loc[data["phone_num_forwarded_from"] == "Nan"] = "Missing"

data.head(5)
lapis sequoia
#

still missing something

#

df.loc[condition, col_name] = new_val

hollow sentinel
#
df.loc[df.C == 'yo', 'C'] = 'hi'
print(df)
#

oh i just got confused for a second

full kayak
lapis sequoia
#
data.loc[data["phone_num_forwarded_from"] == "Nan", "phone_num_forwarded_from"] = "Missing"
full kayak
lapis sequoia
hollow sentinel
#

hm

#

i have an idea to check if it actually worked

desert oar
# full kayak

!paste it helps if you post text and not a screenshot. see below:

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
#

df[col].value_counts("Nan")

lapis sequoia
hollow sentinel
#

is there a pandas function that can count how many times a certain value appears in a col?

hollow sentinel
#

yep that was it

desert oar
#

!d pandas.Series.value_counts

arctic wedgeBOT
#

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Return a Series containing counts of unique values.

The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
desert oar
#

but might as well do series.value_counts().at[val] otherwise
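
A short sketch of the counting options mentioned here, on a made-up Series. Note that the first positional argument of `value_counts` is `normalize`, so the value being looked for has to be picked out of the result rather than passed in.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan, "a"])

counts = s.value_counts()            # NaN is excluded by default
n_a = int(counts.at["a"])            # raises KeyError if "a" never appears
n_z = int(counts.get("z", 0))        # .get falls back to 0 for absent values
n_a_direct = int((s == "a").sum())   # same count without building the table
n_nan = int(s.isna().sum())          # missing values need isna(), not ==

print(n_a, n_z, n_a_direct, n_nan)
```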

hollow sentinel
#
id                          0.0
phone_num_from              0.0
phone_num_forwarded_from    0.0
is_blocked                  0.0
created_at                  0.0
dtype: float64
#

good

#
data.loc[data["phone_num_forwarded_from"] == "Nan"].sum()
#

code to produce that ^

#

seems like it worked

#

i think it might be a good idea to rename these col names too bc they are a hassle to type and i know pandas has something in the docs for it
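
For the renaming idea, a minimal sketch using `DataFrame.rename`. The short names `src` and `fwd` are arbitrary choices for illustration, not names from the actual dataset.

```python
import pandas as pd

data = pd.DataFrame({"phone_num_from": ["555-0100"],
                     "phone_num_forwarded_from": ["555-0199"]})

# rename() returns a new frame; columns not listed in the mapping are kept as-is.
data = data.rename(columns={"phone_num_from": "src",
                            "phone_num_forwarded_from": "fwd"})

cols = list(data.columns)
print(cols)
```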

#

my next plan of attack is going to be to deal with strings like these "2021-12-01 00:00:00", convert them to datetime format

#

and try to extract things out of them by reading the pandas doc for datetime stuff along w the schafer vid i watched
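
One possible shape for that plan, assuming the column is called `created_at` (as in the output above) and that every value follows the `YYYY-MM-DD HH:MM:SS` layout quoted in the message. The derived column names are made up.

```python
import pandas as pd

data = pd.DataFrame({"created_at": ["2021-12-01 00:00:00",
                                    "2021-12-03 14:30:00"]})

# Parse the strings into real datetimes; an explicit format avoids guessing.
data["created_at"] = pd.to_datetime(data["created_at"],
                                    format="%Y-%m-%d %H:%M:%S")

# The .dt accessor exposes the individual datetime components.
data["hour"] = data["created_at"].dt.hour
data["weekday"] = data["created_at"].dt.day_name()

print(data)
```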

still delta
#

Where Can I find, UCY data set?

serene scaffold
hollow sentinel
#

it's a NaN

#

man

#

rip

lapis sequoia
#

you're too restless lol.

#

be patient.

hollow sentinel
#

yeah ik

#

i'll come back to this later today

#

just gotta take a break for a bit

serene scaffold
# hollow sentinel it's a NaN

if it's a NaN, not only will it not be equal to the string "Nan", but it won't be equal to literally anything, not even another NaN.

hollow sentinel
#

because you can't check equality with a NaN

#

whoops

serene scaffold
#

NaNs are just hard-coded to always return False in comparisons
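
A quick check of that behaviour, with one small qualification: `==` and the ordering comparisons return False for NaN, while `!=` returns True. That is why an equality test against the string "Nan" matched nothing, and why `isna()` is the usual way to find missing values.

```python
import numpy as np
import pandas as pd

nan = float("nan")

eq_self = (nan == nan)   # False: NaN is not equal even to itself
ne_self = (nan != nan)   # True: the one comparison that comes back True
lt_self = (nan < nan)    # False

s = pd.Series(["555-0100", np.nan])
mask_eq = (s == "Nan").tolist()   # equality never finds the NaN
mask_na = s.isna().tolist()       # isna() does

print(eq_self, ne_self, lt_self, mask_eq, mask_na)
```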

hollow sentinel
#

i see

#

ok well i used .isna()

#

and it gave me a dataframe of 52486231 rows × 5 columns of bool values
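
A frame of booleans that large is hard to read directly, so one common follow-up is to sum it per column. A sketch on a tiny made-up frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"id": [1, 2, 3],
                     "phone_num_forwarded_from": [np.nan, "555-0199", np.nan]})

# True counts as 1 when summed, so this gives the number of NaNs per column.
na_per_column = data.isna().sum()
print(na_per_column)
```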

serene scaffold
#

sounds right.

hollow sentinel
#

my next plan of action is to use data.fillna()

serene scaffold
#

remember to make sure if it's in-place or not.

hollow sentinel
#

in-place would make your changes permanent, right?

#

that's what i remember from the vids and the doc

serene scaffold
#

in-place is like list.append

#

but most pandas methods return an entirely new instance, and it's your job to write over the existing one, if you want that.

lapis sequoia
lapis sequoia
#

and inplace is rarely used, as far as I've seen.

#

people usually reassign.
which also gives us an advantage of having historical dfs(assuming dfs are not TOO big)

hollow sentinel
#

i saw a couple articles

#

saying to not use inplace in pandas

#
data.fillna({"phone_num_forwarded_from": "Missing"}, inplace = True)
#

much better

#

ok, i'm gonna come back later and do more data cleaning

#

ty sm for the help so far guys

serene scaffold
#

I never use inplace=True

hollow sentinel
#

hm

#

i think i can save it into a new df

#

with just df = df.fillna({"phone_num_forwarded_from": "Missing"})

#

i can restore the df somehow i forgot tho

serene scaffold
serene scaffold
hollow sentinel
hollow sentinel
#

i think so

#

i see your point

#

so just new_df = df.fillna({"phone_num_forwarded_from": "Missing"})

#

don't use the same variable name

lapis sequoia
#

which gives us a historic benefit while using notebooks.

#

as it's more trial and error a lot of the time

serene scaffold
#

a new DF isn't created for each replaced NaN. One is created in which all the NaNs are replaced.
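
To illustrate: a single `fillna` call returns one new frame with the requested NaNs filled, and the original is left as it was. The frame below is made up; passing a dict limits the fill to the named column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1.0, np.nan],
                   "phone_num_forwarded_from": [np.nan, "555-0199"]})

# One call, one new frame. Only the column named in the dict is filled.
new_df = df.fillna({"phone_num_forwarded_from": "Missing"})

original_still_has_nan = bool(df["phone_num_forwarded_from"].isna().any())
filled = new_df["phone_num_forwarded_from"].tolist()
id_untouched = bool(new_df["id"].isna().any())

print(original_still_has_nan, filled, id_untouched)
```

Whether the result goes back into `df` or into a new name is a separate choice; either way only one new frame is built.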

hollow sentinel
#

so i was wrong

#

the problem was that i used the same variable name

#

thank you

#

for pointing that out

hollow sentinel
serene scaffold
#

I actually encourage people to stay away from notebooks, but maybe I'm the angry old man of data science.

hollow sentinel
#

you are the angry old man of data science

#

description fits pretty well ngl

#

jk jk jk just banter

#

i'm curious as to what you recommend instead of notebooks too

#

and do you dislike google colab as it is just notebooks as well?

#

hmmmmmm this looks familiar

#

looks a ton like lin reg

hollow sentinel
#

i really don't like rapidminer

#

not that it has anything to do with notebooks

#

but seriously rapidminer sucks

serene scaffold
#

notebooks (almost actively) discourage code reusability and modularity

hollow sentinel
#

what's modularity

serene scaffold
#

look it up.

hollow sentinel
#

what does separating the functionality of a program into modules have to do with notebooks?

#

oh so basically combining things?

#

i think i understand now

serene scaffold
#

anything that you want the notebook to display for you has to be done in the global scope
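
One way to read the modularity point, as a sketch only: a cleaning step written as a function can be imported and reused from a script or a test, whereas the same step typed into a notebook cell lives in the global scope of that notebook. The function name `fill_missing` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame, column: str, value: str = "Missing") -> pd.DataFrame:
    """Return a new frame with NaNs in one column replaced by value."""
    return df.fillna({column: value})


# The same function can be applied to any frame, not just one notebook's `data`.
raw = pd.DataFrame({"phone_num_forwarded_from": [np.nan, "555-0199"]})
clean = fill_missing(raw, "phone_num_forwarded_from")

print(clean)
```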

lapis sequoia
serene scaffold
lapis sequoia
#

I mean as long as we understand how scoping and variable management works over there, we don't face issues.

serene scaffold
#

they shouldn't be taught as the default environment to beginners.

hollow sentinel
#

what would you recommend instead? pycharm?

lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
#

you can also do notebooks in pycharm, if you want

hollow sentinel
#

i see

lapis sequoia
#

also in vscode(but my experience was horrible with notebooks in vscode.)

hollow sentinel
#

vscode never agreed w my mac

#

i don't know why it just would never cooperate with me

modest shuttle
#

Hello,
I want to learn computer vision, where to start? OpenCV or TensorFlow?

hollow sentinel
#

i need some help with time series classification

#

i'm not sure if this could work

desert oar
hollow sentinel
#

so i'm basically forced to use tensorflow or keras

desert oar
#

not necessarily! but you might want to consider something like a 1-dimensional CNN

#

that, or you might want to find other ways to encode a time series as a vector
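
As one example of "encoding a time series as a vector" without any neural network, a series can be reduced to a handful of summary statistics. The choice of statistics below is an assumption for illustration, not something proposed in the chat; the resulting vectors could then be fed to an ordinary classifier.

```python
import numpy as np


def summary_features(series):
    """Encode a 1-D series as a fixed-length vector of summary statistics."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # least-squares linear trend
    return np.array([x.mean(), x.std(), x.min(), x.max(), slope])


# Series of different lengths map to vectors of the same length.
rising = summary_features([1, 2, 3, 4, 5])
flat = summary_features([3, 3, 3, 3, 3, 3, 3])

print(rising.shape, flat.shape)
```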

hollow sentinel
#

i was thinking time series regression

#

it seems like i'm dealing with univariate time series data

#

the thing is i cannot do linear regression with this as this is a classification problem

#

so maybe logistic regression is the way to go

desert oar
hollow sentinel
#

the encoder-decoder LSTM?

desert oar
#

you could use the encoder to produce "time series vectors"

hollow sentinel
#

sounds like this

desert oar
#

interesting

#

actually it looks like you can use LSTM layers directly in classification models

desert oar
hollow sentinel
#

what's the diff

desert oar
#

forecasting means trying to predict the future values of a time series

hollow sentinel
#

forecasting is future

#

yep

#

my b

desert oar
#

classification means predicting a label for an entire time series
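
The difference shows up in how the training data is laid out. A sketch with made-up numbers and labels: forecasting pairs a window of past values with the value that follows, while classification pairs a whole series with one label.

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Forecasting: inputs are sliding windows, targets are the next value.
window = 3
X_forecast = np.array([series[i:i + window]
                       for i in range(len(series) - window)])
y_forecast = series[window:]

# Classification: each whole series is one input with a single label.
X_classify = np.array([[1.0, 2.0, 3.0, 4.0],
                       [4.0, 3.0, 2.0, 1.0]])
y_classify = np.array(["rising", "falling"])

print(X_forecast.shape, y_forecast.shape, X_classify.shape, y_classify.shape)
```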

hollow sentinel
#

i see