#data-science-and-ml

1 messages · Page 77 of 1

desert oar
#

didn't know about either, interesting

#

i remember seeing a blog post or something about it related to pymc3

iron basalt
#

The "Aesara" seems to be the pymc people.

desert oar
#

it might even have been pymc3 devs trying to keep the project alive in maintenance mode, at least enough to support pymc3 itself

#

i see, that tracks

fallow frost
#

I'll try this and see if it's necessary, thank you

iron basalt
desert oar
#

iirc theano was more "lower level", more like numpy or jax than pytorch or tensorflow

#

might be interesting to see how pytensor stacks up against the newer frameworks

iron basalt
#

"Implements an extensible graph transpilation framework that currently provides compilation via C, JAX, and Numba."

#

Kind of neat to see that someone out there takes these seemingly dead projects and runs with them, at least in spirit.

desert oar
#

oh very interesting

#

lots of frameworks now all at various levels of abstraction, some python-specific and some general

fallow frost
#

I got 5 dataframes with the exact same shape, but the values are just slightly different. How can I get the average of each set of 5 values? (The output should be another dataframe with the averaged values.)

#

basically each benchmark generates a dataframe, and I want to display the average

left tartan
#

Simplest is to concat then avg

left tartan
#

Or generate a series by combining the 5 columns and computing average over that.

#

Generally, what I’d do is create a single dataframe with all 5 tests: add a test column, and then do whatever I want with that. That’s the most natural
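Both suggestions can be sketched with pandas. This is a minimal illustration, not the asker's actual benchmark: the column names (`read_ms`, `write_ms`), the index labels, and the numbers are made up, and it assumes all five frames share the same index and columns.

```python
import pandas as pd

# Five hypothetical benchmark results with identical index and columns.
frames = [
    pd.DataFrame(
        {"read_ms": [10.0 + i, 20.0 + i], "write_ms": [30.0 + i, 40.0 + i]},
        index=["small", "large"],
    )
    for i in range(5)
]

# Option 1: stack the frames, then average the rows that share an index label.
stacked = pd.concat(frames)
mean_by_concat = stacked.groupby(level=0, sort=False).mean()

# Option 2: keep a "test" column so every run stays identifiable afterwards.
tidy = pd.concat(frames, keys=range(5), names=["test", "case"]).reset_index()
mean_by_tidy = tidy.groupby("case", sort=False)[["read_ms", "write_ms"]].mean()

print(mean_by_concat)
```

The tidy layout costs one extra column but makes later questions (per-test spread, min/max, plots per run) a single `groupby` away.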

burnt saffron
dusk tide
#

Hi, I am going through the Spaceship Titanic competition notebook https://www.kaggle.com/code/samuelcortinhas/spaceship-titanic-a-complete-guide/notebook on Kaggle and have a question. While visualizing the missing values of each row, the author made this bar chart and wrote the note/inference "Missing values are independent of the target and for the most part are isolated." What does this mean, and what happens if the missing values are dependent on the target class?

carmine mason
#

has anyone here worked with Plackett–Burman designs before, because I have some questions

tough radish
#

Hello everyone! 🙂 I am new here and I was wondering if anybody knows where I can find some cool projects for beginners where I can start and learn coding from scratch? I've already gone through some data science courses on DataCamp but I need to process all the input I got there and really practise 🫡

tidal bough
#

(huh, or do you want data science practice specifically, rather than programming in general?)

tough radish
#

Thank you! 🙂 programming in general is fine too! I will enroll for a BA in Data Science next year so I'm also looking for Data Science projects to be prepared, but I want to learn programming in general as well so that's perfectly fine!

lapis sequoia
#

Anyone implemented a transformer architecture from scratch?

full herald
#

Hi Data Gangs ❗ ❗ ❗

#

If you want to start your Data engineering learning journey and get your hands dirty, here is a roadmap for you

#

Join Our Data Tech Community for Data Engineers & Cloud Engineers

errant bison
#

how to deploy a deeplearning model from google colab? I want it to run on pyqt5, but do i need to have pkl file or how can i do so?

raw rapids
#

does keras's ImageDataGenerator only work for classification tasks or image to image translation tasks as well?

serene scaffold
errant bison
serene scaffold
slim bone
#

Bit of a weird question - I've been learning Pytorch up until now and it appears that most of the books in ML are in TF rather than Pytorch. How difficult is the transition between the two?

serene scaffold
errant bison
serene scaffold
cold osprey
errant bison
serene scaffold
slim bone
errant bison
serene scaffold
errant bison
slim bone
lapis sequoia
mellow grove
#

Anybody able to help me out here? I am in need of some code examples of how to take a value from a Shiny user interface (pull-down menu selection for example) and pass that into another body of code in another .py file I have already created to do data analytics with.

An example use case is that I have a Python program called policy_analytics.py that does analysis for an insurance company. The first part of my code runs an SQL query against some tables in Snowflake, and in my WHERE clause, I can filter on policies in a certain state. I have a Shiny application running where I have a "Select State" ui.input_select line of code that allows the user to select which state that the query filters on. How can I pass that state in as a string to my policy_analytics.py file to use when building the query for the desired analysis to be done?
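One common way to do this is to wrap the analysis in a function that takes the state as an argument, then import and call that function from the Shiny server code. The sketch below shows only the `policy_analytics.py` side so it runs without Shiny or Snowflake installed. The table name, column names, the list of states, and both function names are hypothetical, and the `%(state)s` placeholder assumes a DB-API cursor that uses the pyformat parameter style.

```python
# policy_analytics.py (sketch): expose the analysis as a function of the state.

VALID_STATES = {"CA", "NY", "TX"}  # hypothetical; use whatever the dropdown offers


def build_policy_query(state: str) -> tuple[str, dict[str, str]]:
    """Return SQL plus bind parameters for one state (names are made up)."""
    if state not in VALID_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    sql = "SELECT policy_id, premium FROM policies WHERE state = %(state)s"
    return sql, {"state": state}


def run_analysis(state: str, cursor=None) -> str:
    """Run the query if a DB cursor is supplied, otherwise just describe it."""
    sql, params = build_policy_query(state)
    if cursor is not None:
        # Bind parameters instead of formatting the value into the SQL string.
        cursor.execute(sql, params)
        return f"{len(cursor.fetchall())} policies found in {state}"
    return f"would run: {sql} with {params}"


print(run_analysis("TX"))
```

On the Shiny side, assuming the select input's id is `"state"`, the server function would `from policy_analytics import run_analysis` and call `run_analysis(input.state())` inside a render or reactive function, so the analysis re-runs whenever the selection changes.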

covert vale
#

Hello everyone,

This package I created contains shortcut solutions for several stages of a project. For EDA, there is a section where categorical and numeric data are analyzed separately and graphs are drawn according to different scenarios (boxplot, bar chart, etc.). Apart from that, for the predict part, which is the last step of a model built with the pipeline object, there is a best-estimator selection function that applies hyperparameter optimization (Optuna) in a single line. I recommend you take a look. I published it after trying it on 7-8 different cases. Never hesitate to submit bugs or ideas. Thanks.

https://github.com/kaansnmez/lazyauto

GitHub

lazyauto. Contribute to kaansnmez/lazyauto development by creating an account on GitHub.

hollow sparrow
#

Any good offline text to speech out there with some reasonable voices? I tried mimic and it's good enough to make an intentional robotic voice, which works fine for the droid I'm building, but are there any more realistic models anywhere?

dusky magnet
hollow sparrow
#

I see, hmm I am Swedish tho, maybe the german one will be close enough x)

late shell
#

Hello, I'm trying to use the llama-2-7b-chat-ggml, 8-bit quantized model by TheBloke (huggingface). I have 33.6 gb ram and Nvidia 1080Ti . But the model is extremely slow. I'm off loading 20 layers to the gpu (gpu_layer=20), but it still takes around 4-5 mins to generate a response and sometimes even hangs indefinitely before I kill it after 15 mins. I know it shouldn't be taking so long. Can someone please help me with this. My prompt looks something like this:

Use the given question and context to generate a detailed, authentic description about the machine. Make it sound as if you are a great salesman and are pitching this machine to a potential buyer. Use good formatting and the description should not be too long (About 200 words only). Try to make it as easy to read as possible. Most importantly, you must include all the information provided under the context in the description that you generate. Do not make up new information. It's a pre owned machine, therefore the description should not be like the launch of a new product.

Generate a description of the machine using the information provided under the Context.
 
Context: 
categoryName: Post Press
subcategoryName: Saddle Stitcher
subsubcategoryName: Conveyor belt
manufacturerName: Monotype
Year: 2001
MachineModelName: Boston Double Head Stitching
Location: Germany
Info: DOUBLE HEAD STITCHING MACHINE BOSTON
2 HEAD FLAT AND SADDLE STITCHING MACHINE
DOUBLE WIRE
lapis sequoia
# hollow sparrow Any good offline text to speech out there with some reasonable voices? I tried m...

Hi, there are a few open source TTS models you could look at. The good voice cloning/generating ones are restrictive to use, as a lot of ethical issues come into play. My personal favourites are SpeechT5 & Tacotron 2. SpeechT5 was pretrained for 6 different tasks, hence I found it quite generalised; the only downside I found is the 600 token length limit on the input.
Apart from them, there are a few newer models based on GenAI, like VALL-E and Tortoise.

south edge
#

i have a doubt in this pic

young granite
south edge
#

are the summing junction and activation function considered as hidden layers?

young granite
#

depends on whom u ask i guess all between input and output is hidden layer for me

south edge
#

oh

#

and why are there so many weights

young granite
#

?

south edge
#

even though they give the same output

#

in this pic

young granite
#

normally u would combine this into one node

#

so to clarify myself a bit u see green, red and yellow -> 3 nodes

south edge
#

we would get the same output if we make it as a single node right?

young granite
#

this is just a more detailed schematic of something like this

south edge
#

no i mean

#

not the nodes

#

i have confused myself a little bit

#

the picture above is similar to the picture i uploaded before right

#

and both of them would give the same output either way

#

then why make it complex in the first picture

#

or is it any method to make our computation faster

young granite
#

those are simply example pictures to give u an idea of what happens inside the NN

heady fulcrum
#

Hello everyone
I'm new here

And I just started learning python
I'm open to network and learn more

rough mural
#

anyone with some experience in generative ai

#

need some help

young granite
rough mural
#

ok

#

i have to generate some outfit after getting some result from dataset

#

which is then given to dall e for image generation

#

but this takes time and a lottt of memory

#

! pip install min-dalle -q

#

from min_dalle import MinDalle
model = MinDalle(is_mega=True, is_reusable=True)

#

seed = 6
grid_size = 1
display(model.generate_image(prompt, seed, grid_size))

#

this is the code that i am using this is open source but is very heavy on my laptop

#

any suggestion on how i can make it faster?

mild dirge
#

make sure it runs on gpu

#

or find a smaller model

rough mural
#

it is

mild dirge
#

Well, that is pretty much all you can do

rough mural
#

it takes around 10 gigs of gpu space

#

any method in which i can save the progress

#

and then run the last line only

lapis sequoia
#

hello, does anyone know how to make images like this for my model?

rough mural
#

i can help

#

but what will be the final image

#

@lapis sequoia the image on the top right or the bottom

lapis sequoia
#

the architecture one, left with the backbone and the layers

#

i want to draw one for my model

rough mural
#

ok

#

can you explain a little further

lapis sequoia
#

i made a model i need to explain to people and i want a tool to make something like this

mild dirge
#

probably made with tikz library in tex

#

Made similar sketches with it

lapis sequoia
#

i found many tools like this but they make a detailed architecture like the one you sent, while the one i sent is pretty simple and many people have it, so i thought it's a tool i just don't know

mild dirge
#

Not sure if there is a no-effort solution. If you find one, sure let me know though

lapis sequoia
#

thank you anyway

mild dirge
#

That tikz library is quite a pain to use, so if there is an easy solution that would be great 😛

mild dirge
silk cipher
#

hey everybody, just a question, how do I load my own dataset, like I made one in JSON and it looks like this:

```
{"dataset":
  {
  "input": "some input",
  "output": ["outputs"]
  }, ...}
```
how do I load this in pytorch
#

also it's my first time trying this with a custom dataset that i made, so i need to know how to process it and make it ingestable by the CRF model I'm trying to make

mild dirge
#

If it's not in a standard format, you make a custom dataset

#

In that you could just use json.load() or whatever to get the data, and then put it into tensors

#

You also need to look at what format the model takes the data
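A rough sketch of the custom-dataset idea follows. It uses only the standard library so it runs anywhere, and several things are assumptions: the JSON is taken to hold a list of records under `"dataset"`, inputs are split on whitespace, and the class name `TagDataset` is made up. It reads from a string with `json.loads`; for a file, `json.load(open_file)` does the same job. With PyTorch installed, the class would normally subclass `torch.utils.data.Dataset` and wrap the id lists in `torch.tensor(...)`.

```python
import json

# Assumed layout: {"dataset": [{"input": "...", "output": ["TAG", ...]}, ...]}
RAW = (
    '{"dataset": ['
    '{"input": "buy milk", "output": ["VERB", "NOUN"]}, '
    '{"input": "hello", "output": ["INTJ"]}'
    "]}"
)


class TagDataset:
    """Map-style dataset: any object with __len__ and __getitem__."""

    def __init__(self, records):
        self.records = records
        # Models consume numbers, so build integer ids for tokens and tags.
        tokens = sorted({tok for r in records for tok in r["input"].split()})
        tags = sorted({tag for r in records for tag in r["output"]})
        self.token_to_id = {tok: i for i, tok in enumerate(tokens)}
        self.tag_to_id = {tag: i for i, tag in enumerate(tags)}

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        token_ids = [self.token_to_id[t] for t in record["input"].split()]
        tag_ids = [self.tag_to_id[t] for t in record["output"]]
        return token_ids, tag_ids


dataset = TagDataset(json.loads(RAW)["dataset"])
print(len(dataset), dataset[0])
```

Sequences of different lengths still need padding (and usually a mask) before batching; how exactly depends on the format the model expects.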

silk cipher
mild dirge
#

I haven't used a CRF myself, so I wouldn't know those specifics

#

That is more model dependent though I think

silk cipher
#

hey here are the docs of torchCRF

```py
>>> seq_length = 3  # maximum sequence length in a batch
>>> batch_size = 2  # number of samples in the batch
>>> emissions = torch.randn(seq_length, batch_size, num_tags)
>>> tags = torch.tensor([
...   [0, 1], [2, 4], [3, 1]
... ], dtype=torch.long)  # (seq_length, batch_size)
>>> model(emissions, tags)
tensor(-12.7431, grad_fn=<SumBackward0>)
```
does this help?
#

the docs

quartz wigeon
#

is there a way to teach an agent not to take an invalid action in reinforcement learning? I'm using stable-baselines3.

silk cipher
#

what do you consider as an invalid action

mild dirge
lapis sequoia
pastel cedar
#

Hi, guys

#

any1 here know about codebasics resume C7?

#

if yes hif any1 like to work on that then ping me

zealous hollow
#

i am using STL approach for a time series
i think this is wrong right?

#

trend past seems right
right?

quartz wigeon
quartz wigeon
lapis sequoia
#

i don't know what sb3 is, I had to write Deep Qlearning with python for my RL class so I don't know

timid kiln
#

In you guys' opinion, what would you say is the most widely used geoprocessing/plotting-type python library?

south edge
#

matplotlib

timid kiln
#

Reason I ask is I've headed down this road of putting data on maps and folium doesn't seem to have a lot of support out there.

timid kiln
# south edge matplotlib

Really? For maps? I had no idea. I just thought it was more for plotting for data analysis/spreadsheet type stuff.

south edge
#

oops

timid kiln
#

lol

south edge
#

i thought you asked me about plotting

#

im sorry lol

timid kiln
#

No worries. 🙂

lapis sequoia
#

i tried leaflet

#

its good but in js

lapis sequoia
#

Heyyy, I have learned the basics of python but I'm especially interested in data science and AI, does anyone know a good book / course to start on this topic??

west grail
lapis sequoia
#

thanks, will do

zealous hollow
#

got a question relating to time series forecasting

data = signal(trend+seasonality) + noise

what are some methods that are used to forecast the signal
like one is simple extrapolation

#

also can we try forecasting the noise as well?

wooden sail
#

due to how continuous probability distributions work, the probability of a single event is 0

hard shoal
#

Hi,guys

Where can I download datasets other than kaggle and the UCI Machine Learning Repository..?

zealous hollow
#

real life problem 😔
time series analysis and forecasting

rough mural
#

i need to make an AI chatbot for fashion recommendation can anyone help me what to use

#

or anyone has some previous code that'll help me

bronze vessel
#

I wanna be datascience dev but i dont know about hat

#

that*

vestal widget
#

If i want to train an existing language model for use as a conversational chatbot, should i use embedding or finetuning?

lusty raptor
#

Hey guys
I need unique machine learning project ideas that use cnn or NLP
I need to make a solo project for my course

brittle knoll
#

anyone has ideas to make a presentation more interactive and fun with AI

fluid spindle
#

Hey, I was thinking of creating a loop counter to preprocess MNIST

#

Like it should return 1 for handwritten "0" , "6", and "9"; 2 for "8" and 0 for others

fluid spindle
#

here's sample of it

agile cobalt
#

but some people do use a loop for 2

#

from your very example

fluid spindle
#

yes

#

i think preprocessing with it would swing both ways huh

agile cobalt
#

with deep learning it's oftentimes better to not overengineer features - convolutional layers could identify loops, and the network itself may count them if it deems that information useful

fluid spindle
#

this is the confusion matrix, maybe I could use it exclusively to recognize 8s

#

or rather to eliminate FP 8s

mild dirge
#

what are the values of that confusion matrix?

fluid spindle
#

rows are actual labels and columns are predictions, SGD classifier used

mild dirge
#

What is the model then? Because you were talking about a loop counter

mild dirge
fluid spindle
#

yes, I just want to reduce the FPR for predicted 8s by running a loop counter

#

I seem to need to find whether there're two closed areas (usually roughly circular) for it

#

ignoring there are a bunch of 8s not fully closed
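For reference, the "count closed areas" idea can be sketched as a flood fill over the background. This is only an illustration on tiny hand-made 0/1 grids, not real MNIST digits; the function name `count_loops` is made up, and a real image would first need binarizing with some threshold. As noted, digits whose loops are not fully closed would be miscounted.

```python
import numpy as np


def count_loops(binary: np.ndarray) -> int:
    """Count enclosed background regions (holes) in a 0/1 image."""
    # Pad with background so all of the outside is one connected region.
    padded = np.pad(binary.astype(bool), 1, constant_values=False)
    visited = padded.copy()  # ink pixels are treated as already visited
    rows, cols = padded.shape

    def flood(start):
        stack = [start]
        visited[start] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                    visited[nr, nc] = True
                    stack.append((nr, nc))

    flood((0, 0))  # mark the outside background
    loops = 0
    for r in range(rows):
        for c in range(cols):
            if not visited[r, c]:
                loops += 1  # unvisited background pixel: a new enclosed area
                flood((r, c))
    return loops


zero = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
eight = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]])
one = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
print(count_loops(zero), count_loops(eight), count_loops(one))
```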

mild dirge
#

I don't really see the point of this, as that is exactly what a CNN would try to do. If you want to go that route you can make custom convolutional kernels, but why bother?

fluid spindle
#

upgrading the model

#

or gathering more 8s to train on

#

dunno, just brainstorming

mild dirge
#

You have tried using a CNN on the data, and it gets 8s wrong?

fluid spindle
#

I haven't, am tracking a book, it gonna tackle on neural networks in chapter 2

#

anyways, thanks for the insight, am still learning what or whatnot would be appropriate in given cases

fringe vector
#

nachoPray ty

fluid spindle
#

ofc silly

hard shoal
#

because I need a case study to improve my skills

desert oar
#

that's really what kicked off the deep learning revolution imo, without CNN-on-MNIST we wouldn't have ChatGPT

vestal widget
#

If i want to train an existing language model for use as a conversational chatbot, should i use embedding or finetuning?

serene scaffold
#

"fine tuning" is the process of continuing to train an existing model, so if you are going to train an existing language model, that is necessarily fine tuning.

potent sky
#

has anyone figured out a way to add GPU support for GPT4All models? Or any other consumer grade models?

past meteor
potent sky
#

I've made a mini internet-connected chatbot for a hobby project with langchain, a vector db...the whole thing
but the inference gets pretty slow especially with increased context size
so I'm looking to run it on GPU

past meteor
#

Working with those methods also gives you an appreciation for the challenges that image recognition has.

potent sky
wooden sail
#

on the other hand, the post-CNN ages are also dark, just for different reasons

#

your task is now successful, but now you don't understand why

civic elm
#

How can Chatgpt survive? It has no monetization except the paid api. Can't go to enterprise because of data sensitivity, Can't have ads in the chat ui, has lawsuits to fight.. etc..

past meteor
#

The paid version of GPT is a lot better than the free tier. Several orders of magnitude

#

On top of that, many companies are building services that use GPT. I went to an AI "conference" a few months ago and that seemed to be the hype thing

#

Many of the things they were doing were basic (conversational) information retrieval but the GPT API makes doing that a lot easier than whatever topic modelling people were doing a decade ago. It's at the level at which software engineers can make a solution in a couple of days. The quality is a different discussion though....

young granite
#

Does anyone know of an open case study for ML applications?
Its for an job application so < 8h in total.

past meteor
#

But I'm not fully sure what your question is because there's a lot of degrees of freedom. Is it an application of AI or an application with AI

young granite
iron basalt
serene scaffold
strange elbowBOT
#

BuT ChatgpT Is COnSTantlY imPrOViNg supeRlINEArly

iron basalt
#

25,000 GPUs not enough.

#

Who would have guessed that dense operations and backpropagation don't scale in terms of performance per watt. >.>

#

I wonder why the brain has sparse activity...

serene scaffold
past meteor
#

But they come out a bit mangled

iron basalt
serene scaffold
iron basalt
#
  • Sparsity also untangles different things, which is needed, and dense networks end up learning this, but don't get the performance / energy benefit because they are still touching all the values / not branching.
misty flint
#

like a fever dream

dense crane
#

what is more recommended for pytorch, DataParallel or DistributedDataParallel?

#

i was testing the first one and it sped up the training by 5%, which is good but not satisfying. can the 2nd be faster?

late shell
urban knoll
#

I am searching for any 80 class .pt models compatible with pytorch --version 1.3.1(Python 2.7) on the internet. Thought I could ask on here as well. Any leads would be helpful.

verbal venture
#

can someone explain what this math means? Is that the equation when y=1, y=0 etc.

verbal venture
#

why that mode of python specifically?

#

@urban knoll sorry coco

small wedge
urban knoll
verbal venture
#

look up to see if coco works

#

if not find the year 2.7 came out and then look up 'x year pytorch dataset' or whatever

steady nacelle
#

What legit roadmap should I follow for landing a machine learning engineer entry position?

agile cobalt
#

a degree would be a rather safe choice

small wedge
#

can you send the full traceback please? also you can use code formatting to prevent discord from making silly mistakes like the thumbs down

#

!code

arctic wedgeBOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

small wedge
#

looks like you are passing a VideoCapture object to cv2.rectangle

#

you used cap.read() to extract the frame later in the code

#

well that will create a tuple of the ret and the frame

#

I'm not super familiar with using cv2 in this way so I'd just assume you need to extract the frame like you did but earlier and pass it. What exactly is cv2.rectangle intended to be doing here?

#

yeah I think you might want the frame

#

I assume it returns something? either that or if it modifies you will need to make sure you use the same frame object when you try to imshow i.e. you could just move all this down to your while loop

lusty lotus
#

what does the video mean by "the direction of the negative gradient"? i thought lines w/ positive gradient was like / and negative gradients like \ but why in the video he's pointing at a / and moves his cursor downwards? isn't that positive gradient?
https://youtu.be/8d6jf7s6_Qs?t=169

signal lintel
#

i need images like that perspective transformed to a normal view

# https://github.com/darklab8/darklab_peasant/blob/a383b5ee02a5a645bede4df53c1aaa572c7a1236/peasant/captch_solver.py#L90
pts1 = np.float32([left_top_corner,left_bottom_corner,right_top_corner,right_bottom_corner]) # type: ignore
pts2 = np.float32([[22,11],[22,33],[160,11],[160,33]]) # type: ignore
M = cv2.getPerspectiveTransform(pts1,pts2)
dst = cv2.warpPerspective(out,M,(200,50))
out2 = dst.copy()

in order for them to be properly recognized by Tesseract

I found it is possible to perspective transform with this code, the problem is in identifying the corners of the text. my current algorithm to identify corners sucks.
Could someone help me to do that 🙈 it is an open source project, made as a pet project to register for a queue for updating a passport 😄

https://github.com/darklab8/darklab_peasant/blob/a383b5ee02a5a645bede4df53c1aaa572c7a1236/peasant/captch_solver_tests.py#L24
I wrote expected tests too

🙈 in general very accurately wrote the project. (hopefully)
mypy'ed everything
unit tested
as last step going to have it deployed into AWS as an EventBridge cron-scheduled lambda, to be serverless deployed
u would be a very great contributor if u helped with it. Could be part of portfolio if interested

arctic wedgeBOT
#

`peasant/captch_solver.py` line 90
```py
# Perceptive transform
```
`peasant/captch_solver_tests.py` line 24
```py
def test_captcha(img_num: int, expected: int) -> None:
```
small wedge
#

i.e. we subtract the gradient from it (it being whatever we're optimizing, weights in the case of nn)

lusty lotus
small wedge
#

weights - learning_rate * gradient == weights + learning_rate * -1 * gradient

#

it's just a bit of a confusing way to say it

#

but people refer to it as the negative of the gradient fairly commonly, at least in my experience
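The identity in the message above can be checked numerically. The weights, gradient, and learning rate here are made-up numbers, chosen only to show that "subtract the gradient" and "add the negative gradient" are the same update.

```python
import numpy as np

weights = np.array([0.5, -1.0, 2.0])
gradient = np.array([0.2, -0.4, 1.0])  # made-up gradient of the cost w.r.t. each weight
learning_rate = 0.1

subtract_gradient = weights - learning_rate * gradient
add_negative_gradient = weights + learning_rate * (-1 * gradient)

print(subtract_gradient)
print(np.allclose(subtract_gradient, add_negative_gradient))
```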

lusty lotus
#

wtf

#

now it's worse

small wedge
#

XD

small wedge
#

how do we obtain the gradient, is that what you're asking?

lusty lotus
#

well considering the fact that im watching that video, which is "for dummies" im alr struggling with shit so minimum math jargon pls

small wedge
#

do you know any calculus? this is a type of function optimization that uses derivatives

lusty lotus
#

but idk how any of that ties in with nn, all i know is "oh you need calc for nn" and all that but idk anything outside of this

small wedge
#

ah the power rule my beloved XD. all that really matters is that you understand what a derivative is, it's a (function that describes a) rate of change. Using that rate of change for a function, we can tell how changing a variable in the function will affect its output

#

we are optimizing the cost function, so we are trying to change all the inputs to that function (weights and biases) to lower the output of that function

lusty lotus
#

(hmm it seems the screenshot is not showing completely, press on the imgs lol)

lusty lotus
#

also, the idea of "MSE" is very arbitrary to me. why not just like an "absolute error" as a criterion? a bit like the MAE? wouldn't squaring it like scale it exponentially?

small wedge
small wedge
lusty lotus
lusty lotus
#

or a M x^1000 E criterion lmao

small wedge
lusty lotus
wooden sail
#

but the answer is "the cost function depends on what you know about the data model and its statistics"

#

if you don't know anything about that, you cannot choose a cost function and claim it is optimal

lusty lotus
#

i see

wooden sail
#

for IID additive gaussian distributed noise with equal variance per sample, minimizing the MSE gives the maximum likelihood estimator, which is asymptotically efficient and unbiased

#

that's about as nice as an estimator can get

#

if your data doesn't follow those properties, there is no special reason to use the MSE
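A small experiment makes the difference between the two criteria concrete. It fits the simplest possible model, a single constant prediction, to a made-up sample with one outlier, and searches a grid of candidates for the one with the lowest MSE and the one with the lowest MAE. The data and grid are assumptions chosen for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up sample with one outlier
candidates = np.linspace(0.0, 100.0, 10001)   # constant predictions to try

# Error of every candidate against every data point, shape (10001, 5).
errors = candidates[:, None] - data[None, :]
mse = (errors ** 2).mean(axis=1)
mae = np.abs(errors).mean(axis=1)

best_mse = candidates[mse.argmin()]
best_mae = candidates[mae.argmin()]
print(best_mse, data.mean())      # MSE is minimised at the mean
print(best_mae, np.median(data))  # MAE is minimised at the median
```

The squared error lets the single outlier drag the best prediction far from the bulk of the data, while the absolute error barely reacts to it. Which behaviour is "right" depends on the noise model, as stated above.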

lusty lotus
#

like for a line x^2, any points on the line will have dy/dx = 2x right?

wooden sail
#

which depends on the value of x

lusty lotus
#

OH WAIT do i have to sub in the x-values into 2x? then it makes the gradients different? shitshitshit

wooden sail
#

there is a different tangent line at every point on the curve
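To see that the slope of x^2 really changes from point to point, the derivative can be estimated numerically and compared with 2x. The helper name `slope` and the sample points are arbitrary choices for this sketch.

```python
def f(x):
    return x ** 2


def slope(x, h=1e-6):
    """Central-difference estimate of dy/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Each row: the point, the estimated slope there, and the formula 2x.
for x in (-3.0, 0.0, 1.0, 2.5):
    print(x, round(slope(x), 4), 2 * x)
```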

lusty lotus
wooden sail
#

gradient descent works on the principle of making a tangent line at a point on the original function, following this linear approximation for a bit, and hoping it yields an improvement. if not, correct it

#

for nicely behaved functions, if the step size is small enough, this method is guaranteed to work (locally)
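The step-size caveat is easy to demonstrate on f(x) = x^2, whose slope is 2x. This is a toy sketch with made-up starting point and step sizes, not a general-purpose optimizer.

```python
def gradient_descent(start, step_size, steps=50):
    """Minimise f(x) = x**2 by repeatedly stepping against the slope 2x."""
    x = start
    for _ in range(steps):
        x = x - step_size * (2 * x)
    return x


print(gradient_descent(5.0, step_size=0.1))  # shrinks toward the minimum at 0
print(gradient_descent(5.0, step_size=1.1))  # overshoots further on every step
```

With step size 0.1 each update multiplies x by 0.8, so it decays toward 0. With step size 1.1 each update multiplies x by -1.2, so it jumps across the minimum and grows.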

lusty lotus
wooden sail
#

that idk, i didn't check the numbers in your images

lusty lotus
wooden sail
#

i think you need to take a step back, i see the problem now

#

the questions you're asking say you haven't studied linalg and optimization

#

without that, none of this will ever make sense

#

yes, for differentiable functions evaluated in an open interval, all candidate extrema have a gradient of 0

#

that's a necessary but not sufficient condition

#

(if the interval is closed, the boundaries can also be candidates regardless of the gradient, but this usually involves constrained optimization anyway)

#

i recommend boyd's convex optimization books

small wedge
#

https://youtu.be/hfMk-kjRv4c?t=909 I agree that you need to study this stuff to understand the math, but I think you can intuit what is happening with the optimization here easily if you have just a bit of calculus knowledge

lusty lotus
#

is all of that needed? :/

small wedge
#

3b1b's 4 part series might also help you intuit it with what math you already have

wooden sail
lusty lotus
wooden sail
#

if you only want to use it, you can just memorize the rules

#

you might remember them from your HS or early uni calculus courses

lusty lotus
wooden sail
#

extrema when the derivative is 0, and then you check whether it's a maximum or minimum by testing the 2nd derivative

lusty lotus
#

i dont think the course has covered it yet though

wooden sail
#

do you already have a feel for what a derivative means?

lusty lotus
#

so that's all i have in mind

#

i know the product rule (sounds important) but idk what that has to do with anything

wooden sail
#

ok. imagine a line that touches the original curve only at 1 point

#

the slope of that line is the gradient

#

what it tells you is how quickly the function is increasing or decreasing at that point

#

a function that is increasing very steeply will have a large positive derivative

#

if the derivative is 0, there is no increase or decrease

lusty lotus
wooden sail
#

there isn't

lusty lotus
wooden sail
lusty lotus
wooden sail
#

and hence not the derivative

#

yeah, one sec

#

now think to your daily life

#

if you throw something upwards, that thing will trace a parabola

#

it'll go up, then come back down

#

that's the position of the object tracing a parabola

lusty lotus
#

sure

wooden sail
#

the important thing is that, in the transition from the object going up, to when the object is coming down, there is a point where the object's speed is 0

#

it starts with a large positive speed

#

that speed slowly decreases

#

then it becomes 0

#

then the speed becomes increasingly negative, and the object comes back down

#

this is your intuition as to why the derivative being zero is important

#

if the speed is positive and then it's negative, and it is changing "smoothly", it has to pass through zero

#

and it does so at the point where the speed changes sign from negative to positive or backwards

#

that point is a maximum or a minimum (in 1 dimension, at least. not generally true in more dimensions)

wooden sail
lusty lotus
#

so like up -> gets slower -> 0 -> negative velocity

wooden sail
#

yeah

#

here i drew it upside down

wooden sail
#

in the previous example, the speed (the derivative of the position with respect to time) is 0 exactly at the apex, the highest point the object reaches before it comes back down

#

that's the general idea
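The thrown-object picture can be written out as a short worked derivation. It assumes an initial upward speed $v_0$, constant gravitational acceleration $g$, and no air resistance; these symbols are introduced here for illustration and do not come from the chat.

```latex
\begin{aligned}
h(t) &= v_0 t - \tfrac{1}{2} g t^2 && \text{(height over time)} \\
h'(t) &= v_0 - g t && \text{(speed: positive going up, negative coming down)} \\
h'(t^*) = 0 \;&\Rightarrow\; t^* = \frac{v_0}{g} && \text{(the apex)} \\
h''(t) &= -g < 0 && \text{(so the apex is a maximum)}
\end{aligned}
```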

lusty lotus
wooden sail
#

hm?

lusty lotus
#

like x^2, x can be any value of +/-ve number?

#

as long as it's real right

wooden sail
#

mhm

lusty lotus
#

then why did you say min when x =0

wooden sail
#

that's why i was careful to tell you i drew it upside down

#

if you throw something upwards, it traces the curve -x^2

#

this is maximal at x = 0

#

in the drawing i made i drew x^2. this is minimal at x = 0

lusty lotus
#

right, ive got the graph

wooden sail
#

right. what's your question?

lusty lotus
#

surely 1 > 0?

wooden sail
#

x is not maximal

#

the function is

lusty lotus
#

f(x)? y? i see

wooden sail
#

what we found is the value of x, that when we put it into f(x), makes f(x) maximal

lusty lotus
#

i get it now

#

so we're not referring to x when we say max or min, just the func of it right?

wooden sail
#

.latex that's why we usually write this as $\arg\min_x f(x)$

#

huh

wooden sail
#

we actually usually don't care about the max, just the argmax

#

(or argmin)

lusty lotus
wooden sail
#

arg meaning argument

#

like in code

#

x is the argument of f(x)

lusty lotus
#

as in argmax() refers to the argument (x) that leads to the maximisation of the output (y) or f(x)?

#

and returns said argument (x)?

wooden sail
#

mhm
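In code, the max/argmax distinction looks like this. The sketch evaluates the thrown-object curve f(x) = -x^2 on a made-up grid of x values and uses NumPy's `argmax`, which returns the position of the largest output, so the argument itself is read back from the grid.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 401)  # grid of candidate arguments
y = -(x ** 2)                    # f(x) = -x^2

best_index = np.argmax(y)  # position of the largest output
print(x[best_index])       # the argmax: which x gives the largest f(x)
print(y.max())             # the max: the largest f(x) itself
```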

lusty lotus
#

damn

#

now, onto the main problem

#

how tf is any of this have to do with correcting shit weight values

wooden sail
#

you look for the weights that minimize your cost function

#

argmin (weights and biases) cost(weights and biases)
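As a concrete instance of "argmin over weights and biases", here is a sketch that fits one weight and one bias of a straight line by gradient descent on the mean squared error. The data is made up (generated from y = 3x + 1), and the learning rate and iteration count are arbitrary choices that happen to work for this data.

```python
import numpy as np

# Made-up data from y = 3x + 1, so the best weight is 3 and the best bias is 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 1.0


def cost(w, b):
    """Mean squared error of the line w*x + b on the data."""
    return np.mean((w * x + b - y) ** 2)


w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    error = w * x + b - y
    grad_w = 2 * np.mean(error * x)  # derivative of the cost w.r.t. w
    grad_b = 2 * np.mean(error)      # derivative of the cost w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3), cost(w, b))
```

A neural network does the same thing with many more parameters and a cost that is no longer a simple bowl, which is where the guarantees discussed below start to break down.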

lusty lotus
#

so you repeatedly find x values that minimises f(x)

wooden sail
#

not repeatedly

lusty lotus
#

got it, why not just like use the analogy of 3b1b of rolling down a ball instead of doing maths? roll ball down fun

#

like check if the gradient of moved pt is less than previous grad

wooden sail
#

that's not enough

lusty lotus
#

why?

wooden sail
#

you would need to know the math to understand why lol

#

especially in machine learning tasks that's not a very useful condition

#

they're non convex 😛

lusty lotus
#

and update x

wooden sail
#

you just made a huge assumption

#

that gradient descent will work in the first place

lusty lotus
#

wtf it doesn't?

wooden sail
#

it only works for very nicely behaved functions, and only for special choices of step sizes lol
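A quick sketch of the step-size point, using f(x) = x^2 from earlier in the chat. The starting point 3.0, the two step sizes and the 20 iterations are arbitrary choices made for illustration, and this only shows the step-size half of the statement:

```py
def gradient_descent(x0, step_size, n_steps=20):
    """Minimise f(x) = x**2 with plain gradient descent; f'(x) = 2*x."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * 2 * x   # move against the derivative
    return x

small = gradient_descent(3.0, step_size=0.1)   # each step multiplies x by 0.8
large = gradient_descent(3.0, step_size=1.1)   # each step multiplies x by -1.2

print("step 0.1 ends at", small)
print("step 1.1 ends at", large)
```

With the smaller step the iterate shrinks towards 0; with the larger one it overshoots the minimum on every step and grows in magnitude, even though the function is as well behaved as it gets.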

lusty lotus
#

:/

wooden sail
#

almost no optimizer uses only gradients

lusty lotus
wooden sail
#

then you're done, that's the whole thing you're looking for

#

or wdym?

#

there's never any guarantee that the solution you found with a neural network is the best or even generally valid, if that's what you meant

#

reproducibility, verification and related things are entire fields of study

#

sorry for crushing your dreams 😛

lusty lotus
lusty lotus
wooden sail
#

optimizers compute update vectors BASED on the gradient, but they are not just the raw gradient

#

there's rescaling and redirecting to be considered

#

also the statistics of the problem, too

#

also the gradient contains the derivatives w.r.t. all of the parameters. each one gives you some info on how to update each parameter. how MUCH info is a separate question

#

very naively, large gradients mean you're far from the solution... but not really 😛 not always

#

ah, there's also the trust region to consider, since you're linearizing (or otherwise approximating) the original problem at every iteration

lusty lotus
#

right im still slightly confused

wooden sail
#

what about?

lapis sequoia
#

Does anyone know why in the official yolo repository, they multiply the loss by the batch size? they say its to make it batch size agnostic but I don't really see why. I'm not finding any division after that but maybe I didn't see it.

lusty lotus
#

i think that would be very helpful

wooden sail
#

the gradient points in the direction that the function f(x) increases the most quickly

#

the negative gradient points in the direction f(x) decreases the most quickly

#

the gradient is a vector made of the derivatives of f w.r.t. its parameters (here, x)

lusty lotus
#

sure

wooden sail
#

so we adjust x by moving it in the direction that f(x) decreases the most

lusty lotus
#

like here https://youtu.be/8d6jf7s6_Qs?t=169 why does the guy say like "in the direction of the negative of the gradient" like my first question was like I thought positive gradients = / and negative gradients = \, surely if he were to retrace / downwards isn't that still positive gradient but less?

#

then it isn't the "negative of the gradient" then, it's merely saying like "to the direction closest to 0"

wooden sail
#

no, it IS the negative of the gradient

lusty lotus
#

wtf?!

wooden sail
#

remember each point on the curve has a different gradient

#

the gradient ALWAYS tells you the direction in which the function INCREASES the most

#

regardless of whether the gradient is negative or positive

#

if the gradient is positive, it means "if we move x to the right, the function increases"

#

if it's negative it means "if we move x to the left, the function increases"
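The sign rule above can be checked on f(x) = x^2. This is a minimal sketch; the step size 0.1 and the two starting points are assumptions for the example:

```py
def grad_f(x):
    """Derivative of f(x) = x**2."""
    return 2 * x

def descent_step(x, step_size=0.1):
    """Move x a little in the direction of the NEGATIVE gradient."""
    return x - step_size * grad_f(x)

for x in (3.0, -3.0):
    g = grad_f(x)
    new_x = descent_step(x)
    direction = "left" if new_x < x else "right"
    print(f"x={x:+.1f}  gradient={g:+.1f}  -> move {direction} to {new_x:+.2f}")
```

At x = 3 the gradient is positive, so subtracting it moves x to the left; at x = -3 the gradient is negative, so subtracting it moves x to the right. In both cases x ends up closer to the minimum at 0.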

lusty lotus
#

then it has to be this?

#

like isn't where the red dot is positive grad? then the negative of that must be \

wooden sail
#

i think that line tells you nothing, that's a really bad visualization

lusty lotus
#

then surely the negative must be something above

#

but less steep

wooden sail
#

the gradient is only the steepness of the line

#

not the line itself

lusty lotus
#

right so

#

uhh

lusty lotus
wooden sail
#

i'd draw it like this

#

the gradient is a vector where each entry corresponds to one of the variables. here we only have x, so the gradient is a vector that points only along the x axis. its direction tells us in which direction f(x) increases, and the negative tells us in which direction f(x) decreases

#

why is it pointing to the right? the right on the x axis is the positive direction

#

the length of the vector g is how big the gradient is

lusty lotus
# wooden sail

wtf, then why is the grad flat? i thought grad = change in y/change in x? then if it's flat then you mean the grad is 0?

#

gosh this sucks a bit man

wooden sail
#

i'm talking only about x here

#

about the slope given by the derivative, not the derivative as a function

#

as i told you, we don't actually care about the tangent line

#

only its slope

lusty lotus
wooden sail
#

yes

lusty lotus
#

then why's it flat

#

:(

wooden sail
#

dude 2x is a line

#

we don't care about the line

#

remember again, you have to evaluate the derivative to get the slope

#

it's not 2x

#

substitute x with a specific value of x

lusty lotus
wooden sail
#

the value of x of the red point

lusty lotus
#

we care about 2x surely

wooden sail
#

no

#

we care about 2x evaluated at x

#

2x evaluated at x is the slope of the tangent touching f(x) at x

lusty lotus
#

wdym "eval at x"

wooden sail
#

literally that

#

the red point on the graph has coordinates (x,y)

#

for example (3,9)

lusty lotus
#

i thought the forward pass was eval()

#

like model.eval()

wooden sail
#

we'd want 2*3 = 6

#

you're mixing up everything

lusty lotus
wooden sail
#

because i arbitrarily chose the point (3,9), yes

lusty lotus
#

like just multiply stuff and you get the correct ans? not trying to be aggressive here but im just really confused

wooden sail
#

i was giving you an example of what i meant by evaluate

lusty lotus
wooden sail
#

multiplying what?

lusty lotus
#

like why does multiplying help

#

2*3

wooden sail
#

you said the derivative is 2x

lusty lotus
#

and what does subbing 3 do?

#

finding the steepness?

wooden sail
#

if x is 3, then the slope is 6 at the point (3,9)
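"Evaluating the derivative at x" can be sketched as follows. The finite-difference check and the spacing `h` are additions for illustration, not something from the chat:

```py
def f(x):
    return x**2

def derivative(x):
    """The derivative of x**2 as a function: 2x."""
    return 2 * x

x0 = 3.0
slope = derivative(x0)   # substitute x = 3 into 2x

# Independent check: slope of a tiny secant line around x0.
h = 1e-6
numeric_slope = (f(x0 + h) - f(x0 - h)) / (2 * h)

print("point on the curve:", (x0, f(x0)))
print("slope from 2x:", slope)
print("slope from finite differences:", numeric_slope)
```

The function `derivative` is the line 2x; the number `slope` is what that function gives back for one specific x, and that number is the steepness of the tangent at that point.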

lusty lotus
#

wait one more thing

wooden sail
#

no, i have to go

lusty lotus
#

then what's the diff between grad and slope then

wooden sail
#

the gradient is an extension of the idea of "slope" to arbitrarily many dimensions

#

you will never work with just 1 variable in machine learning. usually a couple tens of thousands or more
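A sketch of the same idea with more than one variable, assuming NumPy. The cost function (sum of squares) and the three parameter values are hypothetical; real models have far more parameters and use automatic differentiation rather than finite differences:

```py
import numpy as np

def cost(w):
    """Hypothetical cost: sum of squares of all parameters."""
    return float(np.sum(w**2))

def numerical_gradient(fn, w, h=1e-6):
    """One slope per parameter, collected into a vector."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        bump = np.zeros_like(w)
        bump[i] = h
        # Nudge only parameter i and see how the cost responds.
        grad[i] = (fn(w + bump) - fn(w - bump)) / (2 * h)
    return grad

w = np.array([3.0, -1.0, 0.5])
g = numerical_gradient(cost, w)

print("parameters:", w)
print("gradient:  ", g)   # close to 2 * w for this cost
```

Each entry of `g` is an ordinary slope with respect to one parameter while the others are held fixed; the gradient is just all of those slopes stacked together.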

lusty lotus
# wooden sail

well i think i sorta get it, it's just the pic that's confusing me a bit

lusty lotus
wooden sail
#

that's why i said it was a good idea to learn all that other stuff

lusty lotus
wooden sail
#

hmmmmmmm

lusty lotus
#

🤓

wooden sail
#

not really but ok

lusty lotus
#

then we make 2x where x should be ideally 0?

#

like we shift x to the dir where it gets to 0?

lusty lotus
wooden sail
#

no, you haven't even started climbing

lusty lotus
#

damn

wooden sail
#

these are the prerequisites

lusty lotus
#

im fucking screwed then

lusty lotus
wooden sail
#

it could, but that already confused you regarding the positive and negative slopes

#

maybe someone can help you out in vc

lusty lotus
#

but yes, in vc that's where i sort my problems out at lol, i realise i get nothing done in text channels

#

i find it difficult to read messages lol

wooden sail
#

that's a problem cuz all of this stuff is in books

lusty lotus
#

cant fucking cope with text

wooden sail
#

just for reference, every single advancement in AI has been made by mathematicians and is published in papers and in books. the rest of the stuff is mostly people using it

lusty lotus
#

:(

#

im aware of that

wooden sail
#

so if you wanna learn it right, the proper paths are reading, and uni if you can't read by yourself

lusty lotus
#

i can read textbooks reasonably well, except for math textbooks, perhaps that's where i fuck up

wooden sail
#

that's a different skill altogether

lusty lotus
wooden sail
#

the term they like using is "mathematical maturity" which is separate to other forms of development

lusty lotus
#

technically im "proficient" in english alr but still

lusty lotus
wooden sail
#

my 2 cents is that those don't count

lusty lotus
#

i need things to be explained in front of me lol

lusty lotus
wooden sail
#

if you pass without a sweat and don't need to study, you haven't been challenged and have never needed to develop this skill

#

as evidenced by learning by watching and never picking a book up

#

you now have bad habits

lusty lotus
lusty lotus
wooden sail
#

common experience for most people, but they usually only find out after eating dirt a couple of semesters in uni

#

anyway, g2g

lusty lotus
steady nacelle
#

After finishing Andrew Ng's specialization course on deep learning, what do you guys recommend doing next to land an entry ML job?

lapis sequoia
#

hey friends! does anyone have some good resources for learning data visualization?

steady nacelle
# twilit tundra Code your own project

Hm how many projects do you recommend Rose? My best project was analysing fish behavior with RNN family and spatio temporal architectures and will publish it on IEEE

#

Waiting for acceptance on IEEE4*

#

IEEE*

twilit tundra
oblique quarry
#

When you have a decision Tree that is tasked to differentiate between 2 classes you'd obviously hope for more than 50%. My decision Tree performs with 60% accuracy. Is there a common reason as to why that happens? Couldn't figure it out for hours. Here's the code for those who'd like to help https://paste.pythondiscord.com/SEIA

#

The only explanation i have for that is that the tree must be highly sensitive to large values or values whose distance to the mean is big. I found I get orders of magnitude better results when using a standardized dataset such as a normal distribution or whatever, could be wrong tho

terse coral
#

Is there any way to initialize a dataframe in pandas by passing in a list of variables and automatically use the names of those variables as the column labels?

serene scaffold
twilit tundra
#

Technically, you can using globals() but that sounds rough

terse coral
#

Gotcha. That's what I thought but figured I'd check and see

serene scaffold
twilit tundra
#

You can filter the keys but yeah, that's not pretty
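For comparison, a sketch of the usual explicit alternative to `globals()`: spell the names once in a dict. The variable names and values here are made up for the example:

```py
import pandas as pd

# Hypothetical variables that should become columns.
height = [1.7, 1.8, 1.6]
weight = [65, 80, 54]

# Dict keys become the column labels.
df = pd.DataFrame({"height": height, "weight": weight})

# Same thing with dict(): the keyword names become the labels.
df_kw = pd.DataFrame(dict(height=height, weight=weight))

print(df)
```

Each name still has to be typed twice, but nothing depends on what else happens to be defined in the module.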

terse coral
#

Thoughts on pandas vs. polars?

serene scaffold
#

my reason for using pandas is that I already know how to use it, and my issues with pandas aren't significant enough for me to want to switch away from it.

twilit tundra
#

polars is better but no one uses it

twilit tundra
low relic
cobalt pecan
#

hi can i share my regex issue with someone? i have the two patterns and they work, but when i apply the custom function to the dataframe, one regex substitution seems to replace the other

#

i'll post the code link

#

most of it is already fleshed out i just need help figuring out a better way to apply the function, tysm

serene scaffold
#

@cobalt pecan please delete that paste as soon as possible, as it leaks your AWS keys

cobalt pecan
#

ok how should i share the code then

serene scaffold
#

without your AWS keys

#

you need to go change those keys as soon as possible, as someone can now use your AWS account, and you will have to pay for whatever they do.

cobalt pecan
#

i took them out of the code

serene scaffold
#

you need to go to AWS and make sure that those keys can no longer be used.

cobalt pecan
#

i've changed the keys

serene scaffold
#

you went to AWS and did it?

cobalt pecan
#

i'm doing it rn

#

but also i was given the keys by someone else to do this code, and they said it was safe to use it

serene scaffold
#

it might be safe for you to use the code, but it's not safe for you to reveal the AWS keys. that's the same as posting your Discord password in this chat.

so once you're done with all that, you need to make a separate example that has every variable defined in the code (not as a result of API calls)

#

for example, if you have a df variable in the actual code, you would do print(df.head().to_dict('list')), and then put that in pd.DataFrame( ) in the code example.

cobalt pecan
#

ahh ok i have the log file that can be used to make the df i'm manipulating

serene scaffold
#

code examples need to be fully self-contained, so they can't involve reading files.

cobalt pecan
#

ahhh ok

serene scaffold
#

@cobalt pecan this is what a self-contained pandas example looks like

import pandas as pd

data = {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5]}
df = pd.DataFrame(data)
#

and if you have a dataframe in your actual code, you can make {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5]} out of it by doing print(df.head().to_dict('list')) and then copying the result into your example code.
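The round trip described above, as one runnable sketch. The frame is the same toy data from the message; in real use `df` would be the frame from the actual code:

```py
import pandas as pd

# Stand-in for the dataframe in the real code.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5]})

# Step 1: print a plain-dict version of the first rows and copy it from the output.
as_dict = df.head().to_dict('list')
print(as_dict)

# Step 2: paste that dict into the example so anyone can rebuild the frame.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt)
```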

charred light
#

QQ: If you benchmark a company against its industry, do you include that company in the industry aggregations?

cobalt pecan
#

i have an example of a date with numerical format and a written out date

#

i have two patterns to change them to the 'X/X/Year' format, but if i apply the custom function to the df, one regex sub overrides the other

#

@serene scaffold with respect to the custom function, how would you code it so both regex subs are applied to the df

#

because it seems like the second one to convert a textual date to the desired format overrides the first one

#

wait i think i got it

#

give me a second to double check
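The original code was not posted here, so this is only a guess at the shape of the problem. One common way for a second substitution to "override" the first is to run both on the original string instead of chaining them. A sketch with chaining, where both patterns, the column names and the sample strings are hypothetical:

```py
import re
import pandas as pd

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Hypothetical patterns: 3-14-2023 and "March 5, 2021".
NUMERIC = re.compile(r"\b(\d{1,2})-(\d{1,2})-(\d{4})\b")
WRITTEN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b",
    re.IGNORECASE,
)

def written_to_slashes(match):
    month = MONTHS[match.group(1).lower()]
    return f"{month}/{int(match.group(2))}/{match.group(3)}"

def normalise_dates(text):
    # First substitution, stored back into `text`...
    text = NUMERIC.sub(r"\1/\2/\3", text)
    # ...so the second one runs on the RESULT of the first.
    text = WRITTEN.sub(written_to_slashes, text)
    return text

df = pd.DataFrame({"log": ["seen on 3-14-2023", "seen on March 5, 2021"]})
df["clean"] = df["log"].apply(normalise_dates)
print(df)
```

The key line is reassigning `text` after each `sub`, so neither replacement is thrown away.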

cobalt pecan
#

is there a helper online i could message?

verbal venture
#

are the number of neurons in a keras dense layer the depth of the output feature maps?

serene scaffold
verbal venture
#

I don't understand how I would be able to find that out

#

it's a CNN task. A layer is (Dense(16, padding=1, kernel = 3, stride=1)) etc. So 16 (according to chatgpt) represents the number of input neurons for the layer. Also wondering if that is the Z dimension of the feature map

cobalt pecan
#

is there a way to do unit testing for regex string matching files?

serene scaffold
#

And you want to write tests for that expression?

cobalt pecan
#

so i've written the tests for the expression, i just need to figure out how to do self.assertEqual for a function multiple times

cobalt pecan
#

i got it
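For reference, one way to run `assertEqual` over many inputs is `unittest`'s `subTest`. The function under test and the cases below are hypothetical stand-ins, since the real ones were not shared:

```py
import re
import unittest

def to_slash_date(text):
    """Hypothetical function under test: 3-14-2023 -> 3/14/2023."""
    return re.sub(r"\b(\d{1,2})-(\d{1,2})-(\d{4})\b", r"\1/\2/\3", text)

class ToSlashDateTests(unittest.TestCase):
    def test_many_inputs(self):
        cases = [
            ("3-14-2023", "3/14/2023"),
            ("on 12-1-1999", "on 12/1/1999"),
            ("no date here", "no date here"),
        ]
        for raw, expected in cases:
            # Each case is reported separately if it fails.
            with self.subTest(raw=raw):
                self.assertEqual(to_slash_date(raw), expected)
```

Saved in a test file, this can be run with `python -m unittest`. With `subTest`, a failing case does not stop the remaining cases from being checked.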

storm canyon
#

What sort of libraries/tools do people use to work with large datasets?

#

I'm trying to work with a large set of parquet files and am not sure how to work with the data without running out of ram

fluid spindle
#

Hello, someone has a linalg server I can ask questions?

wooden sail
#

you can tag me in either, but i may or may not be available

velvet rampart
#

Please what does count_vectoriser.fit_transform do

frail stream
#

hello,
so i've been working in jupyter notebook for a month and noticed this inconvenience: when i'm trying to import something, some libraries have duplicates. previously i was a backend developer so i have python and pycharm already installed.
I would be really grateful for your help.

void sail
#

been a while but depending on the yolovx you might already use skip connections. Skip connections are (usually) added to the first couple of initial layers to preserve some of the features at the start with deep models

lapis sequoia
void sail
#

in that case youll have to add them yourself, note that this will require retraining the entire network

lapis sequoia
#

yes i know, i'm asking about how to choose at which level to add a skip connection and how to know what type to add (concat, multiplication etc)

void sail
#

there is no golden rule for this or some deterministic process

#

you could try to take a look at the loss process and weights difference at each backwards step and layer

#

if you want to be precise about it and run experiments on a metric

lapis sequoia
void sail
#

nope does not exist im afraid, next best thing is logging the delta in weights at each layer at each training step. if you see the gradients becoming very small at the first couple of layers you can decide to add skip connections there

#

please note that skip connections mainly exist for very deep networks and the vanishing gradient problem, which will give you context for my suggestion
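The idea of logging per-layer gradient sizes can be sketched without any framework. This is a minimal NumPy illustration, not YOLO code: the depth, width, sigmoid activations, weight scale and toy loss are all assumptions made for the example.

```py
import numpy as np

def layer_gradient_norms(depth=8, width=16, seed=0):
    """Return the norm of dLoss/dW for each layer of a toy sigmoid MLP."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.5, size=(width, width)) for _ in range(depth)]
    x = rng.normal(size=width)

    # Forward pass, keeping every activation for the backward pass.
    activations = [x]
    for w in weights:
        activations.append(1.0 / (1.0 + np.exp(-(w @ activations[-1]))))

    # Toy loss: 0.5 * ||output||^2, so dLoss/d(output) = output.
    grad_out = activations[-1]
    norms = [0.0] * depth
    for i in reversed(range(depth)):
        a_out, a_in = activations[i + 1], activations[i]
        delta = grad_out * a_out * (1.0 - a_out)   # through the sigmoid
        norms[i] = float(np.linalg.norm(np.outer(delta, a_in)))
        grad_out = weights[i].T @ delta            # pass on to the previous layer
    return norms

norms = layer_gradient_norms()
for i, n in enumerate(norms):
    print(f"layer {i}: gradient norm {n:.3e}")
```

In a setup like this the norms typically get smaller towards layer 0. In a real training loop the same quantity would be read from the framework's stored gradients at each step, and layers whose values stay very small are the candidates mentioned above.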

placid cedar
#

hey guys, wld anyone mind helping me with some issues?

#

i have a main fact table, called Results. this table contains a foreign key, statusID.

i also have a status table, with the StatusID being the primary key, and each statusId has a status. there's around 130 statuses

is it necessary to merge these 2 tables together?

i have to create a regression/classification model based on f1 data and trying to do a regression model, predicting the fastest lap speed. now i am thinking about whether it is necessary to merge both the status and results table. because the statusid is the status itself
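What a merge would and would not add can be sketched like this. The column names follow the description above, but the exact spelling and all values are made up:

```py
import pandas as pd

# Hypothetical slices of the two tables.
results = pd.DataFrame({
    "resultId": [1, 2, 3],
    "statusId": [1, 11, 1],
    "fastestLapSpeed": [218.3, 210.1, 215.7],
})
status = pd.DataFrame({
    "statusId": [1, 11],
    "status": ["Finished", "+1 Lap"],
})

# Left merge: every result row is kept, the readable label is attached.
merged = results.merge(status, on="statusId", how="left")
print(merged)
```

The merged frame carries the same information as `statusId` alone, just with a readable name per row. Either way the column is a category label rather than a quantity, so feeding the raw ID number into a regression as if it were numeric would imply an ordering that is probably not meaningful.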

past meteor
#

How do you guys feel about blogging? I feel like there's already so much material out there so I'm not even sure it's worth it

mystic berry
#

Hello

past meteor
#

OTOH, it's free personal branding and I think I would enjoy it

sleek harbor
# past meteor OTOH, it's free personal branding and I think I would enjoy it

Why does it matter what anyone else thinks? If u enjoy it, then it's worth it - if u don't enjoy it, then obviously not. There is indeed a lot of material out there.. most (and I mean most, as in >50%) is either outdated, just some people documenting their journey (which often ends up in a lot of noobie articles with mistakes in them), or just plain bad. If you actually know what you're writing about, and you know how to write - blogging would be beneficial to both u and ur readers. If u don't know what ur writing about and/or u don't know how to write - blogging would likely still be beneficial for u, tho likely not for ur readers. That's a win-win situation, for u at least πŸ˜›

Tl;dr: do what u wanna do. I'd read it, if it was of interest to me (on a topic I understand, which it probably wouldn't be :3)

past meteor
#

Personally I wouldn't write stuff like "my data science journey" because it's boring. I just went to uni and did internships.

I'm thinking of stuff like making a tiny ML framework step-by-step (maybe in another lang than Python), how to structure projects, what data viz tool to use, when ML is appropriate, an actor model approach to genetic algorithms, ...

Pretty much a mix between code and organisational stuff. Are both relevant or should I consider dropping one of the two

serene scaffold
past meteor
serene scaffold
# past meteor Did my second message get through?

no, I just wasn't caught up with all the messages after the one I replied to. bad choice on my part.

making a tiny ML framework step-by-step (maybe in another lang than Python), how to structure projects, what data viz tool to use, when ML is appropriate, an actor model approach to genetic algorithms,
I'd be interested to see what you come up with.

#

personally, I'd be less interested in the data viz part.

past meteor
#

Personally I detest data viz. I ask about it in interviews and if it's a big component I bail.

The reason I'd write about it is moreso that people tend to struggle with picking the right tool in my opinion. Like, FOSS bi tools vs proprietary vs Python vs fully in JS . It sounds boring but our research group has burnt itself in the past by doing this

#

People went for custom solutions whose complexity the project didn't warrant. Analytics people on the other hand only have a hammer called tableau so then everything is a dashboard shaped nail 🤣

weak mortar
#

Hello! Weird problem today with anaconda. It will only execute one line of code then exit the script. Ie do nothing, but if i put a print("whatever") on first line, it will print it and then not do more

#

It worked a few days ago and i didn't change anything afaik

#

Also a handful of times it crashed VSC saying out of memory

weak mortar
#

Maybe easier i just install two versions of python and then can install the outdated libs i need in each specific python version πŸ€”

#

Its not really a datascience question but a guy in general said i should ask here pepeshrug

chrome ginkgo
#

Hello

quaint loom
#

Is there anyone here who is familiar with WPS and know how to copy equation that I have written there and now want to move to a word doc?

quaint loom
magic dune
quaint loom
velvet rampart
#

Please what does count_vectoriser.fit_transform do

mild dirge
#

@velvet rampart

rose agate
#

I have a data frame 'df' with columns TO and FROM which are numbers, TO is always greater than FROM, and I need to find which ranges in this frame overlap with ranges from a predetermined segment I'm looping through. The current statement to find this is
df[~(df.TO < segFROM) & ~(df.FROM > segTO)]
However with the scale of the data I'm working with this is very slow. Is there any way to make this statement faster by sorting the data frame to do a faster search for each of those conditions I'm filtering on?

tidal bough
#

Well, you could definitely do a searchsorted manually, but how to make it to it automatically, hmm...

#

(my mental model of some people in this channel says "use duckdb" :p)

tidal bough
#

I tried to just sort the indices, but it doesn't seem to make it measurably faster.
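A sketch of the manual `searchsorted` idea, assuming NumPy and a tiny made-up frame. Whether it is actually faster on the real data was not measured here:

```py
import numpy as np
import pandas as pd

# Hypothetical ranges; TO is always greater than FROM.
df = pd.DataFrame({"FROM": [0, 5, 10, 20, 30], "TO": [4, 12, 15, 25, 40]})

def overlaps_mask(frame, seg_from, seg_to):
    """Boolean-mask version, equivalent to the filter in the question."""
    return frame[(frame["TO"] >= seg_from) & (frame["FROM"] <= seg_to)]

# Sort once, outside the loop over segments, and keep NumPy arrays around.
sorted_df = df.sort_values("FROM").reset_index(drop=True)
from_arr = sorted_df["FROM"].to_numpy()
to_arr = sorted_df["TO"].to_numpy()

def overlaps_sorted(seg_from, seg_to):
    # Rows with FROM <= seg_to form a prefix of the sorted frame.
    hi = np.searchsorted(from_arr, seg_to, side="right")
    # Inside that prefix, TO still needs an ordinary comparison.
    keep = to_arr[:hi] >= seg_from
    return sorted_df.iloc[:hi][keep]

print(overlaps_sorted(11, 22))
```

Only one of the two conditions benefits from the sort; the other is still a linear scan over the prefix, which may be part of why sorting alone did not help much.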

last ivy
#

Is there a good online reference for an intro to time series analysis ?

quaint loom
hasty mountain
#

Hey guys, I'm now making some experiments in a technique I'm trying to develop and I want to save my model metrics in an excel file.

Question is: is there a motive to why would I prefer CSV over classic xlsx excel files?

I just want to confirm this. It's actually the first time I'm saving my metrics to excel (in a desperate effort to make things more organized and easier to visualize...I had to re-run some tests in a previous research because of my poor organizing skill)
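One practical difference is that CSV is plain text, so a new row can be appended after every run using only the standard library. A sketch, where the metric names, values and file location are hypothetical:

```py
import csv
import os
import tempfile

def append_metrics(path, row):
    """Append one experiment's metrics as a CSV row, writing the header once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# A temporary folder is used here; a real project would pick a fixed path.
path = os.path.join(tempfile.mkdtemp(), "metrics.csv")
append_metrics(path, {"run": 1, "accuracy": 0.91, "loss": 0.32})
append_metrics(path, {"run": 2, "accuracy": 0.93, "loss": 0.27})

with open(path, newline="") as handle:
    rows = list(csv.DictReader(handle))
print(rows)
```

The resulting file still opens in Excel, and it can be read back from code without extra dependencies.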

quaint loom
hasty mountain
#

Hm... I'll take a look

#

I may not reuse the variables in that file to extract data...I think. Unless I try to analyze the correlation using some algorithm, I think...

quaint loom
hasty mountain
quaint loom
hasty mountain
#

Thanks! 👍

quaint loom
potent sky
wooden sail
#

maybe even of things you often see explained incorrectly or that people struggle with

#

when i discuss stuff with other doctorands or with students we supervise, once i notice a trend, i prepare some material to explain to/discuss with them, and it anyway ends up in a mixed format of latex/jupyter/drawings/video that already lends itself to just slapping onto a github repo with static site generation

#

why upside down 😔

potent sky
#

I mean it as a rueful smile. I've seen this (slapping everything on a github repo with SSG) often enough that it's relatable, and though I think I would like to do differently, the fact remains that it's actually a pretty good way of making all these resources accessible. 🙃

past meteor
#

Yeah! I definitely think data science is more than just models so I'll cover some of those topics!

#

Although I also enjoy deep diving on esoteric ML stuff that I use at work like multifidelity gaussian processes

placid cedar
#

guys,in my dataset, i have this column called driversID. and it has 800+ unique values. should i drop the column entirely, or keep it?

i am doing a linear regression model here

sleek harbor
# past meteor Although I also enjoy deep diving on esoteric ML stuff that I use at work like m...

If I'm not mistaking u for someone else (and that's something I do a lot online), then u work with timeseries data? If yes, then I'd be interested in reading about that. As u all know I'm a noob in everything, so I likely won't get most of the advanced stuff, but I am specifically interested in that stuff cus I plan on eventually trying to build a trading algorithm (got some domain knowledge in the area so to speak)

serene scaffold
past meteor
#

It's free. Issue is that it's R and has a very businessy forecasting perspective to time series but it's very comprehensive

#

I learnt a lot from time series because it was my master's thesis topic and I did a few kaggle projects on it

sleek harbor
past meteor
#

Yes

golden brook
#

Any one good with Pandas here?
I posted a question on Stackoverflow but no help so far.

serene scaffold
#

with pandas in particular, give a reproducible copy of the data with something like print(df.head().to_dict('list'))

golden brook
#

I was going to post the stackoverflow link here as it's easy to use to pd.read_clipboard with the formatting.

serene scaffold
golden brook
serene scaffold
#

@golden brook someone answered on SO with an answer that involves a for loop. That's probably the best solution, since pandas isn't great for mutations that require "memory"

grizzled carbon
#

Hi guys, I recently took a course on AI and wanted to build onto it. I of course learnt about the MNIST dataset and wanted to implement a simple prediction app, where someone can draw a number on a black square and then get the model's prediction. Now my problem is that the model is super bad haha.
I have tried multiple models that all have had a reported accuracy of at least 97% after having finished training. But once i try it out it gets like every third picture wrong :)) . I am pretty sure it's in how i process the image but can't figure it out myself.

The image is received b64 encoded, so the first thing i do is decode and then process:
I use PIL to get the image and then turn it into a nparray with dimensions (1,28,28) so that it can be used in model.predict.

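import base64
import io

import numpy as np
from PIL import Image

def retrieveB64(postRequest):
    # Decode the base64 payload and open it as a PIL image.
    image = base64.b64decode(postRequest, validate=True)
    decoded_string = io.BytesIO(image)
    img = Image.open(decoded_string)
    return img

def ImageForModel(image):
    image = image.convert("L")          # greyscale
    image = image.resize((28, 28))      # MNIST input size
    array = np.array(image)
    array = array / 255                 # scale pixel values to [0, 1]
    array = np.expand_dims(array, 0)    # add batch dimension -> (1, 28, 28)
    return array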

If anyone has any input would be great!
For debugging I also save the image after the resizing step to check if anythings wrong but they always look fine

#

this is how my input images look after resizing them

latent tendon
#

I am trying to get jupyter notebooks to work with my visual studio code. I am working on a project and it told me to download anaconda. I had anaconda downloaded, however my projects were in visual studio code and I decided to work through visual studio code and master it a bit more and then work in anaconda, so I deleted it.
A few months later I recognize that maybe data analyst jobs are more available right now than django jobs and I start studying data science.

I manage to use pip to install jupyter. When I do so, and run a cell such as import pandas as pd it acts like its never heard of pandas and will not import anything.

import pandas as pd

I am wondering if this is a not downloading anaconda issue or a Jupyter issue.

Do I need to import things through visual studio code?

What have I tried and what am I expecting?

I have tried looking at the working with jupyter notebook documentation and it says download anaconda. I have managed to install miniconda.

It is still not recognizing pandas.

Error Message:

ModuleNotFoundError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import pandas as pd

ModuleNotFoundError: No module named 'pandas'
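One common cause of this error (an assumption here, nothing in the message confirms it) is that the notebook kernel runs a different Python interpreter than the one `pip` installed into. A small check that can be run in a notebook cell:

```py
import importlib.util
import sys

# Which interpreter is this kernel actually using?
print("kernel interpreter:", sys.executable)

# Can that interpreter see pandas at all?
pandas_spec = importlib.util.find_spec("pandas")
print("pandas importable:", pandas_spec is not None)
```

If the printed path is not the environment where pandas was installed, selecting the matching kernel in VS Code, or installing pandas into the interpreter shown, would be the next thing to try.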

left tartan
left tartan
left tartan
# golden brook So the dataframe is included in the post at the start. You can just copy and th...

Ok, this took me a few tries... I think this is what you're looking for. The idea is: separate the data into groups, based on whether there's a consecutive increase (start) or decrease (end). The first row is a "start". For each group, the "period" (the result you're looking for) is True if the group began with a "start", and False if it began with an "end".
```py
import pandas as pd

data = {
'date': ['1993-01-29', '1993-02-01', '1993-02-02', '1993-02-03', '1993-02-04', '1993-02-05', '1993-02-08', '1993-02-09', '1993-02-10', '1993-02-11', '1993-02-12', '1993-02-16', '1993-02-17', '1993-02-18', '1993-02-23', '1993-02-24', '1993-02-25', '1993-02-26'],
'value': [0.44, 0.44, 0.45, 0.44, 0.44, 0.56, 0.59, 0.58, 0.57, 0.54, 0.53, 0.47, 0.42, 0.38, 0.35, 0.39, 0.43, 0.46]
}

df = pd.DataFrame(data)

df['start'] = ((df["value"] - df["value"].shift()) > 0) & ((df["value"].shift() - df["value"].shift(2)) > 0)
df.loc[0, "start"] = True

df['end'] = ((df["value"] - df["value"].shift()) < 0) & ((df["value"].shift() - df["value"].shift(2)) < 0)

df['group'] = (df['start'] | df['end']).cumsum()
df['first_start'] = df.groupby('group')['start'].transform('first')
df['period'] = df['first_start'].shift(1, fill_value=False)

print(df)
```

serene scaffold
#

@left tartan amazing

left tartan
#

And for giggles, a duckdb / sql version. I also could've done the cumulative sum here, but opted for an asof join:
```py
import pandas as pd
import duckdb
data = {
'date': ['1993-01-29', '1993-02-01', '1993-02-02', '1993-02-03', '1993-02-04', '1993-02-05', '1993-02-08', '1993-02-09', '1993-02-10', '1993-02-11', '1993-02-12', '1993-02-16', '1993-02-17', '1993-02-18', '1993-02-23', '1993-02-24', '1993-02-25', '1993-02-26'],
'value': [0.44, 0.44, 0.45, 0.44, 0.44, 0.56, 0.59, 0.58, 0.57, 0.54, 0.53, 0.47, 0.42, 0.38, 0.35, 0.39, 0.43, 0.46]
}
df = pd.DataFrame(data)
result = duckdb.execute("""
WITH input as (
SELECT date,
value,
ifnull(value - lag(value) over l > 0 and lag(value) over l - lag(value, 2) over l > 0, True) as cstart,
ifnull(value - lag(value) over l < 0 and lag(value) over l - lag(value, 2) over l < 0, False) as cend
FROM df
window l as (order by date)
),
boundaries as (SELECT date, cstart from input WHERE cstart or cend),
periods as (SELECT input.date,
boundaries.cstart
FROM input
ASOF JOIN boundaries on input.date >= boundaries.date)
SELECT date, ifnull(lag(cstart) over (order by date), True) as signal FROM periods
""").df()
print(result)
```

rustic snow
#

I am going to have a Machine Learning Interview in 2 days
Can you guys let me know what kind of questions the interviewer asks about machine learning (besides algorithms and data structures)

slim bone
rustic snow
#

@slim bone ye I've been coding for 5 years and I know algorithms and data structures, a lot of them at least

hasty mountain
#

Does someone know about a paper or article where the researchers have tried to combine Genetic Algorithms with Stochastic Gradient Descent to train neural networks?

#

I've tried to search about that and asked my professor about it, and got no results back then. But now that my research on that shows that my method is likely to fail, it may be interesting to double-check if someone tried more efficient methods

gilded kestrel
#

anyone with colab pro? Does high-memory give you 51gb cpu ram or 25gb?

lapis sequoia
#

What math do I need to know to start studying this field?

#

I'm currently studying precalculus

serene scaffold
placid cedar
#

hey guys, after doing winsorisation, i still have some outliers, but it got reduced from 900 to 700

#

is that still bad?

left tartan
lapis sequoia
#

no way around it tho ig

serene scaffold
#

but that's setting yourself up for long-term issues.

lapis sequoia
lapis sequoia
#

hopefully 6 months is a reasonable time to learn all of that

serene scaffold
lapis sequoia
#

it's just an arbitrary goal i set

serene scaffold
#

one doesn't "learn AI" in six months.

lapis sequoia
#

i meant the math

serene scaffold
#

how are you going to measure your progress?

lapis sequoia
#

not the field in general

#

idk good question

#

i'll probably follow a course

left tartan
# lapis sequoia it's just an arbitrary goal i set

fwiw, set reasonable goals... and, in my experience, I don't really know it until the second time through the material. So, it's reasonable to aim for familiarity with, say, the ideas behind calculus, linear algebra and statistics... but it's unrealistic to try to "know" them at the college course level. Doesn't mean you'll be proficient in any of them, but it'll be a good starting point.

lapis sequoia
left tartan
lapis sequoia
#

if it's gonna take 2+ years i might as well just wait and take it in college 😭

left tartan
#

That's why i suggest just aiming for "familiarity" rather than "mastery" proficiency

wooden sail
#

also depends what you call mastery

placid cedar
#

hey guys, after doing winsorisation, i still have some outliers, but it got reduced from 900 to 700
is that still bad?
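
One way to see why outliers can survive winsorization is the sketch below. It assumes the outliers are counted with the usual 1.5 × IQR rule and that winsorizing means clipping at the 5th and 95th percentiles; both are assumptions, since the question does not say which rule or limits were used, and the data here is synthetic. Clipping at those percentiles leaves the quartiles unchanged, so any clipped value that still lies beyond the IQR fence keeps being flagged.

```python
import numpy as np

def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Clip values to the given lower/upper percentiles (assumed limits)."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)

def count_iqr_outliers(values, k=1.5):
    """Count points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(mask.sum())

# Synthetic, heavily right-skewed data (hypothetical stand-in for the real column)
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

before = count_iqr_outliers(data)
after = count_iqr_outliers(winsorize(data))
print(before, after)
```

Remaining flagged points are therefore not automatically "bad": they mostly indicate that the clipping limits sit outside the fences of the outlier rule, which is common for skewed data.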

left tartan
#

I should say "proficiency" (like: passing a college class)

left tartan
lapis sequoia
#

i see

#

alright thanks for the help

#

i’ll just work on getting familiar with the material

#

hopefully that’s enough to make some cool stuff 😂

wooden sail
#

luckily for you, the basics of linalg, calculus and statistics can be learned independently of each other, so you could realistically try them at the same time

lapis sequoia
#

good to know, thanks 👀

umbral ermine
#

Hello everyone

lapis sequoia
umbral ermine
#

how are you

#

new here

lapis sequoia
#

I am a full stack developer.

night wadi
potent sky
#

what is "pure math" and "not-pure-math"

slim bone
#

I think just looking up “pure math” gives pretty good results

potent sky
#

The definitions of "pure math" I could find online are all contingent on the motivation for application of that math, rather than specific qualities of the mathematical concepts themselves.
I can't see the strength in this definition. Different concepts of mathematics that might not be readily apparent as having "real-world" applications might find one shortly.
Context is usefulness of making a categorization for pure math

potent sky
#

Why isn't all math pure math? All math has qualities that lend themselves to rigor and generality, and aesthetic quality is rather subjective.
Applied mathematics should just classify some allowances that we make to a mathematical concept (such as sacrificing some degree of rigor or generality) in return for increased practical applicability.
It shouldn't classify a basic division of mathematical concepts themselves, with statements like "calculus isn't pure math".
I don't understand the usefulness or fairness of such a classification.

#

also this is pretty off topic by now I suppose, mb lol

forest lintel
#

whats the terms i mean

slim bone
# potent sky Why isn't all math pure math? All math has qualities that lend themselves to rig...

is rather subjective
As far as I understand the definition is entirely subjective. As in, there is nothing that makes a concept "pure" or "unpure".

I'd reckon there are probably few subjects that definitely fall into one category but most appear to be on the spectrum between "pure" and "unpure"(applied?) and its place on it will change between each individual.

tl;dr: I completely agree with you lol

#

Ah, I didn't see @ DarQ replying to anybody. Now I realize there was context. Apologies lol

potent sky
odd meteor
night wadi
#

or maybe computational math fits best but I dunno how popular that term is

dusty valve
#

just made an overloading decorator in native python

#

Took some black magic but it works

serene scaffold
dusty valve
#

Huh ye

#

Wrong

#

Thank stelersus

desert oar
#

i'd say it's slightly on topic because just about the only task domain where i think multiple dispatch makes a lot of sense is in numerical and mathematical code

#

otherwise it leads to confusing code, unless you're very disciplined or have a clear guiding framework like in haskell with its typeclasses

#

whereas it's kind of necessary with the zoo of different number and array types you might encounter
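
For readers unfamiliar with the idea, here is a minimal sketch of type-based overloading in plain Python. It is not the decorator mentioned above (that code is not shown in this log); the names `overload` and `combine` are hypothetical, and the lookup matches exact argument types only, so subclasses are not resolved. The standard library's `functools.singledispatch` covers the simpler case of dispatching on the first argument only.

```python
_registry = {}  # maps (function name, argument types) -> implementation

def overload(*types):
    """Register an implementation for an exact tuple of argument types."""
    def decorator(func):
        _registry[(func.__name__, types)] = func

        def dispatcher(*args):
            key = (func.__name__, tuple(type(a) for a in args))
            try:
                impl = _registry[key]
            except KeyError:
                raise TypeError(f"no overload of {func.__name__} for {key[1]}") from None
            return impl(*args)

        return dispatcher
    return decorator

@overload(int, int)
def combine(a, b):
    return a + b

@overload(str, str)
def combine(a, b):
    return f"{a} {b}"

@overload(list, int)
def combine(items, factor):
    return [x * factor for x in items]

print(combine(2, 3), combine("a", "b"), combine([1, 2], 3))
```

Dispatching on every argument is what makes this "multiple" dispatch; with many number and array types the registry grows quickly, which is where the discipline mentioned above matters.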

serene scaffold
desert oar
#

lol, it's not a particularly inspired demo to put in the readme

#

fwiw i think this project predates .format and i'm fairly sure it predates f-strings

#

ok it doesn't predate .format, its first pypi release was 2014

serene scaffold
#

I figured as much, in part because it doesn't use type hints.
also format predates f-strings

#

I hated .format from the start

desert oar
#

really? i switched to .format from % as soon as i learned about it

#

i still use it from time to time

serene scaffold
#

I started using python right when 3.6 came out

desert oar
#

i think i started on 3.3 in school, then 3.4 -> 3.6 at my first job

serene scaffold
#

I sometimes use modulo formatting for strings that have a lot of curly braces in them that are part of the string. .format occupies a weird middle ground that is never useful for me.

desert oar
#

had a very forward-thinking professor starting us on python 3.x that early in its development when many people were still clinging to 2.7

#

i use .format for templating

#

i know we also have string.Template but that seems particularly not-useful by comparison. would be interesting to know the history behind that one

serene scaffold
#

one time in the python bot I did .format (without calling it) and assigned the method to a variable. that was fun

desert oar
#

yeah why not?

#

i basically use it as a lightweight no-dependency alternative to jinja

#

admittedly not a very common use case, but it happens
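
The formatting styles discussed above can be compared side by side. This is a small sketch using only the standard library; the template text and variable names are made up for illustration.

```python
from string import Template

# str.format as a lightweight, dependency-free template
GREETING = "Hello {name}, you have {count} new messages."
print(GREETING.format(name="Ada", count=3))

# The bound method can be stored and called later, like a render function
render = GREETING.format
rendered = render(name="Grace", count=0)

# %-formatting leaves literal curly braces alone...
with_percent = '{"user": "%s", "id": %d}' % ("ada", 7)
# ...whereas str.format needs them doubled
with_format = '{{"user": "{}", "id": {}}}'.format("ada", 7)

# string.Template uses $placeholders and offers substitute/safe_substitute
with_template = Template("Hello $name").substitute(name="Ada")

print(rendered, with_percent, with_format, with_template, sep="\n")
```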

sharp quest
#

Is it okay to ask about Pandas here?
I have a CSV file where two columns, date and amount, are interesting.
I'd like to filter out all rows that match YEAR and MONTH, then sum the amount.
Do I need to add an index or create a new dataframe?

desert oar
#

with date as part of the index it's sometimes more elegant, although in this case it's mostly the same

#

it's worth spending the time to understand each piece of the above example. i think it demonstrates several important principles about how pandas works and how to use it effectively
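
The example referred to above is not preserved in this log. As a rough sketch of the two approaches under discussion (a boolean mask on a date column versus a date index), assuming a CSV with columns named `date` and `amount` and made-up sample rows:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real CSV file
csv_text = """date,amount,note
2023-07-30,10.0,a
2023-08-01,20.5,b
2023-08-15,4.5,c
2023-09-02,7.0,d
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Option 1: boolean mask on the date column, no new index or DataFrame needed
mask = (df["date"].dt.year == 2023) & (df["date"].dt.month == 8)
total = df.loc[mask, "amount"].sum()

# Option 2: with the date as a sorted index, a partial date string selects the month
indexed = df.set_index("date").sort_index()
total_via_index = indexed.loc["2023-08", "amount"].sum()

# Sums for every month at once
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].sum()

print(total, total_via_index)
print(monthly)
```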

sharp quest
#

Thanks mate, it'll give me a lead to work on. Pandas seems so cool, but it's really not simple, and it's easy to get discouraged when poking around in it.

#

Or it's just me being dumb 😅

lapis sequoia
#

could someone pls help me with replacing values in a pandas dataframe with dates pls?

#

2023-08-15 00:00:00 177.449997
2023-08-16 00:00:00 176.570007
2023-08-17 00:00:00 174.000000
2023-08-18 00:00:00 174.490005

this is what my dataframe looks like. the length of this is 5946

#

predictions_dataset.loc[5944] = 300

but if i use this line to replace one of the last values

#

it just adds it to the end and results in this

2023-08-15 00:00:00 177.449997
2023-08-16 00:00:00 176.570007
2023-08-17 00:00:00 174.000000
2023-08-18 00:00:00 174.490005
5944 300.000000

#

predictions_dataset is the name of the dataframe btw

cobalt salmon
#

Hello, hope it's ok to ask a question about conda, jupyter notebooks and installing a package. I've got conda version 23.7.2 and am trying to install the pyclustertend package from within a jupyter notebook using !pip install pyclustertend however it tries to compile sklearn from source code. On a Mac this fails by default and the recommendation is to compile sklearn from source instead, which I am trying to do using the instructions here:
https://scikit-learn.org/dev/developers/advanced_installation.html#compiler-macos

However, when I do conda activate sklearn-dev it doesn't accept the command and says activate is an invalid choice, which is just weird. I couldn't find much on Google about this. Here's the output:

usage: conda [-h] [--no-plugins] [-V] COMMAND ...
conda: error: argument COMMAND: invalid choice: 'activate' (choose from 'clean', 'compare', 'config', 'create', 'info', 'init', 'install', 'list', 'notices', 'package', 'remove', 'uninstall', 'rename', 'run', 'search', 'update', 'upgrade', 'doctor', 'debug', 'pack', 'content-trust', 'repo', 'verify', 'index', 'build', 'env', 'metapackage', 'develop', 'convert', 'inspect', 'render', 'server', 'skeleton', 'token')

Note: you may need to restart the kernel to use updated packages.
#

Weirdly, from the command line, conda activate works just fine. The versions of Conda between the jupyter notebook and the command line are the same, I'm not sure what the difference is

odd meteor
# cobalt salmon Hello, hope it's ok to ask a question about conda, jupyter notebooks and install...

The exclamation notation to install a package from JNB isn't advised. Use any one of these instead.


import sys
!{sys.executable} -m pip install package_name

Or better still,

%pip install package_name
%conda install package_name

In IPython 7.3 and above (the kernel behind Jupyter) you should use the line magic commands %pip or %conda to install a package into the current kernel instead of using !pip (which runs whichever pip comes first on the shell's PATH, not necessarily the one that belongs to the kernel's Python)

If the above doesn’t work, you might wanna confirm if you can install packages directly from JNB, or if you need to allow some sort of access for the package to be installed.
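
To check which interpreter the notebook kernel is actually using, and whether a package is visible to it, something like the following can be run in a cell. It is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import sys
import importlib.util

# Path of the Python interpreter running this kernel; packages must be
# installed into this interpreter's environment to be importable here.
print(sys.executable)

def is_installed(module_name):
    """Return True if the current interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("json"))           # standard library, always present
print(is_installed("pyclustertend"))  # depends on the environment
```

If the path printed in the notebook differs from the one printed by the same code in a terminal, the two are using different environments, which would explain a package installing in one place and not appearing in the other.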

odd meteor
lapis sequoia
#

i dont have a column called scored but i figured it out with a bit of messing around

#

predictions_dataset.loc[index:index+1] = value

#

this line replaces the value at index with the value of value

#

i have no clue why this works but it does

#

so i can change a range of values (kindof?) but i cant change a single value without adding a new row

patent tree
#

Hello community members,

I have been assigned a project to create an audio-book app as part of my curriculum. To enhance the app's features, I am planning to implement a content-based recommendation system. This recommendation system will provide users with suggestions for audiobooks based on their clicks and listening time.

For instance, if a user listens to adventure category books more frequently than autobiographies, the algorithm will prioritize recommending adventure books over other categories. I hope this clarifies the concept.

Given that I have only a couple of weeks left to complete this project, I intend to focus solely on the essential aspects (algorithms) required for building this recommendation system.

I would greatly appreciate guidance on the specific machine learning algorithms or techniques that are suitable for developing such a recommendation system. Additionally, I'm unsure whether this recommendation system necessitates deep learning or neural networks. If they are indeed required, could you please suggest the relevant algorithms?

Currently, I am familiar with numpy and pandas, and I possess a basic understanding of supervised machine learning (though not at an advanced level).

Thank you in advance for your assistance.

odd meteor
# lapis sequoia the index is the date

.loc doesn't use the 0-indexed ordering but iloc does.

Better still, for more flexibility, convert the index of your pandas DataFrame to a DatetimeIndex

!e


import pandas as pd

data = {'Value': [10, 15, 20, 25]}
dates = ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04']

datetime_index = pd.DatetimeIndex(dates)
df = pd.DataFrame(data, index=datetime_index)

# Adding a new row using loc
new_date = '2023-08-05'
new_value = 30

df.loc[new_date] = new_value

print(df)
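
The example above adds a row. For the original question (replacing an existing value without creating a new row), here is a minimal sketch, assuming a date index and a single column hypothetically named `Close`; the frame below only imitates the one shown earlier.

```python
import pandas as pd

dates = pd.date_range("2023-08-15", periods=4, freq="D")
predictions_dataset = pd.DataFrame(
    {"Close": [177.449997, 176.570007, 174.0, 174.490005]}, index=dates
)

# By position: .iloc takes integer row/column positions (0-based)
predictions_dataset.iloc[2, 0] = 300.0

# By label: .loc takes index labels, which here are dates, not integers.
# An integer such as 5944 is not an existing label, so .loc[5944] = ...
# creates a new row instead of overwriting one.
predictions_dataset.loc[pd.Timestamp("2023-08-18"), "Close"] = 310.0

# Position converted to a label, when only the position is known
last_label = predictions_dataset.index[-1]
print(predictions_dataset.loc[last_label, "Close"])
print(predictions_dataset)
```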
lapis sequoia
#

i need to do this eventually

cobalt salmon
#

@odd meteor thanks, I've tried %conda install pyclustertend. That gives me:

Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - pyclustertend

Current channels:

  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
odd meteor
# patent tree Hello community members, I have been assigned a project to create an audio-book...

Well, for your task, if the data is in tabular form, you don't necessarily need Deep Learning to build a Recommendation Engine.

Although you'd still have to decide which type of recommender system you want to build.

  1. Content-based
  2. Collaborative Filtering
  3. Hybrid (combines 1 & 2)
  4. Neural Collaborative Filtering (NCF)
  5. Etc

This could be of help

  1. https://www.youtube.com/watch?v=GWFC2_9_iVk

  2. https://github.com/topspinj/recommender-tutorial

odd meteor
cobalt salmon
#

@odd meteor I'm on a Mac, I assume the powershell reference you mentioned was for Windows?

I have already tried the simple %pip install pyclustertend methods - they all fail because it tries to compile scikit-learn from source code and then fails because OpenMP isn't available:

Collecting pyclustertend
  Using cached pyclustertend-1.6.2-py3-none-any.whl (7.1 kB)
Requirement already satisfied: matplotlib<4.0.0,>=3.3.3 in /Users/kjaleel/anaconda3/lib/python3.11/site-packages (from pyclustertend) (3.7.1)
Requirement already satisfied: numpy<2.0.0,>=1.19.1 in /Users/kjaleel/anaconda3/lib/python3.11/site-packages (from pyclustertend) (1.24.3)
Requirement already satisfied: pandas<2.0.0,>=1.2.0 in /Users/kjaleel/anaconda3/lib/python3.11/site-packages (from pyclustertend) (1.5.3)
Collecting scikit-learn<0.25.0,>=0.24.0 (from pyclustertend)
  Using cached scikit-learn-0.24.2.tar.gz (7.5 MB)
  Installing build dependencies ... \

This is why I was trying to compile scikit-learn from source using the instructions at: https://scikit-learn.org/dev/developers/advanced_installation.html#building-from-source

However, I'm having 2 problems - one is that after I create an environment in Conda, it refuses to activate it and says that there's no such command defined. I have given an example of this above already.

From the command line (outside Conda), the activate command works fine and I'm able to switch into the environment and I can compile a new version of sklearn, but then what do I do to use that inside Conda? Since I can't run conda activate I'm stuck. This is a silly circular problem

#

Maybe I should just give up on this package and use something else? I'm just trying to follow an example from a Udemy course that is using the hopkins module from pyclustertend to do a K Means Clustering example. Maybe there's a different Python library to do that?
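
If the only thing needed from pyclustertend is the Hopkins statistic, it can be approximated in a few lines of NumPy. The sketch below is not that package's implementation: it uses plain Euclidean nearest-neighbour distances (some definitions raise them to the power of the dimension), brute-force distance computation, and the convention that values near 1 suggest clustered data and values near 0.5 suggest uniformly random data. Conventions differ between implementations (some report 1 minus this value), so check which one the course expects.

```python
import numpy as np

def hopkins_statistic(X, sample_size=None, seed=0):
    """Approximate Hopkins statistic; near 1 = clustered, near 0.5 = random."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = sample_size or max(1, n // 10)
    rng = np.random.default_rng(seed)

    # m real points, sampled without replacement
    idx = rng.choice(n, size=m, replace=False)
    # m synthetic points, uniform inside the bounding box of the data
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))

    def nearest_distance(queries, exclude=None):
        # Pairwise distances between queries and all data points: shape (m, n)
        dists = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=2)
        if exclude is not None:
            # A sampled real point must not count itself as its neighbour
            dists[np.arange(len(queries)), exclude] = np.inf
        return dists.min(axis=1)

    u = nearest_distance(uniform)            # synthetic point -> nearest data point
    w = nearest_distance(X[idx], exclude=idx)  # real point -> nearest other data point
    return u.sum() / (u.sum() + w.sum())

# Two tight synthetic blobs (made-up data for illustration)
rng = np.random.default_rng(1)
clustered = np.vstack([rng.normal(0, 0.05, (100, 2)), rng.normal(5, 0.05, (100, 2))])
score = hopkins_statistic(clustered)
print(score)
```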

odd meteor
#

Yeah it's for Windows. I don't use Mac but you can still try installing directly from anaconda prompt. I suppose you have the normal anaconda prompt, yeah?

patent tree
odd meteor
cobalt salmon
odd meteor
odd meteor
cobalt salmon
lapis sequoia
#

can someone help me debug my code pls?

#
# there are roughly 250 working days a year, which means 750 new values need to be predicted

future_predicted_values = []

# Predict the next day's price
historical_data = yf.download(ticker, '2000-01-01', current_date)
historical_data = historical_data['Close']
historical_data_values = historical_data.values

for i in range(750):
    # Only keep the last 60 days
    #print(historical_data[-3:])
    historical_data = historical_data[-60:]
    print(historical_data[-5:])
    

    #print('historical_data')
    #print(historical_data)
    #print()

    reshaped_historical_data = np.reshape(historical_data_values, (-1, 1))

    # Scale the data to be between 0 and 1

    normalized_historical_data = (reshaped_historical_data - np.min(reshaped_historical_data))/(np.max(reshaped_historical_data)-np.min(reshaped_historical_data))
    # Store the data to reverse normalization later
    historical_data_max = np.max(reshaped_historical_data)
    historical_data_min = np.min(reshaped_historical_data)

    #print('normalized_historical_data')
    #print(normalized_historical_data)
    #print()

    # Create an empty list
    new_x_test = []
    new_x_test.append(normalized_historical_data)
    new_x_test = np.array(new_x_test)

    #print('new_x_test')
    #print(new_x_test)
    #print()

    # Reshape the data
    new_x_test = np.reshape(new_x_test, (new_x_test.shape[0], new_x_test.shape[1], 1))

    # Get the predicted scaled price
    predicted_price = model.predict(new_x_test)
    predicted_price = predicted_price * (historical_data_max-historical_data_min) + historical_data_min

    # Print the predicted price
    #print(historical_data[-1])
    print(predicted_price)
    historical_data = np.append(historical_data, predicted_price)
    #print(historical_data[-1])
    future_predicted_values.append(predicted_price)
    

#

this generates a new value as expected

#

i then add that new value to the array historical data

#

but then it generates the same value again

#

why is that?
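
In the code above, `historical_data_values` is computed once before the loop, and the reshape inside the loop reads from it rather than from `historical_data`. Appending the prediction to `historical_data` therefore never changes what the model sees, so every iteration produces the same output. The sketch below shows the loop rebuilding its window from the updated array each time. `MeanModel` is a hypothetical stand-in for the trained model, the input data is synthetic, and the per-window min-max scaling is an assumption; the scaling should match whatever was used during training.

```python
import numpy as np

class MeanModel:
    """Hypothetical stand-in for the trained model: predicts the window mean."""
    def predict(self, x):
        # x has shape (batch, window, 1); the result has shape (batch, 1)
        return x.mean(axis=1)

def forecast(history, model, steps, window=60):
    values = np.asarray(history, dtype=float)
    predictions = []
    for _ in range(steps):
        # Rebuild the window from the *updated* array on every iteration
        recent = values[-window:]
        lo, hi = recent.min(), recent.max()
        span = hi - lo if hi > lo else 1.0  # guard against a constant window
        scaled = (recent - lo) / span
        x = scaled.reshape(1, -1, 1)        # (batch, window, features)
        pred_scaled = model.predict(x)
        pred = float(pred_scaled[0, 0]) * span + lo  # undo the scaling
        predictions.append(pred)
        values = np.append(values, pred)    # the next window includes this value
    return predictions

history = np.linspace(100.0, 160.0, 200)    # synthetic prices
preds = forecast(history, MeanModel(), steps=5)
print(preds)
```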

odd meteor
# cobalt salmon <@519319496868233227> yup, that is mentioned on that course too. It's just this ...

You can install packages from command line (just like you do with VSCode via the terminal)

You can also install directly from your Jupyter Notebook.

You can as well install packages from your anaconda prompt (if you have anaconda installed on your pc)

The last suggestion I made was installing it via anaconda prompt.

I don't use Mac, but I sure know it's possible for you to still get that package so long as it's available on PyPI and can work on any OS.

  1. Open "Anaconda Prompt" (on a Mac, the regular Terminal app with conda initialized plays that role).
  2. Activate your desired anaconda environment (this step is not compulsory)
  3. Then type pip install pyclustertend to install the package
  4. Go back to your JNB, restart your kernel.

That should do the magic.

odd meteor
lapis sequoia
#

bruh is it even possible to predict stock prices with ai

daring sphinx
humble shore
#

damnnn

#

you making $$$$

#

this is a simple regression model right?

#

and is this upwork?

#

@daring sphinx ??

daring sphinx
humble shore
#

ya but the model itself was a regression model

#

btw were you the employer or the employee

daring sphinx
#

The entire model was trained in sagemaker as well. Optimizing hyperparameters of xgboost. All with built in sagemaker libraries

daring sphinx
cold osprey
#

r u doing someone else's homework or smth?

sacred stirrup
#

Hey everyone, I would like some guidance based on what i'm currently trying to do

I got few thousand images of cars with visible license plates, some have multiple cars, some have none, and for each image four pairs of integers which represent bounding corners of the license plate quadrilateral. In order to use that data to train a custom model, what is the preferred way to start? This is different from other machine learning models I've used in the past because usually there's a set of "objects" (animal, plant, human, building, ...) and the goal was to classify the image to something from that set, but this one's different

Is tensorflow / keras even suitable for this? Appreciate any feedback

dusty valve
pale basalt
#

Guys I am trying to extract text from PDF using ocr to excel. I need help of pro coder in ocr. DM me

dusty valve
#

But looking deeper it's something for subclasses, I'll take that into account later

dusty valve
#

Or u can use

#

!pypi pytesseract

arctic wedgeBOT
dusty valve
#

Just download the executable and ur done

pale basalt
#

Okay thanks

#

Can we directly bring it into excel using pytesseract?

odd meteor
#

At surface level, yeah it is, but don't hold your breath on its efficacy 'cos relying solely on the model's predictions for investment signals is an all-expenses-paid ticket to bankruptcy. Same thing with using ML for Bitcoin prediction.

potent sky
south crow
#

Hi, do y'all know the max Python version Keras supports? I saw in the documentation on Keras's website that it supports up to 3.10, but does something like 3.10.9 work?

fickle dew
slim bone
#

People pay 50 bucks for this stuff?

slim bone
twilit tundra
#

50 sounds very cheap no matter how simple the model is

slim bone
#

Seriously?

#

Reminds me of stories about the late 90's where web developers would get goofy amounts of money for virtually nothing

twilit tundra
#

If it's for a company, 50 is basically nothing

#

Freelancers often have a rate going up to around 1.5k/day or more

slim bone
#

And surely you have to factor in how much work went into the actual model

#

I could program that right now and I consider myself a complete pleb

twilit tundra
#

Yeah probably, still pretty cheap considering it would probably be more than 1 hour of work

slim bone
#

Damn, I'm tempted to just open a few freelancing profiles and seeing where this goes

twilit tundra
#

You have to take into account that someone paying for this kind of service can't do it themselves

#

Most people have no idea how to train/deploy a model, it's not their role

slim bone
#

Oh. Of course
If they have the faintest idea they probably wouldn't pay 50 bucks for this

twilit tundra
#

The cheaper option is hiring a free intern I guess

slim bone
#

Again, I'm assuming some basic regression model
I'm sure this could get extremely complicated very fast

slim bone
twilit tundra
#

At my company, it's cheaper than an intern if it would take said intern more than 2-3 hours

#

And according to the description, it's more than the model: you have a pipeline, an interface and it's hosted on AWS

#

Which is again, not a very complicated task but the value for a company that would need that is way more than 50

slim bone
#

I suppose
I guess I just assumed the freelancing market would naturally reduce the price to naught

#

It’s almost disappointing to realize just how easy it is to implement a half-decent model

twilit tundra
#

There are some that put very low prices for exposure but the lowest I've found were still around 300€/day

slim bone
#

That’s crazy tbf

#

Again, just knowing how little work this could actually be
Especially if you’ve made similar projects

twilit tundra
#

You have to find clients and you pay fees + taxes

#

And companies with ML use cases usually have large cash flows

slim bone
#

That’s true, it also kind of occurred to me that doctors in the private sector charge way more than that for simple routine checks
A pretty bad comparison but the point still stands - knowledge is very valuable regardless of the effort

iron basalt
# slim bone That’s true, it also kind of occurred to me that doctors in the private sector c...

Not really just a knowledge issue in that case, but that is a whole off-topic discussion to be had. The point that just knowing anything is important is still correct. Also, if someone has the skill set to do something far more complicated and high-paying with their time, they are paying an opportunity cost, and so you need to pay them more to make it worth their time. There is an opportunity to fill that gap where you can be paid much less, and know much less, and people do fill that role (by just barely dipping into ML and only having surface-level knowledge / knowing how to use stuff like these Amazon tools).

slim bone
#

Regarding their competence - I can't advocate. But they do have a portfolio that seems legit

#

This is the main reason this pricing is strange to me. Because I've seen people with much more knowledge (as far as you can quantify "knowledge"), working much harder, being paid much less

twilit tundra
#

5 bucks to do what? A full website? A module? Either way that sounds very low

iron basalt
slim bone
twilit tundra
#

You're barely paying for your electricity and internet at that rate

slim bone
slim bone
iron basalt
slim bone
#

I suppose. But I'm looking for a stable job myself

#

I'll concede and say that learning ML is far more daunting and confusing than learning Javascript though lol

iron basalt
#

And having a general skill set protects you from this. For example, if ML suddenly dies down (doubt), then if you learned the math for it (which is very general purpose), you will have an easier time finding something else, since you can spend less time preparing for that.

twilit tundra
#

ML is basically trendy statistics

slim bone
#

I mean that's a little offensive isn't it? haha

iron basalt
#

The name will probably change at some point, but it will be around.

slim bone
#

This conversation kind of made me think - do y'all know how researched non-NN based ML is? (Hope I didn't butcher the terminology)

#

I thought ML == Neural Networks not too long ago and I was happy to find out that I'm completely wrong

iron basalt
#

(Compute graphs)

slim bone
#

Not sure honestly. Decision tree learning or whatever it's called comes to mind

twilit tundra
#

Boosting-based models are still very used in everything tabular if that's part of non-NN ML

slim bone
twilit tundra
#

It's a lot slower than deep learning but there is still research afaik

slim bone
#

Like, if I want to pursue a masters in ML - is there a reasonable chance that my professor would want to make a thesis about something that isn't NN?

slim bone
slim bone
#

The syllabus is still rather gibberish-y atm

twilit tundra
slim bone
twilit tundra
#

NLP, Computer Vision, interdisciplinary courses, etc.

slim bone
twilit tundra
twilit tundra
slim bone
#

I suppose I only know about DL so my mental image of the field is rather tiny.

twilit tundra
#

You need knowledge for NLP that is different from other fields but the more recent models are considered DL and overall it's part of ML imo

#

Like ML is a broad term and then NLP is ML applied to language

slim bone
#

Oh. Curious

#

And the same goes for those specializations you mentioned I assume?

twilit tundra
#

Yes

slim bone
#

That's honestly nice to hear
I've been hoping to deviate from NN towards the end of my summer break

#

But learning the theory behind NN took me weeks upon weeks

twilit tundra
#

Everything except tabular data uses NN unfortunately

slim bone
#

Unfortunately?

twilit tundra
#

If you want to do research on other fields

slim bone
#

Oh, so you have no choice?

twilit tundra
#

If you want to do research in CV, NLP or speech, you probably need to work on neural networks/DL to produce "publishable" results

slim bone
#

That doesn't sound too bad

#

whole learning process has been fascinating so far

#

Then again, didn't read a thing about statistics

iron basalt
#

ML was born from memoization of Checkers board states. Rather than doing the whole tree search to compute the value, only do it once and remember it for next time (learning). This then mixes with Monte-Carlo methods / probability / statistics. Rather than do everything, do some, and then guess the rest (induction / abduction). This also opened up the scope to problems where you can't try everything and are forced to guess. ML on its own is a mathematical topic (theory of computation, probability, statistics, decision theory, (multivariate) calculus, linear algebra (Von Neumann machines love linear algebra), etc).
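
The "search once, remember the value" idea can be illustrated on a much smaller game than Checkers. The sketch below uses a made-up subtraction game (players alternately take 1 to 3 stones, whoever takes the last stone wins) as a stand-in, with `functools.lru_cache` playing the role of the memory; it is an illustration of memoized game-tree search, not a reconstruction of any historical program.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def player_to_move_wins(stones):
    """True if the player to move can force a win with `stones` left."""
    if stones == 0:
        return False  # no move available: the previous player took the last stone
    # A position is winning if some move leaves the opponent in a losing position
    return any(not player_to_move_wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

print(player_to_move_wins(21))
# Each position is evaluated once; later lookups come from the cache
print(player_to_move_wins.cache_info())
```

In this game the losing positions are exactly the multiples of 4, which gives a quick check that the cached search is correct.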

#

There are several NN / biologically inspired methods in ML, and they can be very different in feel. The most popular, DL, is very different from other NN based methods. Most of these NN based methods only draw loose inspiration from biology. To be effective they must be very different from their biological counterparts, because they need to run well on existing computer hardware, and that hardware is not well suited at all to directly simulating such biological systems.

#

Typically they seek to replicate some mathematical insight given by the biology (which evolution randomly found for us, so just copy it).

slim bone
# iron basalt ML was born from memoization of Checkers board states. Rather than doing the who...

The book I'm reading gave a brief overview about how machine learning came to be which was rather interesting to see.

I don't know the slightest bit of statistics yet, so there's a bit of a void in my heart in that aspect.

I am kind of curious to know just how relevant calculus and linear algebra are to modern-day ML research? As in, how much of it do you actually need to know in order to find something new? It feels like the modern libraries have abstracted and optimized every single tensor operation to its maximum for example

iron basalt
slim bone
slim bone
#

I can only imagine how crucial it is to be able to multiply matrices properly and efficiently, but that's something that's already been implemented for you

iron basalt
slim bone
#

So the definition is pretty loose it seems

#

Unless I'm missing something

iron basalt
slim bone
#

Right, it's the "many layers" part that got me wondering

#

"Many" is probably entirely subjective

twilit tundra
#

According to my supervisor, it is more than 2

slim bone
#

Yeah that's kind of what I heard

slim bone
naive crown
#

Guys, I take code for training a model that works, and then I just change one dimension (updating the model parameters accordingly) and then the model loss stays constant forever. Can someone please help or give me tips

mild dirge
slim bone
mild dirge
#

Yeah, maybe just echo chamber of confusion

slim bone
#

I think so

iron basalt
twilit tundra
#

My own definition is that deep learning is when the model is able to learn features without you having to design them

slim bone
#

From my experience as a beginner, beginners don't often think about the input and output layers as a layer

iron basalt
#

Ultimately, it's a buzzterm.

slim bone
#

Is Machine Learning a buzzterm as well?

slim bone