#data-science-and-ml


wooden sail
#

i used to do that too but ended up wsling due to laziness to reboot

tidal bough
#

now rewriting it in rust

#

oof, it's about as fast as the python solution. which, like, makes sense because it uses python lists, I suppose

night prawn
#

I've followed this tutorial but it doesn't work. https://medium.com/@gpj/making-your-ai-sound-like-you-a-guide-to-creating-custom-text-to-speech-8b595d5cf259 it gives me this error message: ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-0f6e1f002713> in <cell line: 9>()
7 import IPython
8
----> 9 from tortoise.api import TextToSpeech
10 from tortoise.utils.audio import load_audio, load_voice, load_voices
11

2 frames
/content/tortoise-tts/tortoise/models/transformer.py in <module>
4 import torch.nn.functional as F
5 from einops import rearrange
----> 6 from rotary_embedding_torch import RotaryEmbedding, broadcat
7 from torch import nn
8

ModuleNotFoundError: No module named 'rotary_embedding_torch'


static granite
sleek harbor
wooden sail
#

ubuntu. i don't use tensorflow, but jax works well with gpu for me

errant spear
#

Would anyone be able to explain to me exactly how backpropagation through time works in an LSTM? I’ve been trying to understand, but can’t seem to grasp how it actually works.

#

Preferably with an example like stock market data, using RMSE.

left tartan
# tidal bough oof, it's about as fast as the python solution. which, like, makes sense because...

maybe:
```py
@nb.njit(parallel=True)
def separate_elems_numba1(cloud: np.ndarray, k: int, indices: np.ndarray):
    gc = [np.sum(indices == i) for i in range(k)]

    lists = [np.empty((gc[j], cloud.shape[1])) for j in range(k)]

    # Track the next index to insert into the lists[i] array, to avoid append
    last_idxs = np.zeros(k, dtype=np.int32)

    for i in nb.prange(len(indices)):
        idx = indices[i]
        last_idx = last_idxs[idx]
        lists[idx][last_idx] = cloud[i]
        last_idxs[idx] += 1

    return lists
```
tidal bough
#

hmm, does this work correctly? these parallel operations on the same variable concern me.

left tartan
#

Correctly wasn’t part of the requirements 😉

#

Yah, there’s (most likely) a race for the last_index where two iterations get the same last_idx.

left tartan
wooden sail
#

if correctness isn't needed, then we could just [] lol

tidal bough
#

gc = [np.sum(indices == i) for i in range(k)]
can be a np.unique btw, with handling for the case where there are no occurrences of one of the groups
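
A minimal sketch of that suggestion, assuming `indices` holds integer group labels in `range(k)`. The sample values are made up for illustration, and `np.bincount` is shown as an alternative that was not mentioned in the chat.

```python
import numpy as np

k = 4
indices = np.array([0, 2, 2, 3, 0, 2])  # illustrative labels; group 1 never occurs

# Original approach: one full pass over `indices` per group.
gc_loop = [int(np.sum(indices == i)) for i in range(k)]

# np.unique only reports labels that actually occur, so scatter the
# counts into a zero-filled array to cover the missing groups.
labels, counts = np.unique(indices, return_counts=True)
gc_unique = np.zeros(k, dtype=np.int64)
gc_unique[labels] = counts

# Alternative: bincount pads with zeros up to `minlength`.
gc_bincount = np.bincount(indices, minlength=k)

print(gc_loop, gc_unique.tolist(), gc_bincount.tolist())
```

Both variants make a single pass over the data instead of `k` passes.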

uneven bronze
#

can anyone tell me the 2 types of ai and what is the difference

dreamy isle
#

i'm gonna write this as a C extension tomorrow

past meteor
#

The lists could be arrays

past meteor
grim musk
#

nice

#

is it possible to make a NOR gate by combining two perceptrons?

#

because I tried this

def modified_perceptron(input1, input2, input3, expected_output):
    output_perceptron = input1 * weights[0] + bias * weights[3] + (input2 * weights[1] + input3 * weights[2] + bias * weights[3]) * weights[4] + bias * weights[5]
    if output_perceptron > 0:
        output_perceptron = 1
    else:
        output_perceptron = 0
    error = expected_output - output_perceptron
    weights[0] += error * input1 * learning_rate
    weights[1] += error * input2 * learning_rate
    weights[2] += error * input3 * learning_rate
    weights[3] += error * bias * learning_rate
    weights[4] += error * (input2 * weights[1] + input3 * weights[2] + bias * weights[3]) * learning_rate
    weights[5] += error * bias * learning_rate
    return input1, input2, input3, output_perceptron, weights
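
For the two-input case, NOR is linearly separable, so a single perceptron is enough and no second unit is needed. The sketch below is an illustration only, not a fix of the function above: the names `NOR_TABLE`, `train_nor` and `predict` are hypothetical, and the step activation and update rule follow the standard perceptron learning rule.

```python
# Truth table for a two-input NOR gate: output is 1 only when both inputs are 0.
NOR_TABLE = [((0, 0), 1), ((0, 1), 0), ((1, 0), 0), ((1, 1), 0)]

def predict(params, x1, x2):
    """Step activation on a weighted sum of the inputs plus a bias."""
    w1, w2, bias = params
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def train_nor(epochs=20, learning_rate=1):
    """Train one perceptron with the classic perceptron update rule."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), target in NOR_TABLE:
            error = target - predict((w1, w2, bias), x1, x2)
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias

params = train_nor()
outputs = [predict(params, x1, x2) for (x1, x2), _ in NOR_TABLE]
print(params, outputs)
```

Because the weights start at zero and the learning rate is an integer, the run is deterministic and the learned weights stay integers.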
sleek harbor
past meteor
#

Work: Windows + Ubuntu SSH or WSL (Ubuntu)

Home: daily driver laptop Fedora. Desktop Windows.

#

I learnt a lot of Linux through WSL on my laptop and work because all of my development is on remote so when I switched it was really painless / easy.

The main advantage for me was that I always get tripped up when using CMD/PowerShell, some things are easier to install on Linux and some things don't even run (e.g., gunicorn)

sleek harbor
#

Yeah, I was mostly interested about WSL. As actual Linux distros I'm interested in OpenSUSE (vs GeckoLinux, Fedora), Rocky Linux, and recently have been seeing a lot about NixOS. But for now I'm just interested in WSL working with as little bugs as possible (and I've encountered a few problems with Debian, which I tried before Ubuntu - don't know if it was due to my inexperience or actual bugs)

sleek harbor
coral field
#

can OpenFace (github) classify facial emotions, or can it only detect where a face is on an image/video?

uneven bronze
#

What is the difference between ai software developer and machine learning engineer

serene scaffold
uneven bronze
serene scaffold
uneven bronze
#

Wait really so there is nothing as such

#

I got it from chat GPT

serene scaffold
uneven bronze
serene scaffold
uneven bronze
serene scaffold
wooden sail
#

smh stelercus, this is what my tax money is going to

uneven bronze
wooden sail
#

(i'm joking)

serene scaffold
#

My actual job is taxpayer funded

#

@uneven bronze what is your reason for asking? To decide on a career path?

uneven bronze
uneven bronze
serene scaffold
#

Anyway, if you're interested in AI, be sure to take as many AI related courses as an undergraduate as you can.

#

Look for internships that are related to AI. And be prepared to probably do grad school

uneven bronze
uneven bronze
uneven bronze
serene scaffold
uneven bronze
serene scaffold
#

Because if you want to look at salaries on Glassdoor, you should search a few different job titles.

left tartan
uneven bronze
past meteor
#

I'm not a guru in this stuff either. I just need something that works and both do for me 🤣

#

I even think that the windows + WSL set-up is perfect for data science. I wanted to try something else for fun. After switching idk if it's demonstrably better or not than just doing that. You're fine either way 🤷

unique viper
#

hey anyone know if there's a vectorized way to broadcast but for every combination on an axis with numpy:

a = np.array([[1,2,3,4], [5,6,7,8]])
b = np.array([[11,12,13,14],[15,16,17,18], [21,22,23,24]])

# broadcast but for every combination of rows
b - a
# Desired result:
[[10,10,10,10],[6,6,6,6],[14,14,14,14],[10,10,10,10],[20,20,20,20],[16,16,16,16]]
tidal bough
arctic wedgeBOT
#

@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[[10 10 10 10]
002 |   [14 14 14 14]
003 |   [20 20 20 20]]
004 | 
005 |  [[ 6  6  6  6]
006 |   [10 10 10 10]
007 |   [16 16 16 16]]]
unique viper
#

ah perfect thanks!
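
The evaluated snippet itself is not preserved in the log. A sketch that produces the flat list from the question, assuming the intent is every `b` row minus every `a` row, could look like this:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
b = np.array([[11, 12, 13, 14], [15, 16, 17, 18], [21, 22, 23, 24]])

# Insert length-1 axes so the rows of b and a broadcast against each other:
# (3, 1, 4) - (1, 2, 4) -> (3, 2, 4), i.e. pairwise[i, j] == b[i] - a[j].
pairwise = b[:, None, :] - a[None, :, :]

# Flatten the two row axes to get one row per (b row, a row) combination.
flat = pairwise.reshape(-1, a.shape[1])
print(flat)
```

Swapping which array gets the leading `None` axis gives the grouping shown in the bot output instead.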

left tartan
# tidal bough hmm, does this work correctly? these parallel operations on the same variable co...

Now with 50% fewer race conditions (edit: updated to initialize the typed Dict, edit2: slightly different dict construction using comprehension):
```py
import numpy as np
import numba as nb

k = 16
d = 3
N = 10**7
cloud = np.random.random((N, d))
indices = np.random.randint(0, k, N)

def separate_elems_numpy(cloud: np.ndarray, k: int, indices: np.ndarray):
    return [cloud[indices==i] for i in range(k)]

@nb.njit(parallel=True)
def separate_elems_numba(cloud: np.ndarray, k: int, indices: np.ndarray):
    e = np.empty((0, cloud.shape[1]), dtype=np.float64)
    result = {
        i: e
        for i in range(k)
    }
    for i in nb.prange(k):
        mask = indices == i
        group = cloud[mask]
        result[i] = group

    return result

%timeit separate_elems_numba(cloud, k, indices)
%timeit separate_elems_numpy(cloud, k, indices)
vnumba = separate_elems_numba(cloud, k, indices)

vnumpy = separate_elems_numpy(cloud, k, indices)

for i in range(k):
    print(np.all(vnumba[i] == vnumpy[i]))
```

glacial spoke
#

Does anyone know of a bookclub community for python data science similar to the R for Data Science https://www.rfordatasci.com/ community? Basically these are small weekly reading groups who meet weekly on zoom to work through a particular data science / R programming book. They use slack to coordinate, with several bookclubs going concurrently. I have searched but have not been able to locate a python focused community like it! (If you are interested in this, here is a link to the slack: http://r4ds.io/join)

umbral charm
#

is there any good Pandas course / youtube video

#

For data stuff

serene scaffold
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold
umbral charm
desert oar
quartz ivy
quick fable
#

Aws

#

Are there any private ML notebooks with Google Colab-like functionality that don't get access to your data?

#

Make website through react , deploy it on cloud in aws server. You will have to pay for cloud though

white flint
#

does anyone know anything about CNN's?

potent sky
#

Seems exciting

lapis sequoia
#

I have question why python AI really long

potent sky
main sonnet
#

I'm working on a problem statement in which I need to predict the amount of bets one has made using poker chips.

#

An idea is to first treat it as an object detection problem statement where I localize and classify the color of the chip stack (chips may be vertically stacked).

#

Each detected region of interest will contain [1-N] number of chips of the same color.

#

Once that is done, I can create two ML models in post-processing. The first model will be a regression model to predict the number of chips in the bounding box using the (width, height, mid_x, mid_y) coordinates. The second model will be a classification model that determines the player who has thrown the chips into the pot using Polar coordinate from a fixed point.

#

I need suggestions on the camera position and camera's quality.

#

PS: Sorry for the lengthy description of the approach. I would also like suggestions on the approach as well.
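
The polar-coordinate feature for the second model can be sketched as below. This only shows the coordinate conversion, not a trained classifier. The fixed point (`origin_x`, `origin_y`), the helper names and the equal-sector seat assignment are assumptions made for illustration.

```python
import math

def to_polar(mid_x, mid_y, origin_x, origin_y):
    """Convert a box centre to (radius, angle in degrees) around a fixed point."""
    dx = mid_x - origin_x
    dy = mid_y - origin_y
    radius = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return radius, angle

def seat_from_angle(angle, n_players):
    """Hypothetical baseline: split the table into equal angular sectors."""
    return int(angle // (360.0 / n_players))

radius, angle = to_polar(mid_x=120.0, mid_y=220.0, origin_x=100.0, origin_y=200.0)
print(radius, angle, seat_from_angle(angle, n_players=4))
```

A learned classifier could take `radius` and `angle` as inputs in place of the fixed sectors.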

past meteor
#

Disk space bottle neck

potent sky
#

but well if it has to support all of them, they will need to be dependencies right
and rn the dependency list is rather short
just tf, torch and jax basically

past meteor
#

Sure but not having to install Tensorflow is in the works. In the future it'll probably be something like pip install keras[Torch] which won't require 2GB of tf

potent sky
#

ah fantastic

past meteor
#

Hadn't heard of it 😮

night prawn
#

i want to train my own tts with this code https://paste.pythondiscord.com/YHWQ (i want to use css10 in French) but it returns me this error message : Traceback (most recent call last):
File "/home/cecilien/python/TTS je pense.py", line 58, in <module>
train_samples, eval_samples = load_tts_samples(
File "/home/cecilien/miniconda3/envs/tf/lib/python3.9/site-packages/TTS/tts/datasets/__init__.py", line 123, in load_tts_samples
meta_data_train = add_extra_keys(meta_data_train, language, dataset_name)
File "/home/cecilien/miniconda3/envs/tf/lib/python3.9/site-packages/TTS/tts/datasets/__init__.py", line 64, in add_extra_keys
relfilepath = os.path.splitext(os.path.relpath(item["audio_file"], item["root_path"]))[0]
KeyError: 'root_path'

wooden forge
#

Visualizing correlation between hyper-parameters and metrics for Neural Network

I am working with a neural network and I want to investigate how different settings affect the loss and standard deviation of the network. I can change various parameters such as the loss function, learning rate, epochs, batch size, threshold value for loss calculation, number of hidden layers, and specific parameters like beta for SmoothL1Loss or num_harmonic for HarmonicFunctionLoss. I have a CSV file for each loss function I use, where the columns correspond to the settings mentioned above. I want to use matplotlib to represent how these settings influence the standard deviation and loss. However, if I fix one parameter, there is a chance that others may change between different trainings. Therefore, I am looking for a good representation to see if there is any correlation between the settings I input and the metrics I use to define the precision of my network. What is the best way to visualize this relationship?

For a little bit of context, I am working on a network using pytorch to recognize line orientation in images. And I am trying to find the optimal parameters to improve the performance of the network. The standard deviation helps me get a much more precise idea on the performance.

wooden forge
#

I thought about doing something like this

past meteor
# wooden forge ## Visualizing correlation between hyper-parameters and metrics for Neural Netwo...

Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets.
To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of...

wooden forge
#

Sounds interesting !

past meteor
#

I think you'd need something like this because each of your parameters are correlated in a potentially non-linear way (so standard correlations make no sense here probably).

The analytics folks at work like this type of chart. Personally I don't like it whatsoever because it takes too long to figure out what it's conveying so YMMV

#

parallel coordinate plots are what are used in MLflow, Optuna and co. for this type of thing.
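
A rough sketch of such a plot with matplotlib, using `pandas.plotting.parallel_coordinates` as a stand-in for the MLflow/Optuna views. The `runs` table, its column names and the three loss bands are hypothetical. The real CSV columns would replace them, and categorical settings such as the loss function would need numeric encoding first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical training runs: one row per run, one column per setting or metric.
runs = pd.DataFrame({
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-3, 1e-4, 1e-2],
    "batch_size": [16, 32, 64, 128, 64, 16],
    "hidden_layers": [1, 2, 3, 2, 4, 1],
    "loss": [0.42, 0.21, 0.35, 0.18, 0.30, 0.50],
})
numeric_cols = ["learning_rate", "batch_size", "hidden_layers", "loss"]

# Min-max scale every axis to [0, 1] so the columns share one vertical scale.
col_min = runs[numeric_cols].min()
col_max = runs[numeric_cols].max()
scaled = (runs[numeric_cols] - col_min) / (col_max - col_min)

# The pandas helper colours lines by a class column, so bin the loss into bands.
scaled["loss_band"] = pd.qcut(runs["loss"], q=3, labels=["low", "mid", "high"]).astype(str)

fig, ax = plt.subplots(figsize=(8, 4))
parallel_coordinates(scaled, class_column="loss_band", cols=numeric_cols,
                     ax=ax, colormap="viridis")
ax.set_ylabel("min-max scaled value")
fig.tight_layout()
```

Calling `fig.savefig(...)` afterwards writes the figure to disk. Each polyline is one run, so settings that repeatedly pass through the low-loss band stand out.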

wooden forge
#

Nice

#

Good to know thanks!

potent sky
#

cold mail ig

#

intriguing stuff though

shadow viper
past meteor
#

I rarely wanted to transpile TF to Torch or vice versa

#

If anything, transpiling MXnet to TF/Torch would be valuable lol. Maybe the deployment stuff is killer? Idk for now I'm happy with what I have but that could also just be classical programmer stockholm syndrome

lapis sequoia
#

Running into a weird issue with NumPy. I'm running a small little simulation, and at one point in computing the energy I compute this non-linear term using a little lambda function:
```python
N = (
    lambda u, v: (u + v)
    * (1 - (np.linalg.norm(u, axis=0) ** 2 + np.linalg.norm(v, axis=0) ** 2) / 2)
    / (2 * self.ε**2)
)
```
but when I actually run my simulation I get a warning:
```
simulation.py:266: RuntimeWarning: overflow encountered in multiply
  lambda u, v: (u + v)
```

It's not really clear to me how or why this would be causing an overflow, since I'm passing two arrays of shape `(3, 275, 275)`, every element of which has magnitude less than 10. It's also an array of floats, so the whole "overflow" thing *really* doesn't make sense to me, since floats don't tend to... well... do that.
boreal gale
potent sky
#

I'll put MXNet in the feedback if I do reply xd

lapis sequoia
#

The lambda function which causes the overflow is on line 238.

past meteor
potent sky
boreal gale
# lapis sequoia Here's an attempt to trim it down as much as I can: <https://paste.rs/rSPCt.py3>...
  • you seem to have pasted twice in your example
  • don't use bare exceptions (your raise Exception and try: ... except: ... are often frowned upon (in fact have made my life much harder since i didn't realise you used it)
  • assuming you fixed the Exception usage, you can use np.seterr(all='raise') to make it raise an exception upon hitting that overflow
  • assuming you made it raise error instead, you can use pdb to inspect exactly what is overflowing.

i don't know what you are trying to do so i can't comment more.
all i can tell you is here are the combination that caused your overflow https://paste.pythondiscord.com/U6WQ

let me know if you have a specific question.
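
A small sketch of the "make it raise" step. The function below mirrors the shape of the lambda from the question, but `eps` and the input values are hypothetical stand-ins. It uses `np.errstate`, the scoped counterpart of `np.seterr`, so the setting applies only inside the `with` block.

```python
import numpy as np

def nonlinear_term(u, v, eps=0.1):
    # Same shape of expression as the lambda in the question; `eps` stands in for self.ε.
    norm_sq = np.linalg.norm(u, axis=0) ** 2 + np.linalg.norm(v, axis=0) ** 2
    return (u + v) * (1 - norm_sq / 2) / (2 * eps**2)

# Hypothetical runaway iterate: float64 tops out near 1.8e308, and the
# product below reaches roughly 1e381, so the multiply overflows.
u = np.full((3, 4, 4), 1e127)

caught = None
with np.errstate(over="raise"):  # turn the overflow warning into an exception
    try:
        nonlinear_term(u, u)
    except FloatingPointError as exc:
        caught = str(exc)

print(caught)
print(np.abs(u).max())  # a quick look at how large the inputs are
```

Floats do overflow, just to `inf` rather than wrapping around, which is why the default behaviour is only a warning.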

rare quest
#

Hi, I'm getting an error about Series.append, I've read a StackOverflow post
https://stackoverflow.com/questions/76102473/how-to-fix-attributeerror-series-object-has-no-attribute-append
that said it was deprecated, so I used concat. Concat throws the same error AttributeError: 'Series' object has no attribute 'concat'
I asked here, but the help section also suggested I ask in chat:
https://discord.com/channels/267624335836053506/1130588399758233690

```py
series = pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5})
#Adding
series['f'] = 6
new_series = series.concat(pd.Series({'g':7,'h':8,'i':9})) #AttributeError: 'Series' object has no attribute 'concat'
print(series)
print(new_series)
```
boreal gale
#

you probably meant pd.concat instead of series.concat

rare quest
#

i want new_series to be series plus the keys g-i

#

if that makes sense

boreal gale
#

!e

import pandas as pd
series = pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5})
#Adding
series['f'] = 6
new_series = pd.concat([series, pd.Series({'g':7,'h':8,'i':9})])
print(series)
print(new_series)
arctic wedgeBOT
#

@boreal gale :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | a    1
002 | b    2
003 | c    3
004 | d    4
005 | e    5
006 | f    6
007 | dtype: int64
008 | a    1
009 | b    2
010 | c    3
011 | d    4
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/47PRVYNTCI4K77HUCULV6XPUTU

rare quest
#

awesome

lapis sequoia
rare quest
#

I was using concat wrong, like you suggested

boreal gale
lapis sequoia
# boreal gale - you seem to have pasted twice in your example - don't use bare exceptions (you...

okay, so fixing all of that gets me this:
```
Traceback (most recent call last):
  File "simulation.py", line 338, in <module>
    simulation.run()
  File "simulation.py", line 52, in run
    new_m = self.fixed_point_method.fixed_point(self.m, self.Δt)
  File "simulation.py", line 301, in fixed_point
    np.fft.fft2(Δt * N(previous_candidate, m) + Δt * self.Z),
  File "simulation.py", line 271, in <lambda>
    lambda u, v: (u + v)
FloatingPointError: overflow encountered in multiply
```

#

are you then recommending that I use pdb?

boreal gale
lapis sequoia
boreal gale
#

i just usually use the pdb magic in an ipython session

lapis sequoia
#

Ah, it looks like I want a post-mortem.

brave sand
#
Requirement already satisfied: kaggle>=1.3.9 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (1.5.15)
Requirement already satisfied: oauth2client in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.1.3)
Requirement already satisfied: opencv-python-headless in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.8.0.74)
Requirement already satisfied: psutil>=5.4.3 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (5.9.5)
Requirement already satisfied: py-cpuinfo>=3.3.0 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (9.0.0)
Collecting pyyaml<6.0,>=5.1 (from tf-models-official>=2.5.1->object-detection==0.1)
  Using cached PyYAML-5.4.1.tar.gz (175 kB)
  Installing build dependencies ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
boreal gale
#

or that!

brave sand
#

any idea what this means?

#

this is for the object detection api

lapis sequoia
#

Ah... u is somehow exceptionally large. On the order of 10^127

#

that seems problematic

wooden forge
#

And my hand

shadow viper
#

can i volunteer to what youre doing?

wooden forge
wooden forge
shadow viper
wooden forge
#

🎷🦐 I actually do better graphic design stuff lmao

shadow viper
# wooden forge Wdym by that

i mean, can i join you in what youre working on?
im no expert but i want to improve my skills so maybe working with someone might help

wooden forge
shadow viper
#

data science, computer vision?

wooden forge
#

Worked on quantum dots calibration using machine learning algorithm. My job was to improve the detection of charge stability zone by reworking how the angles of the lines of the diagrams were calculated

#

Very mysterious and weird title I know haha

shadow viper
shadow viper
#

this is really nice

shadow viper
wooden forge
#

Sure

shadow viper
#

thanks bro, please accept

brave sand
#

how much should someone be paying for a trained model?

rugged rapids
brave sand
rugged rapids
brave sand
rugged rapids
brave sand
#

how much should I be paying someone for an object detection model for a drone target? i only have 150 images, i just can't do it myself

brave sand
rugged rapids
#

idk then

shrewd wraith
#

anyone here have experience with numerical analysis? i'm having an issue with some code i'm writing for base 10 rounding, and i don't really know if the problem is with the code or my understanding of the techniques behind it - if anyone has any recommended resources that'd be great

desert oar
#

i don't think we have too many users specifically with numerical analysis experience

sand jackal
#

anyone able to help with scikit-learn KDE?

dusk tide
#

Hi guys, I am practicing visualization with plotly and ran into a problem. If you zoom in, you will find that when I hover on the graph it shows the name of the country and **(trace 3)**. I want to remove this "trace N" label but cannot find the way. If anyone knows then please help. I am sending the code of this plot creation:
```py
rows = 8

fig = make_subplots(
    rows=rows, cols=1,
    subplot_titles=("AFC Asian Cup", "African Cup of Nations", "Confederations Cup", "Copa América",'FIFA World Cup',"King's Cup",'UEFA Nations League','UEFA Euro')
)
cups = ["AFC Asian Cup", "African Cup of Nations", "Confederations Cup", "Copa América",'FIFA World Cup',"King's Cup",'UEFA Nations League','UEFA Euro']

# plot the top 10 teams that won the most matches in a tournament
for idx,cup in enumerate(cups):
    fig.add_trace(go.Bar(
        x=winning_home_teams.loc[cup].sort_values('total_matches_won',ascending=False)[:10].index, y=winning_home_teams.loc[cup].sort_values('total_matches_won',ascending=False)[:10].total_matches_won),
        row=idx+1, col=1)

fig.update_layout(height=1100, width=1300,
    title_text="Top 10 Home Teams winning most matches in particular tournament",showlegend=False)
fig.show()
```

hollow night
#

Hi everyone!
"I have watched a few videos on YouTube, but all of them use Jupyter Notebook as the IDE. Is it necessary to learn NumPy in Jupyter? Do you know of any NumPy tutorial that uses VS Code or PyCharm?"

potent sky
hollow night
#

Thanks DUDE!

sand jackal
#

Anyone do any work with scikit-learn KDE?

karmic veldt
#

Hello everyone
Has anyone encountered a problem in pandas where the csv data is well structured and pandas can print the data fully, but sorting values doesn't work and throws errors:

in sort_values
    k = self._get_label_or_level_values(by, axis=axis)
_get_label_or_level_values
    raise KeyError(key)
KeyError: 'general_average'

'general_average' column contains floats.

tidal bough
#

if you can, then this is strange indeed.

karmic veldt
#

Can i send you the data and test ?

#

here's the code overview along with errors after running

boreal gale
karmic veldt
obsidian trench
#

How does the .apply() function work in the context of df.apply(lambda x: x**2)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

The lambda function lambda x: x**2 is applied to the first column, which is represented by the Series x.
The elements in the first column, let's say column "A", are squared. For example, if the column "A" has values [1, 2, 3], squaring each element results in [1, 4, 9].
The resulting Series represents the squared values of the first column.

The lambda function lambda x: x**2 is applied to the second column, which is represented by the Series x.
    The elements in the second column, let's say column "B", are squared. For example, if the column "B" has values [4, 5, 6], squaring each element results in [16, 25, 36].
    The resulting Series represents the squared values of the second column.

The resulting Series from each column operation is concatenated back together to form a new DataFrame.
    The resulting DataFrame has the same shape as the original DataFrame, where each element has been squared.

Is the intuition correct how the apply function works
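
One way to check that intuition is to record what `apply` passes to the function. The sketch below uses the same `df`. The `square` helper and the `seen` list are added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

seen = []  # names of the Series that apply() hands to the function

def square(column):
    # With the default axis=0, `column` is one whole column as a Series.
    seen.append(column.name)
    return column ** 2

result = df.apply(square)

# Building the same frame by hand, column by column, for comparison.
manual = pd.concat([df['A'] ** 2, df['B'] ** 2], axis=1)

print(seen)
print(result)
```

For an elementwise operation like this, `df ** 2` gives the same result without going through `apply`.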

unique flame
#

How do I only keep parts of a dataframe in Dash? So I have a data frame in which one column has either True or False. In Dash app, I want the user to be able to see only dataframe with true, false or just both. I tried with radio items and dropdown option in Dash, but going in the dark when using the callback.

shadow ridge
#

I have some bicycling GPS and power meter data, and I am researching the relationship between speed (m/s) and power (watts). There is a data point every 1 sec. I am smoothing the data with a rolling(30)
If I plot x=speed, y=power, the slope is mostly negative, increasing power==decreasing speed, unexpected!

  • I would like to view only the data when the slope is positive but not filter out the negative slope data. More like, make the negative slope points invisible. How can I do this?
  • When I try to plot the positive only slope, I still get mostly negative. I think my problem is mostly in the way I am plotting.

Here is what I have.
df[['distance', 'speed', 'power']].head(10)

2022-02-20 17:42:22+00:00,0.00,NaN,NaN
2022-02-20 17:42:24+00:00,3.89,5.440,155.0
2022-02-20 17:42:25+00:00,6.88,5.440,156.0
2022-02-20 17:42:26+00:00,10.39,5.813,397.0
2022-02-20 17:42:27+00:00,18.97,5.813,271.0
2022-02-20 17:42:28+00:00,28.99,5.934,271.0

df.rolling(30).mean().plot(x="speed", y="power")
plot 1

df['slope'] = (df['power'].rolling(30).mean() / df['speed'].rolling(30).mean()).diff()
df[df['slope'] > 0].rolling(30).mean().plot(x="speed", y="power")

plot 2

devout sail
devout sail
shadow ridge
#

I know these plots look like a mess. What's unexpected is that there is a lot of negative slope which implies an inverse relationship between power and speed. The data is sorted by timestamp. I should use a plot that maybe colors the points by time so that is more clear the direction of time along the line.
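
One possible way to hide points without dropping rows is to blank them out with `Series.where`, since NaN values leave gaps in a line plot. The sketch below assumes that "slope" means the change in smoothed power divided by the change in smoothed speed between consecutive samples, which is different from the ratio-then-`diff()` used above. The sample values and the window of 2 are made up.

```python
import numpy as np
import pandas as pd

# Made-up samples standing in for the ride data.
df = pd.DataFrame({
    "speed": [5.0, 5.5, 6.0, 5.8, 6.2, 6.5],
    "power": [150.0, 170.0, 200.0, 180.0, 175.0, 230.0],
})

smooth = df.rolling(2).mean()  # the messages use rolling(30)

# Assumed definition of slope: d(power) / d(speed) between consecutive points.
d_speed = smooth["speed"].diff().replace(0, np.nan)  # avoid dividing by zero
slope = smooth["power"].diff() / d_speed

# Keep every row, but blank out power wherever the slope is not positive.
visible = smooth.copy()
visible["power"] = smooth["power"].where(slope > 0)
print(visible)
```

Plotting `visible` with `plot(x="speed", y="power")` then leaves gaps at the masked rows, while the index still lines up with the original data.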

boreal gale
#

how did you get the power in watts?
could you provide a sample of your data?
is

df['slope'] = (df['power'].rolling(30).mean() / df['speed'].rolling(30).mean()).diff()
df[df['slope'] > 0].rolling(30).mean().plot(x="speed", y="power")

all you have done thus far?

rugged rapids
#

meta just open sourced llamav2

agile cobalt
#

👀

#
true scaffold
#

Hi all, i was trying to run falcon-7b-instruct llm on my local machine (RTX 3060 12GB VRAM, 16GB RAM), but the model itself is around ~13GB, so i'm getting the famous torch.cuda.CudaOutOfMemory error for a small amount of space, is there any way to run this model locally with my current config? I've heard of accelerate & bitsandbytes libraries, can they help me achieve the same?

If this is not possible, can you at least point me to some other open-source llms, with like 3-5B params?

Thanks.

weak lagoon
#

Hello. Few days ago I asked about using a ML algo for predicting text. The code executed with 90% accuracy. To set the context, a sample query looks like this:

$startdate='20230301';$starttime='06:40:13';$verb='retrieve';$version='20230105';$application='mars';$class='od';$type='an';$stream='oper';$expver='0001';$retdate='20230228';$age='1';$nbdates='1';$reqno='6';$fields='4';$database='fdb';$bytes='41218640';$written='6014840';$interpolated='4';$writetarget='0';$cpu='0';$elapsed='0';$status='ok';$stopdate='20230301';$stoptime='06:40:14';$user='e487dfc54c';$category='basic|boundary_conditions|esa|valid_forecast';$account='b892ca6621';$abc='b892ca6621';$environment='batch';$date='20230228';$time='0000|0600|1200|1800';$step='00';$domain='g';$fieldset='sst';$resol='auto';$grid='0.25|0.25';

However the predicted text looked like this:
43 228011 228012 228013 228014 243 244 245 229 230 231 232 213 212 8 9 228089 228090 228001 260121 260123 003020 228029 228251 228216 228217 228218 228219 228220 228221 260015 228050 151132' date '20201217' time '0000' step '126 127 128 129 130 131 132' anoffset '9' domain 'g' password '0860048d32' rdatabase 'fdb' startdate '20230301' starttime '00 00 20' verb 'retrieve' version '20230206' application 'mars' class 'rd' type 'an' stream 'oper' expver 'hy2w' retdate '20221222' age '69' nbdates '1' reqno '1' interpolated '0' cpu '0' elapsed '0' status 'fail' reason 'expected 288 got 0 request failed' stopdate '20230301' stoptime '00

While it is producing somewhat relevant output, most of the content is gibberish. With 90% accuracy, why is the ML algo producing this output?

Appreciate your help in dissecting this

small wedge
# true scaffold Hi all, i was trying to run `falcon-7b-instruct` llm on my local machine (RTX 30...

https://huggingface.co/TheBloke/LLaMA-7b-GPTQ have you looked into quantized models?

GPTQ is an algorithm that was dropped that allows quantizing general large language models (storing the weights at a much lower precision like 8, 4, 3, or 2 bits to conserve memory with a minimal loss in performance), so you could just google any model on this list with "GPTQ" and you're pretty likely to find something https://github.com/eugeneyan/open-llms

#

I saw one for falcon-7b-instruct even that was quantized to 4 bits but the hugging face repo said it was experimental and very slow
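
A back-of-the-envelope sketch of why lower precision helps. It counts weights only, assumes a nominal 7e9 parameters for a "7b" model, and ignores activations, the KV cache and the per-group scale metadata that quantized formats add. The helper name is hypothetical.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 7e9  # nominal size for a "7b" model (assumption)

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(n_params, bits):.2f} GiB")
```

At 16 bits this comes to about 13 GiB, in line with the ~13GB mentioned in the question and already over a 12 GB card before any activations are allocated.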

small wedge
flint flax
#

hey this is really petty but im curious, how do i make pyplot return images with a perfectly square resolution? i can set the aspect ratio to equal but there's a slight difference between the left and right side if i add axis labels and its driving me crazy

plucky relic
#

I am entry level into my Data Science position. So forgive me for having basic programming skills. I am in the process of data cleaning a CSV file. One of the Date columns is in DayHourMinTimeZoneMonthYear, I need put the Time first followed by DayMonthYear. I’ve tried writing plenty of scripts myself for something so basic and I cannot for the life of me, get this working. Does anyone out there have a sample script or a script I could use to accomplish this task? I don’t have a mentor to lean on over here and I cannot figure this out. Thank you to anyone who can help or mentor.

desert oar
sullen sage
#

use an ai data analyst

desert oar
sullen sage
#

code interpreter

desert oar
#

it's not a bad learning tool

plucky relic
#

I can't use gpt at work unfortunately. They blocked it

desert oar
sullen sage
#

plus privacy issues

plucky relic
#

I bet my column is a string and that’s why it won’t recognize the date time format

plucky relic
#

Crap I feel like a idiot

desert oar
#

@sullen sage it's a good habit to learn how to create representative minimal examples anyway. so you can pass those to chatgpt instead of your actual work data

desert oar
plucky relic
#

Thanks y’all, I just left work. I’ll play with it tomorrow. This discord has helped so much with the minor details. I definitely appreciate all the advice.
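
For the date column question, here is a hedged sketch of parsing such a string and re-emitting it with the time first. The exact layout was never posted, so the sample value `181430ZJUL2023`, the literal `Z` time zone letter, the output layout and the helper name `reorder_dtg` are all assumptions.

```python
from datetime import datetime

def reorder_dtg(value: str) -> str:
    """Parse DayHourMinZMonthYear text and return it as HourMinZ DayMonthYear."""
    # %d day, %H hour, %M minute, a literal "Z", %b month abbreviation, %Y year
    parsed = datetime.strptime(value, "%d%H%MZ%b%Y")
    return parsed.strftime("%H%MZ %d%b%Y").upper()

print(reorder_dtg("181430ZJUL2023"))
```

In pandas the same format string can be passed to `pd.to_datetime(column, format=...)`, which also turns the string column into a real datetime dtype.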

sick ember
#

Hello everyone, my model seems to experience difficulties in recognizing seizure frequency patterns, I trained it on 10 second eeg recording(2500 data points of raw eeg amplitude over 20 seconds )

#

Is there anything I can do to improve results

#

Like it seems to be able to recognize amplitude, but in terms of frequency over time (seizures have very high and rapid frequency with consistent amplitude over time) it seems to have trouble

#

Should I consider training it on power spectrum graphs rather than raw eeg recordings?
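
A minimal sketch of turning a raw window into a power spectrum with NumPy. The signal is synthetic, and the 250 Hz sampling rate is an assumption (2500 samples over 10 seconds), since the messages give two different durations.

```python
import numpy as np

fs = 250        # Hz, assumed sampling rate
duration = 10   # seconds, assumed window length
t = np.arange(fs * duration) / fs

# Synthetic stand-in for one EEG channel: a 10 Hz rhythm plus seeded noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + rng.normal(scale=0.5, size=t.size)

# One-sided power spectrum of the real-valued signal.
spectrum = np.fft.rfft(signal)
power = np.abs(spectrum) ** 2 / signal.size
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

peak_hz = freqs[np.argmax(power)]
print(peak_hz)
```

The `power` vector (or band-limited sums of it) could be fed to the model instead of the raw amplitudes. Computing it over short sliding windows would keep the "frequency over time" information.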

serene scaffold
#

I hate strings that are formatted as timestamps (as opposed to actual timestamp types)
all my hombies hate strings that are formatted as timestamps (as opposed to actual timestamp types)

past meteor
#

unix timestamp or cry

grand quarry
#

Hey, just a simple question: I'm wondering whether a neural network will learn the order from best to worst (will it extrapolate well?).
If I have the following player vs player data points:

player1 vs player2 -> player1 wins
player2 vs player3 -> player2 wins

So by logic player 1 is better than player 3, but will a simple neural network learn that correctly?

player1 vs player3 -> player1 wins

past meteor
#

I had 3 data sources at work that used different methods of encoding time => road to crying

delicate apex
serene scaffold
grand quarry
serene scaffold
#

if the problem is actually "(A, B) beats (B, C)", "(C, B, D) beats (A)", that is a different problem.

civic void
#

Anyone here have any experience with doing First Order Logic with Python? And: Can I use Python for all the same tasks as one uses Prolog?

serene scaffold
civic void
#

Yes

#

Or, well, its based on this

serene scaffold
civic void
#

Im taking a class called Knowledge Representation and Reasoning

grand quarry
#

But more letters & I have a lot of data to train the model

unique escarp
#

curious if anyone knows of a good data engineering server?

serene scaffold
civic void
#

Which is done in python

#

Just wondering if using Python for this is better/as good as using Prolog

serene scaffold
civic void
#

Prolog is a logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily as a declarative programming language: the program logic is expressed in terms of relations, represented ...

serene scaffold
#

I won't be able to tell you if Python is better suited for this than Prolog if I haven't used Prolog.

#

is your instructor allowing you to pick?

#

"Prolog is a logic programming language associated with ... computational linguistics"
I am a computational linguist, and I've never heard of it, so learning Python would be a better investment of your time.

civic void
#

Idk, course hasn’t started (starts mid august). Just trying to get ahead. Read 3 chapters; book mentions Prolog which makes me guess we’re gonna use it. But, I love python and would rather learn more Python than a new, niche language

serene scaffold
serene scaffold
serene scaffold
serene scaffold
grand quarry
#

But i got it now, you helped me already, thank you

desert oar
#

however there might be a python library that implements a prolog-like logic programming system

#

you're almost certainly better off using prolog for logic programming compared to python

#

if you really want to use python, i'm not sure that such an engine exists as a python library specifically. however there might be C or C++ libraries that implement first-order logic which you can call from python somewhat easily. or there might be an embeddable prolog implementation that you can use inside python.

#

that should be usable from ctypes, cffi, or cython

civic void
desert oar
#

i'm sure that a set of high-level python bindings to swipl would be an interesting open-source contribution. but that would also be a lot more work than you would want to take on for a school assignment, unless it's a master's thesis

desert oar
# civic void Never heard of any of these things…
  • ctypes is a built-in way for python to access functions in C libraries as well as work with C data types
  • cffi is a 3rd-party library that does the same, but in a nicer interface
  • cython is a superset of python that compiles to C code, and the resulting compiled library can be imported as a python module
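
To make the ctypes bullet concrete, here is a small sketch that calls the C library's `strlen` from Python. It assumes a Linux or macOS system, where `ctypes.CDLL(None)` exposes the symbols already loaded into the process; on Windows you would need to name a DLL explicitly. A Prolog engine's C API would be wrapped the same way, just with many more declarations.

```python
import ctypes

# On Linux/macOS, CDLL(None) gives access to symbols already loaded into the
# process, which includes the C standard library. (Assumption: POSIX system.)
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"prolog")
print(length)
```

Declaring `argtypes` and `restype` is the tedious part: ctypes cannot read C headers, so every function you want to call has to be described by hand.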
#

SWIG would be another way to call C or C++ libraries from python. it generates C and python code that allows you to call the library

civic void
#

Very interesting; will definitely look into this

#

Thanks for a long, detailed reply

desert oar
#

but anyway this is quite a big yak to shave if you just want to do your school assignments in python instead of swi or scryer or whatever

civic void
#

Okok, gotcha (i think…)

#

So, one conclusion to draw from this:
Prolog is not old and useless; it’s actually a good language to learn?

desert oar
# civic void So, one conclusion to draw from this: Prolog is not old and useless; it’s actual...

it's not something you're likely to use at a typical programming job. but if you're interested in the craft of computer programming and/or logical reasoning, it will expand your horizons and provide a lot of interesting opportunities for hobby work, and maybe a couple of niche job opportunities.

if you want to learn it, check out https://www.metalevel.at/ and specifically their book https://www.metalevel.at/prolog + its associated youtube channel https://www.youtube.com/@ThePowerOfProlog

#

as for ai and nlp specifically, prolog has indeed turned out to be kind of a dead end for the time being

#

one of the big complaints about prolog in practice is that you end up having to worry about the implementation details of how the solution search algorithm works

#

some people don't mind that. but it's not a magic tool.

#

if you're interested even more in logic programming, check out minikanren, which is a different approach to logic programming. it's mostly a research toy though, it's not something people actually use that i know of.

#

what might be interesting is if you can get an llm to write and interact with prolog code, the way you can get them to use web searches and things like that

#

so the model itself doesn't need to be great at logical reasoning, it can write and execute its own prolog scripts

civic void
#

Thanks again for loads of interesting info!!

true scaffold
#

Now i might go with 8bit quantisation, do you think it will work on my machine? (16gb ram, 12gb vram rtx 3060, falcon7b-instruct ~13gb)

small wedge
#

I assume the regular falcon-7b-instruct model is 16 bit fp so yeah 8 bit quantization should cut that in half.
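
The back-of-the-envelope arithmetic behind that, as a sketch. It assumes a round 7 billion parameters and counts the weights only; activations, the KV cache, and framework overhead come on top, so treat the numbers as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 7e9  # assumed round figure for a "7B" model

fp16 = weight_memory_gib(N_PARAMS, 16)
int8 = weight_memory_gib(N_PARAMS, 8)
print(f"16-bit: {fp16:.1f} GiB, 8-bit: {int8:.1f} GiB")
```

The 16-bit figure lines up with the ~13 GB download mentioned above, and the 8-bit figure leaves some headroom on a 12 GB card.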

past meteor
keen gust
robust jungle
#

I'm trying to find the average position of all positive points in a binary image using cv2, what is the most efficient way to do this?

hasty grail
#

!e

import numpy as np

img = np.random.randint(2, size=(8, 8))
print(img)

pts = np.argwhere(img)
print(pts)

avg = pts.mean(axis=0)
print(avg)
arctic wedge BOT
#

@hasty grail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[0 1 1 1 1 1 1 0]
002 |  [0 0 1 1 1 1 1 1]
003 |  [1 1 1 0 1 0 0 1]
004 |  [1 1 0 1 0 0 0 1]
005 |  [0 0 0 0 0 0 0 0]
006 |  [0 1 1 1 1 1 0 0]
007 |  [0 0 1 0 0 0 0 1]
008 |  [0 1 0 0 1 1 1 0]]
009 | [[0 1]
010 |  [0 2]
011 |  [0 3]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/JO2NVMSE2AW4VUKKYZQKKCRTSE
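
One thing to watch with the snippet above: `np.argwhere` returns (row, column) pairs, i.e. (y, x), while OpenCV functions generally expect (x, y). A small sketch that returns the centroid in (x, y) order; the helper name `centroid_xy` is made up for illustration.

```python
import numpy as np

def centroid_xy(binary_img: np.ndarray) -> tuple[float, float]:
    """Mean position of the nonzero pixels as (x, y), i.e. (column, row)."""
    rows, cols = np.nonzero(binary_img)
    if rows.size == 0:
        raise ValueError("image has no positive pixels")
    return float(cols.mean()), float(rows.mean())

img = np.zeros((4, 6), dtype=np.uint8)
img[1, 2] = 255
img[3, 4] = 255
print(centroid_xy(img))
```

Since the question mentioned cv2: image moments (`cv2.moments`, then `m10 / m00` and `m01 / m00`) should give the same point without building the index arrays.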

severe topaz
#

i was stalked, robbed, "assaulted" and harassed by several people... i tried telling the police but it seems their attention is inclined to arresting people who lack mommy and daddy's political connections. i was thinking of looking into worm gpt/poison gpt for ethical purposes.

#

where may i download worm gpt repository?

#

without paying a monthly subscription?

robust jungle
#

rules 5 and 8

stark mulch
severe topaz
#

what do you suggest doing then?

agile cobalt
#

police / government organisations / law enforcement etc

worthy laurel
#

possibly somewhat out of scope, are there any efforts to selectively navigate/search/scrape similar contents on different websites?

like for example i want to be able to search for lamps then scrape the title desc price for each result

tailoring such a scraper for one specific website is easy enough but i’d like something i can just feed a list of websites into and avoid manually finding the correct selectors for each site

mortal sequoia
#

Hello! I am doing my first challenge for classes and can't seem to get my csv file to open in vscode so that I can write a script showing my different values can somebody assist?

worldly dawn
desert oar
#

it's like object recognition in images but for chunks of html

worldly dawn
desert oar
worldly dawn
#

if we want to take it that far, they may want to look into entity extraction and stuff.
But tbh, it's faster to make sure metadata is extracted

hollow night
#

I have been learning data science in Python for some time now. Recently, I started exploring the Numpy library through YouTube tutorials. However, I have never used Jupyter Notebook before, and most of the tutorials available online are demonstrated in Jupyter Notebook.

I find the method of writing and printing code in Jupyter Notebook different from what I am used to. As a result, I am having some trouble understanding it.

Is there any place or resource where I can learn Numpy specifically using the PyCharm IDE? I believe learning it in my familiar IDE will make the process smoother and help me grasp the concepts better.

desert oar
#

the code is the same of course. you might also be interested in Jupytext which creates "cells" in plain .py files by delimiting them with specially formatted comments, but can still be rendered to HTML https://jupytext.readthedocs.io/en/latest/

#

heavily inspired by a system in the R language called R Markdown and a long lineage of similar systems dating back many years called Weave
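
For reference, this is roughly what a Jupytext "percent format" script looks like: an ordinary `.py` file where `# %%` comments mark the cell boundaries. The contents here are just an illustrative NumPy snippet, not from any particular tutorial.

```python
# %% [markdown]
# # NumPy basics
# Each `# %%` line starts a new cell; this one is a markdown cell.

# %%
import numpy as np

a = np.arange(6).reshape(2, 3)

# %%
col_means = a.mean(axis=0)
print(col_means)
```

Because it is plain Python, the file runs top to bottom in any IDE or with `python file.py`, and editors that understand `# %%` markers can run it cell by cell.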

serene scaffold
hollow night
#

Thanks

hot hazel
#

I fear that AI will replace us in the next 20 years. Not now, right now it's not that good, but in 20 years it's definitely replacing us. Any thoughts?

sick ember
#

Hello all, I'm having trouble configuring my training data, specifically choosing the best data type to feed into a model that does binary classification of seizure vs. no seizure on EEG recordings. What are the pros and cons of feeding my model samples of raw EEG amplitude vs. time, power spectra of the samples, or mel spectrograms of the samples?

serene scaffold
#

because "AI replaces humans", without any other specification, means "there being no more humans, and there instead being AIs"

#

like, no more humans in the world.

molten hamlet
#

im facing problem with qvals being toooooo weird, im doing DeepQN, and one action is always greater in value, but there are negative rewards to train, so im confused 😐

nocturne spruce
#

Guys, can anyone, based on their experience, guide me a bit on how to jump into this AI/ML field? I've read that I need a solid math foundation, so now I'm taking lessons on Khan Academy. Should I be getting acquainted with Python libraries like PyTorch, scikit-learn, etc., or just focus on math first? If anyone can give me advice I would be glad

serene scaffold
# nocturne spruce guys can anyone basing on his experience could guide me a bit how to jump into t...

keep in mind that no matter how much self-study you do, you still need a university degree that is relevant to AI to be taken seriously by employers in this space.

as far as the math goes, I would start with prob/stat and linear algebra.

while being able to use the different python libraries is important, you do not want to learn in terms of the libraries. if you are "learning pytorch" or "learning sklearn", you are doing it wrong. you want to learn how to do different things like "train a logistic regression classifier on housing data", which could involve learning and using specific parts of several libraries.

hot hazel
serene scaffold
serene scaffold
# hot hazel How

I don't have time to go into all of it, but it's also about deciding what to code.

nocturne spruce
serene scaffold
#

is "automation systems" a concentration within computer science?

sullen sage
#

Man I bet these jobs are super difficult to do once you get hired with all these requirements.

hot hazel
#

And what are these other stuff

sullen sage
#

What does a Lead NLP Data Scientist do?

#

I read the description and it still doesn't make sense. It basically just says PhD. Pays a lot though, I want that job

nocturne spruce
serene scaffold
rancid mango
#

Hii, with regression is it correct that it predicts values of continuous data?

sullen sage
#

Oh I just learned about that man I forget though

hot hazel
sullen sage
#

I think it's a very simple model and it isn't continuous

serene scaffold
#

@hot hazel this channel isn't a place to engage in doomerist speculation. ultimately, no one knows what the demand for developers will be in 20 years.

hot hazel
nocturne spruce
#

Do you think I might be able to land my first AI job/internship with just a bachelor's degree while focusing on my own portfolio in the meantime? Or is a master's/PhD currently required?

serene scaffold
sullen sage
#

Master phd required wtf

serene scaffold
rancid mango
#

at what age did you start learning machine learning stuff anyone

serene scaffold
rancid mango
#

niceee

serene scaffold
sullen sage
#

I like to start big and work down

serene scaffold
#

well that's not going to work for job seeking.

sullen sage
#

I was going to use it for a case study for my certification project

civic elm
#

are there other numpy libraries on other languages?

nocturne spruce
# serene scaffold you can get AI related internships as an undergrad (my department has some right...

yeah got it man. Highly appreciate your response, im getting into this ai field and it does kinda intrigue me. Let's see how long it will last. I will be focusing on my first year university on learning math, ml frameworks and trying to build some simple models to link them to my github. But read some articles where people were saying that phd was required and im not sure if i wanna spend around 8 years still studying where others might be starting making money and developing their career path in IT in the same time. But if you said its possible after bachelor i will be grinding hard 😄

serene scaffold
merry ridge
#

There is a whole gradient of useful applied topics from 0 to AI. It's great to have an end goal, but I would caution against how practical it is to learn that much mathematics and computer science in only the span of a bachelor's.

civic elm
#

An MS degree has no time limit, right? I can take it over years?

#

I don't have many resources

#

mostly time and money

nocturne spruce
merry ridge
#

Any reasonable institution is going to have a time limit. It'll be flexible but you are not going to be allowed to take 10 years barring something exceptional

civic elm
#

or maybe I'll get 1 year into ms and start applying as an undergrad?

serene scaffold
civic elm
#

by requirements you mean the units taken?

serene scaffold
#

"units" is not a term used by universities in America, so idk what you're referring to.

civic elm
#

could it be credits?

serene scaffold
#

more like, they might remove a course that you've taken from the requirements, and add a course that you didn't take

#

"course" might be what you call "module"

tepid tartan
#

Quick question, is the Google data science/data analytics certification worth taking?

civic elm
serene scaffold
tepid tartan
# serene scaffold No

What do you recommend? I'm trying to get into data science/analysis before I've taken any data courses, and I have 1.5 years until I graduate

serene scaffold
sullen sage
#

How come when I trained an LSTM with 1 layer and a dense output layer to predict BTC price for 1 day on 45k samples of BTC price history, it only took 30 seconds to train? Shouldn't it take MUCH longer? While training, each epoch showed the step loss (if I said that right)

tepid tartan
serene scaffold
tepid tartan
#

Got it. I do have an ML major course that I'll most likely take in the spring semester. I can post what I've got in my electives.

civic elm
#

what about the math though?

tepid tartan
#

I was thinking of getting a certification in data, then starting a project, which should also help me pass those courses easily

tepid tartan
civic elm
#

I'm thinking of taking Tensors and Calculus

#

will the MS have lots of tensor math and calculus?

nocturne spruce
#

guys have any of you finished Machine Learning Specialization course by andrew ng?

merry ridge
#

What do you mean by taking tensors. It’s a topic usually introduced in a more advanced math course, it’s not an area of math like Calculus.

nocturne spruce
civic elm
nocturne spruce
civic elm
civic elm
civic elm
merry ridge
#

I don’t know what kind of answer you are expecting other than that you prepare for mathematics by studying mathematics. In my opinion you should at least get to the level of linear algebra fluency that eigenvalues are not a topic that blows you away.

civic elm
#

yeah so, eigenvalues it took me quite a while to be comfortable with them so that is why I'm a bit scared on what lies ahead

unkempt wedge
#

I am messing around with the Fourier Transform and some data from a network device (bits per second sampled on a consistent interval). My sample spans 2 weeks, and if I am interpreting this correctly, the largest component frequency of the DFT is 336 hours (aka 2 weeks). Is the fundamental frequency typically the same period as the sample, or have I messed up?
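
A quick way to check this with NumPy: the lowest nonzero frequency bin of a DFT always has a period equal to the length of the record, so a peak at 336 hours is what a slow drift or trend across the two weeks tends to look like, rather than evidence of a real two-week cycle. The sketch below uses synthetic data and assumes hourly samples; the real sampling interval wasn't stated.

```python
import numpy as np

hours_total = 336          # two weeks of data
dt_hours = 1.0             # assumed sampling interval (one sample per hour)
n = int(hours_total / dt_hours)

freqs = np.fft.rfftfreq(n, d=dt_hours)   # cycles per hour; freqs[0] is DC
longest_period = 1.0 / freqs[1]          # period of the lowest nonzero bin
print(longest_period)

# Synthetic traffic: a daily cycle plus a slow upward drift.
t = np.arange(n) * dt_hours
signal = np.sin(2 * np.pi * t / 24) + 0.02 * t

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
peak_with_trend = 1.0 / freqs[1:][np.argmax(spectrum[1:])]

# Remove a fitted straight line before transforming.
detrended = signal - np.polyval(np.polyfit(t, signal, 1), t)
spectrum_dt = np.abs(np.fft.rfft(detrended))
peak_detrended = 1.0 / freqs[1:][np.argmax(spectrum_dt[1:])]
print(peak_with_trend, peak_detrended)
```

In this toy case the drift puts the largest component in the 336-hour bin, and detrending moves the peak to the 24-hour daily cycle. Whether that is what is happening in the real data is something only a plot of the raw series can confirm.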

arctic crown
#
```py
    poly_features = PolynomialFeatures(degree=3)  # You can adjust the degree as needed
    X_poly = poly_features.fit_transform(features)

    # Splitting the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_poly, target, test_size=0.2, random_state=42)

    # Step 5: Model Training and Evaluation
    model = LinearRegression()
    model.fit(X_train, y_train)
```

can someone please explain why we are using LinearRegression() here? even tho I intend to use polynomial regression
merry ridge
#

A polynomial regression model is linear in the polynomial's coefficients. PolynomialFeatures essentially converts the data into a format you can use linear regression on.
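
A NumPy-only sketch of that idea, using made-up noise-free data: expand `x` into the columns 1, x, x², x³ (which is what a polynomial feature transform does for a single input), then solve an ordinary linear least-squares problem on those columns.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
true_coefs = np.array([1.0, -2.0, 0.5, 3.0])      # 1 - 2x + 0.5x^2 + 3x^3
y = np.polynomial.polynomial.polyval(x, true_coefs)

# "Polynomial features": one column per power of x.
X_poly = np.vander(x, N=4, increasing=True)

# Ordinary linear least squares on the expanded columns.
fitted_coefs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(fitted_coefs, 3))
```

The curve is cubic in `x`, but the unknowns (the four coefficients) enter linearly, which is why a linear regression solver is enough.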

arctic crown
merry ridge
#

Sure I guess? It's just the same linear algebra after a certain point. It's just over the vector space of polynomials

arctic crown
#

@merry ridge do you mind if i dm you?

merry ridge
#

I'm not really a good person to talk to. 95% of my work is in Mathematics at this point. I haven't even pushed anything to my github in almost 15 months.

arctic crown
deft sinew
#
X = health.iloc[:,:-1]
y = health.iloc[:,-1]

Could somebody explain how iloc works with the code in the brackets. I assume the code is a slice but I dont understand it

left tartan
#

You have two axes, rows and columns. So the first is your row axis slice, and the second is your column axis slice
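
A tiny example of those two slices, with a made-up `health` frame standing in for the real one (the column names are placeholders):

```python
import pandas as pd

health = pd.DataFrame(
    {"age": [34, 51, 29], "bmi": [22.1, 30.4, 25.0], "at_risk": [0, 1, 0]},
    index=["p1", "p2", "p3"],
)

X = health.iloc[:, :-1]   # all rows, every column except the last
y = health.iloc[:, -1]    # all rows, only the last column

print(list(X.columns), y.name)

# The same selection by label instead of by position:
X_by_label = health.loc[:, ["age", "bmi"]]
```

So `X` ends up with the feature columns and `y` with the target column, as long as the target really is the last column of the frame.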

serene scaffold
eternal canyon
serene scaffold
deft sinew
#

Thanks

#

but iloc is by position, loc is by label

mint palm
#

For a HIGHLY CARDINAL DATASET
When does it make sense to use Hashing Encoder followed by dense layer as a replacement of embedding layer before GRU/LSTM?

#

or is it even justified?

lapis sequoia
#

Hello everyone, I'm working on a multitask model that's good at predicting whether something is there or not (classification) and regressing the bounding box of that class. I have 25 classes and 5 neurons in my last layer for each class (1 for classification and 4 for the bounding box), so 125 neurons. I'm trying to add an auxiliary task to predict an angle (between different objects, calculated from their centers etc.). I added another neuron that predicts the angle. The target angle is in radians. Which loss function should I use for this task? It doesn't make sense to use MSE since it doesn't take the circular property into account, so 360° would be far from -1. I tried to use 1 - torch.cos(angle - target_angle).mean() but I'm not getting good results and the loss is always too small for some reason.
Thank you in advance for helping me.

mint palm
#

how about this

lapis sequoia
#

true-pred without abs?

#

i can try, i don't have the intuition behind this

#

thank you very much

mint palm
lapis sequoia
#

it's in radians, the true values are between -pi/2 and pi/2

#

and my prediction since i didnt clip them can be any value

mint palm
lapis sequoia
#

i don't know which objects you are referring to, i have multiple bounding boxes and i take the center of 4 of them and calculate the angle between the lines created by (center1 center 2) and (center3 center4)

#

and the model needs to predict that angle

mint palm
#

if lines were not distinct, both would be 30 degree/radian whatever

#

because lines are different color, second angle become negative as angle is measured red to blue
I think you should consider this

lapis sequoia
#

i'm not sure i'm understanding this right

#

sorry

mint palm
#

what i mean is, if instead of identifying 4 areas in image as c1, c2,c3,c4 respectively,
if your model identifies same areas as c2,c1, c4,c3. will it be a problem?

lapis sequoia
#

no

mint palm
#

but then angle would change:
heres an example

lapis sequoia
#

because i'm not calculating the angle from what he identifies, i'm asking him to give me the value directly and that i will be comparing to the right value that is calculated from the right c1 c2 c3 c4 but the predicted angle is not calculated from the predicted c1 c2 c3 c4

mint palm
#

i mean can you give the names to them/ call them something? like TV, edge etc

lapis sequoia
#

yes, so i have images of a heart and the heart has some keypoints and we use that to calculate the angle the heart is making

mint palm
#

m1 and m2 are the slopes (each can be calculated using the 2 points we have for each line)

lapis sequoia
#

for the true angle, i'm using c1 c2 c3 c4 in the right order
for the model prediction, the model is outputting a float value directly, where does the order of c1 c2 c3 c4 come here?

mint palm
lapis sequoia
#

i think this is beside the point. imagine i have a true angle and a predicted angle, what loss function would you use?

mint palm
#

i would need to see dataset, i am unable to conclude some things

lapis sequoia
#

thank you anyway

mint palm
tidal bough
mighty patio
#

I would avoid the angle altogether and instead predict cos(Ang) and sin(Ang) and then normalize to sin²+cos²=1.
This avoids the problem of the angle looping around. I also think it will be easier for the network to train to
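
A NumPy sketch of that suggestion (in a real model the same arithmetic would be written with torch ops; the function names here are made up). The network outputs a raw 2-vector, which is normalized onto the unit circle and compared against (cos, sin) of the target.

```python
import numpy as np

def angle_to_unit(theta):
    """Encode angles (radians) as points on the unit circle: (cos, sin)."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def unit_to_angle(vec):
    """Normalize a raw 2-vector prediction and decode it back to an angle."""
    vec = vec / np.linalg.norm(vec, axis=-1, keepdims=True)
    return np.arctan2(vec[..., 1], vec[..., 0])

def circular_loss(pred_vec, target_theta):
    """Mean squared distance between the normalized prediction and the target."""
    pred = pred_vec / np.linalg.norm(pred_vec, axis=-1, keepdims=True)
    return np.mean(np.sum((pred - angle_to_unit(target_theta)) ** 2, axis=-1))

target = np.array([0.3, -1.2])
# An unnormalized prediction that is off by exactly one full turn.
raw_pred = 2.5 * angle_to_unit(target + 2 * np.pi)
print(circular_loss(raw_pred, target), unit_to_angle(raw_pred))
```

For unit vectors the squared distance equals 2 - 2·cos(Δ), so this is the `1 - cos` loss from earlier up to a factor of two. That also hints at why that loss looked "too small": for a 0.1 rad error, 1 - cos(0.1) is only about 0.005, so it may need a larger weight relative to the other task losses.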

tidal bough
lapis sequoia
#

thank you @tidal bough @mighty patio

wanton laurel
#

Getting this
AttributeError: module 'torch.nn' has no attribute 'view'
on this line in Colab
model = Autoencoder()
Can anyone help?

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(64, 128 * 7 * 7),
            nn.ReLU(),
            torch.view((128, 7, 7)),
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1),
            nn.Tanh(),
        )
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
lapis sequoia
wanton laurel
lapis sequoia
native umbra
#

can LLaMA adapt/learn during usage?

wanton laurel
lapis sequoia
wanton laurel
#

i meant nn view not torch view
this autoencoder is supposed to take a gray scale image (CIFAR10) and convert to colour
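
For what it's worth, `view` is a method on tensors, not a layer, so neither `nn.view` nor `torch.view` can go inside `nn.Sequential`. The layer form is `nn.Unflatten`. Below is a sketch of how the model could look for 32×32 grayscale input and 3-channel output. The changed sizes (8×8 instead of 7×7, the `Linear` input width, `output_padding=1`) are my adjustments so the shapes line up, not something from the original post.

```python
import torch
from torch import nn

class ColorizingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 128, kernel_size=3, stride=2, padding=1),   # 1x32x32 -> 128x16x16
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, stride=2, padding=1),  # -> 64x8x8
            nn.ReLU(),
            nn.Flatten(),                                            # -> 4096
        )
        self.decoder = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),  # layer form of x.view(-1, 128, 8, 8)
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2,
                               padding=1, output_padding=1),         # -> 64x16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2,
                               padding=1, output_padding=1),         # -> 3x32x32
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ColorizingAutoencoder()
out = model(torch.zeros(2, 1, 32, 32))
print(out.shape)
```

Pushing a dummy batch through like this is a cheap way to catch shape mismatches before starting training.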

small wedge
forest lintel
#

How much programming do I need in order to get into data science?

nocturne eagle
#

a lot

#

"data science" is essentially the intersection of math/statistics/analysis and programming. you'll spend most of your time cleaning, munging, transforming, and otherwise wrangling data. but you get paid for the analysis. both are done by programming.

tepid tartan
nocturne eagle
#

if you haven't taken probability and statistics, do so. it's essentially required for "data science". two semesters would be even better.

#

a systems modelling class, if offered at your university, would also be a plus

civic elm
#

How does one actually clean a data?

pale hemlock
tepid tartan
#

I plan on doing prob/stats in spring and discrete math in September, because I need to do that in order to take other computer classes

tepid tartan
nocturne eagle
#

well, there you go then

nocturne eagle
#

if you can imagine something going wrong, some data set somewhere will have it

tepid tartan
nocturne eagle
#

let me give you an example, I once had to deal with stock price data from the London Stock Exchange... we noticed that one stock had the price increase by 100x at some point

#

turns out that the LSE prices in both GBP and pence. and switches for some stocks at random times. the data set was supposed to have already been adjusted for this. but it was not.

#

another was a data set of convertible bonds. many of the prices were not adjusted for corporate actions (splits, etc). I've seen data sets of temperatures where some was in Fahrenheit and some in Celsius... but nothing to indicate which was in which.

#

data sets where missing data is noted by various things, ranging from nothing to 0 to -1 to "NULL".

#

the worst is when the different values are used in the same data set... sigh
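
A small standard-library sketch of dealing with mixed missing-value markers. The marker set and the sample CSV are invented for illustration; which values really mean "missing" (is -1 a sentinel or a legitimate reading?) is exactly the kind of thing you have to know about the data.

```python
import csv
import io

# Assumed set of "missing" markers; extend it for the data set at hand.
MISSING_MARKERS = {"", "NULL", "null", "NA", "N/A", "-1"}

def parse_reading(cell: str):
    """Return a float, or None when the cell holds a known missing-value marker."""
    cell = cell.strip()
    if cell in MISSING_MARKERS:
        return None
    return float(cell)

raw = "sensor,reading\na,21.5\nb,\nc,NULL\nd,-1\ne,19.0\n"
rows = list(csv.DictReader(io.StringIO(raw)))
readings = [parse_reading(row["reading"]) for row in rows]
print(readings)
```

Note that 0 is deliberately not in the marker set here, since it is often a real value.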

tidal bough
#

I once had to parse a csv file full of floats, where the column separator was , and the decimal separator was, guess what, also ,.

nocturne eagle
#

lol, damn germans

#

oh, I once had to deal with a data set with fixed-width columns... but sometimes, if the data wouldn't fit, it would spill over into the next column. <grrr>

civic elm
#

Sounds like data science has lots of grindy uncool work

nocturne eagle
#

well, you don't get paid to do the grunt work, it's just part of the job. you get paid for the analysis, conclusions and presentation

#

but you can't effectively clean the data if you don't know what the data should look like. so...

civic elm
#

And i assume coding neural nets is like least priority?

tardy cloak
#

I've completed Statistics, Probability, Linear Algebra and Calculus lectures now I am going to learn Linear Regression but I've not spent time with data cleaning exploring stuff. What should I do?

nocturne eagle
#

there's no direct connection between building AI models and "data science" except in so far as both require data prep pipelines

nocturne eagle
tardy cloak
burnt mountain
#

How much is the effort to migrate cuda training code to amd or intel

#

Is it reliable at all?

tardy cloak
#

I've good exp in Python so I skipped the Data Analysis part

nocturne eagle
tardy cloak
#

Okay but I want to start with ML and DL asap

#

Not with that data cleaning and exploring stuff. Is that a good approach for a beginner?

nocturne eagle
#

data cleaning can technically be done by any random programmer. the issue is knowing what good data looks like vs bad data

#

to wit, it's not writing the code that takes a long time, it's looking at the data to find problems and figuring out if it can be fixed or not

tardy cloak
#

Yeah it makes sense in real world or jobs. But for exploring and understanding algorithms Can I skip those steps?

nocturne eagle
#

probably

#

unless "exploring" covers things like SQL queries

#

you might want to do that in a class if your SQL is weak

tardy cloak
#

Yeah I dont like SQL I have used ORMs

nocturne eagle
#

have you read Celko's "SQL for Smarties"?

#

SQL is a full, turing complete, programming language. one of the few declarative languages that is in widespread use.

#

you can compress pages of python into a few lines of SQL. and vice versa 🙂 it's a great tool for some jobs and a shit tool for others.
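
A toy illustration of that trade-off using the standard library's `sqlite3`: the same per-symbol total quantity and volume-weighted average price, once as a hand-written loop and once as a single declarative query. The trade data is made up.

```python
import sqlite3
from collections import defaultdict

trades = [("AAPL", 10, 190.0), ("AAPL", 5, 192.0),
          ("MSFT", 3, 330.0), ("MSFT", 7, 328.0)]

# Python version: accumulate per-symbol totals by hand.
totals = defaultdict(lambda: [0, 0.0])
for symbol, qty, price in trades:
    totals[symbol][0] += qty
    totals[symbol][1] += qty * price
py_result = sorted((s, q, round(v / q, 2)) for s, (q, v) in totals.items())

# SQL version: describe the result and let the engine work out how.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
con.executemany("INSERT INTO trades VALUES (?, ?, ?)", trades)
sql_result = con.execute(
    "SELECT symbol, SUM(qty), ROUND(SUM(qty * price) / SUM(qty), 2) "
    "FROM trades GROUP BY symbol ORDER BY symbol"
).fetchall()
print(py_result, sql_result)
```

At this size the difference is small; it grows quickly once joins, filters, and several levels of grouping get involved.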

civic elm
tardy cloak
#

No, haven't read that, but now I am taking SQL more seriously as everyone is saying it's very crucial in data science. I didn't find raw queries of much use in software dev.

civic elm
#

Like in books courses etc..

nocturne eagle
hollow kettle
tardy cloak
#

Should I focus more on Stats or Calculus?

nocturne eagle
tardy cloak
#

Both?

nocturne eagle
#

yup. or hell, take a stochastic calc class. that's calculus where the variables are probability distributions 🙂

#

mind bendy stuff

tardy cloak
#

Area under curve of Normal Distributions? I know that stuff

#

Its cool

nocturne eagle
#

lol, no

#

that's stats 101

#

you know in normal calc, you might have an integral for x and y over some range?

#

well, instead of being simple scalar variables, in stochastic calc, they are probabilities

tardy cloak
#

oh okay sound great, I better start with ML first then it makes more sense to learn it after that.

nocturne eagle
#

probably. I doubt you'd need stochastic calc for most DS jobs. but if you knew it, you could make everyone else feel inferior 🙂

tardy cloak
tepid tartan
nocturne eagle
#

I can't do anything about your instructor or lesson plans

tepid tartan
#

Personally, that reason why I want to do online certification and project to get more understanding on data science

mild dirge
#

This is a sideview of some terrain. You can see the ground, and in the middle some trees. What would be a good way to fit a line or a set of points to the ground curve. I don't think taking the minimum y value pixel at each x value would be good as there might be some holes in the ground in my data.

nocturne eagle
#

that's a pretty steep hill

nocturne eagle
mild dirge
#

This might be a good example of where the image isn't perfect (occluded on the left)

mild dirge
nocturne eagle
#

so remove the trees

mild dirge
#

They are also not really points, but just pixels

mild dirge
nocturne eagle
#

pixels are just points

#

by hand, of course. the first step to any data analysis is manual scrubbing of the data

mild dirge
#

...

#

The demo data I work with has 128 of these, and the data I will work with later maybe few ten thousands

#

Not an option

nocturne eagle
#

if you really are against doing that because of classwork "rules", then just take the min y value for every x

mild dirge
#

It's not classwork, just personal project for now

nocturne eagle
#

I fail to see why min y value of every x wouldn't work

#

being afraid to to do manual work is a bad sign for "data scientists"

#

but whatever floats your boat

mild dirge
#

If anyone has any suggestion, feel free to ping 🙂

nocturne eagle
#

what's wrong with min y value for every x?

#

hello?

mild dirge
#

You'd get this

nocturne eagle
#

yes, and? then run a least squares regression on the red data

#

guess you could smooth it first with a moving average to remove transient spikes, but I doubt the results would be that different
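
A rough NumPy sketch of that pipeline: take the lowest foreground pixel in each column, then fit a least-squares line through those points. It assumes a boolean mask in image coordinates, where the row index grows downward, so "lowest" means the largest row index. The function names and the toy mask are made up.

```python
import numpy as np

def ground_profile(mask: np.ndarray) -> np.ndarray:
    """Row index of the lowest foreground pixel per column (NaN for empty columns)."""
    mask = mask.astype(bool)
    rows = np.arange(mask.shape[0])[:, None]
    # Background becomes -1 so the column-wise max picks the lowest foreground row.
    lowest = np.where(mask, rows, -1).max(axis=0).astype(float)
    lowest[lowest < 0] = np.nan
    return lowest

def fit_ground_line(profile: np.ndarray) -> tuple[float, float]:
    """Least-squares line through the per-column ground points, skipping gaps."""
    cols = np.arange(profile.size)
    ok = ~np.isnan(profile)
    slope, intercept = np.polyfit(cols[ok], profile[ok], deg=1)
    return float(slope), float(intercept)

# Toy side view: flat ground on row 6, a "tree" above column 2, a gap in column 4.
mask = np.zeros((8, 5), dtype=bool)
mask[6, :4] = True
mask[2:5, 2] = True

profile = ground_profile(mask)
print(profile, fit_ground_line(profile))
```

Columns where the ground is missing but a tree is present will still report the tree, which is the concern raised earlier. Smoothing the profile or using a robust fit before the regression is one way to limit the damage from those columns.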

civic elm
#

I ordered a esp32 for my data collection project. I dunno what to collect yet but maybe I'll make a plant classifier

#

Kinda excited about it lol

thorn bobcat
#

anyone here used a siamese neural network before?

coral field
#

are there any good datasets for training a model to detect classes of cars? by this i mean something similar to Stanford's car dataset, but with much, much more than 44 images per car.

desert oar
#

i don't think i'd have thought of it either

#

lowest pixel in each column + loess/lowess would probably do the job of moving average + linear regression

#

that way it can actually be nonlinear

#

Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression.
Its most common methods, initially developed for scatterplot smoothing, are LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing). T...

sturdy canyon
neon mango
#

I don't know how to articulate this, so as a preface, tl;dr: I made a script that turns a documentation website into a really naïve knowledge graph for actually useful embeddings. I'm trying to figure out a generic way to get the selector for the main documentation content of most websites. Does this seem like a good idea?
```js
document.querySelector('h1').parentElement.parentElement.children.length
32
document.querySelector('h1').parentElement.children.length
1
document.querySelector('h1').parentElement.parentElement.parentElement.children.length
3
```

#

My goal is automating my own version of those "Chat with your documentation with sources included!!!!" and this is the last bit to making it pretty much insert a URL and you'll have a somewhat useful embedding to chat with.

tepid tartan
#

IBM or Google certification? Which is better, or is there a better route?

fallow frost
lapis sequoia
# tardy cloak Okay but I want to start with ML and DL asap

most of the time you will spend will be on that, not on building deep learning models. And when you get a job, in most companies the model would be already there in some cases. You don't always spend your days only building models. So if you want to do this, you need to learn the not so cool stuff too.

lapis sequoia
severe trellis
#

I'm confused regarding the difference between neural networks and ML. Given a linear regression model (an ML model) designed to identify whether something is a windmill or not would be trained with a bunch of pictures of windmills, where it'll autonomously create its own patterns (and weight them!) during the training phase.
This trained model can now tell us whether an image is a windmill or not (to a questionable accuracy).

Through my research, this is not considered the use of a neural network, which confuses me, as it seems like a lot of this is done "under the hood", especially the creation and identification of patterns.

tepid tartan
#

Built my portfolio as well if need be

lapis sequoia
# tepid tartan To get experience, and knowledge and start before I graduated or taken those cou...

I did the same, but I tried to do it horizontally, not vertically, since it's a really broad field. I tried to study a bit of everything to get a sense of it and choose what I liked the most. So I would recommend you take a look at most things. Start by having a strong foundation in the needed mathematics (linear algebra, probability, statistics etc., maybe even Monte Carlo methods) and in Python (you can do a project for your portfolio with Python, maybe a web scraping thing that you can build on afterwards). After Python, try to get courses in data visualization and learn the basics; you can add a dataviz project to your portfolio with the data you scraped in the Python project. Then try to take a machine learning course (Andrew Ng has a really good one) and get a sense of most of the machine learning algorithms. You can take a small Kaggle competition and try to use machine learning to find a solution and test most of the algorithms. You can also take a course in data analysis, exploratory methods and things like clustering, PCA, MCA etc... Then jump into deep learning and try to learn the foundation. Try a project for your portfolio along the way. After that, as I said, you can try things horizontally: you can try some computer vision, some NLP, some graph neural networks, some reinforcement learning etc. and try to find what you like the most. You can easily find projects for your portfolio along the way. I'm a recent graduate and this is what I did to learn. For NLP, I tried to generate Arabic poetry. For computer vision I tried a sign language classification competition. For reinforcement learning I tried to beat the mini blackjack (from the gym library).

lapis sequoia
tepid tartan
#

Appreciate it. I have taken Python but I'm still trash at it and still wondering how I got a B+ in it. Currently taking college algebra and really hate it because it's pretty much just there with no use. I do have discrete and stats classes. I'll focus on those when I get there.

#

Is linear algebra the same as college algebra? @lapis sequoia

lapis sequoia
lapis sequoia
lapis sequoia
#

take a look

tepid tartan
#

Ok. College algebra is basically that course everyone needs to take if they transfer

lapis sequoia
#

those are good yes

#

also where do you study?

tepid tartan
#

Computer Sciences

lapis sequoia
#

i was asking about the name of the uni

tepid tartan
#

Franklin University

#

It's an online school

#

The downside personally is a 12-week course

#

The best option that I would like to work on is Python,that related to data science @lapis sequoia

nocturne spruce
#

Guys i already finished my linear algebra course on khan academy which one should i take next in your opinion?(i wanna have a good foundation for ml/ai):
-Multivariable Calculus
-Differential Equations
-Statistics Probability

civic elm
#

All of the above?

nocturne spruce
#

have started with linear algebra already and im nearly finished with it, so the next one you recommend is prob/statistics, so lets go for that. After i finish it, do you think the order of these 2 plays a role?
-Multivariable Calculus
-Differential Equations
Or it wont matter much

serene scaffold
nocturne spruce
civic elm
#

I regret not studying enough of derivatives

nocturne spruce
civic elm
nocturne spruce
#

it seems to be more reasonable to learn all this math first then start to build models

#

rather than doing it in the opposite order

civic elm
#

you might be right, try to check logistic regression after every course you take and see if it makes sense

#

check especially the gradient descent if it makes sense if not then study more calculus

nocturne spruce
rancid salmon
#

just make sure you don’t fall into the trap of spending an unreasonable amount of time in just the math learning phase and not transitioning to actually doing projects. Get a foundation and each project will hint at which math you need to learn

#

Also there are tons of pre-built libraries out there which abstract a lot of the math. Many things you’ll find to have been “nice to learn” but not super requirements.

#

Get a foundation > Do actual projects > Let project demands and curiosity drive future learning.

molten hamlet
#
df_agent = df[df['agent_i'] == agent]

is this multi index?

nocturne spruce
#

because dont know how to approach it

small wedge
# nocturne spruce yeah thanks but u think its neccesary to finish all this 4 courses before doing ...

There are sort of 2 parts to ML, there's the theory and the math. Knowledge of the theory is all you need to use high level libraries like pytorch and keras. The math helps you with diagnosing problems, coming up with new ways to use your models, and understanding why certain results happen/quantifying performance. I agree with ashe that you can start on projects before being knee deep in all these math topics provided you have a working knowledge of theory.

nocturne spruce
#

What IDEs are u using for ml/ai? I've seen multiple people using Jupyter Notebook

small wedge
#

Just whatever you like, there's not really any benefit to any IDE for ML specifically

#

I use vscode

nocturne spruce
#

i like jetbrain's ides

small wedge
#

Pycharm?

sturdy canyon
# nocturne spruce yeah thanks but u think its neccesary to finish all this 4 courses before doing ...

I have walked middle school classes through ML projects where they were able to create their own classifiers. Unless you're specifically trying to build or modify your own model architectures, I don't really think any of these things are necessary to start projects in ML. In fact, I'd recommend at least taking a few out of the box/open source models and seeing what you can do with them, perhaps even before taking a lot of these classes so you have a frame of reference for how these topics are utilized in the real world and can start to form questions about how they work. Seeking out and learning the answers to those questions will often be a far more valuable and robust way of learning than trying to learn all of the theory first and then trying to utilize it in the real world after the fact.

#

The only caveat that I will add is there will be some trial and error doing it this way, which you'll have to understand and accept. Especially if you're self-guided, I wouldn't expect to actually complete the first projects you start. You should try to do as much as you can until you hit a wall, then take a break, learn some more, and either come back to it or use your experience + what you learned to start a new project that you're interested in

outer flare
#

anyone here willing to teach me machine learning in python?(just trying to make some projects so that it may increase the chances of me getting into a university)

#

(also i like python so yea)

serene scaffold
sturdy canyon
# outer flare anyone here willing to teach me machine learning in python?(just trying to make...

There are plenty of YouTube channels devoted to intro ML concepts. I'd recommend looking up something you'd be interested in building and trying to find a model that can achieve that. From there, look for tutorials on it/something similar. Alternatively, you can just shop around for tutorials until you find one that clicks with you on a topic you enjoy. The earlier you just start trying (and more importantly failing) things, the better off you'll be!

outer flare
#

hm k

rugged rapids
#

find out WHY you want to learn ML/AI

civic elm
#

need the math foundation though so the papers can make sense

sturdy canyon
civic elm
sturdy canyon
#

Ah alright, I reasonably agree then! I think the math is very important once you've got a solid understanding of things and are looking to get into the weeds of optimization or experimentation. Also not saying it's bad to have the background when starting out, but it's by no means necessary

kindred spade
#

What could be the reason that my model is reporting very poor accuracy during training but giving me perfect recall/precision when I test it after?

sturdy canyon
#

You're testing with data used in training?

kindred spade
#

yes, the goal was to attempt to overfit a model just to ensure that I could reach high accuracy on a small subset of the dataset

#

it's giving me around 0.07 accuracy with near 0 loss, so I wanted to investigate further, and I found that the model is actually performing perfectly

sturdy canyon
#

I suppose I'm a bit confused. You trained your model on N samples, and that model had accuracy reporting every X epochs. Then, you took a subset of N and used it in a validation test and it reported both perfect precision and recall? What data is being used in the accuracy calculation?

kindred spade
#

yup i was confused too, i just found the issue though

#

basically i was using a subset sampler for this training, and the problem was to calculate accuracy i was using len(dataloader.dataset)

#

but the problem was, that returned the full dataset whereas the sampler would actually only return a small portion of it

#

do you know if there's a better way to get the number of samples in a dataloader as opposed to just tracking the number of samples that the code receives each epoch?

sturdy canyon
#

Better way as opposed to what? len(dataloader.dataset)?

kindred spade
#

yeah, because the subset sampler means that it's not using the full dataset

#

that was my initial code which caused the problem

sturdy canyon
#

I imagine the data used in training is being passed or accessed somewhere no?

kindred spade
#

yeah, i could just track it bit by bit

#

but i think len(dataloader.sampler) is working for now

#

as long as I have some type of sampler
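
To illustrate the bug described above: a minimal NumPy sketch (not real DataLoader code) where a model that is perfect on a sampled subset still reports low accuracy because the denominator is the full dataset size. The sizes (1000 and 70) and the stand-in labels are hypothetical, picked only so the wrong figure comes out near 0.07:

```python
import numpy as np

full_dataset_size = 1000            # hypothetical: what len(dataloader.dataset) would report
subset_indices = np.arange(70)      # hypothetical: what the subset sampler actually yields
batch_size = 16

correct = 0
seen = 0
for start in range(0, len(subset_indices), batch_size):
    batch = subset_indices[start:start + batch_size]
    labels = batch % 2              # stand-in labels
    preds = labels.copy()           # stand-in for a model that memorized the subset
    correct += int((preds == labels).sum())
    seen += len(batch)              # count what the loop actually received

wrong_accuracy = correct / full_dataset_size   # divides by the full dataset
right_accuracy = correct / seen                # divides by samples actually seen
```

Counting `seen` inside the loop works regardless of which sampler is used, which is why it is a safe fallback next to `len(dataloader.sampler)`.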

nocturne spruce
lapis sequoia
#

Hey, I just recently learned the basics of python and now I want to learn AI. Can someone who's already in this field help?

sturdy canyon
hollow heath
#

Has someone been able to make a PDF to TEX converter that works reliably? Possibly making use of object detection and the TF API?

rugged rapids
#

thats a really shallow answer

rancid mango
#

Hii, can someone explain this for me? There is a decision tree expressiveness example called N-XOR. I know that XOR is basically when one input is true the output is true (out of, for example, two attributes). But is N-XOR the theoretical output with n attributes, applying the XOR quality that only one is true?

nocturne spruce
calm pasture
#

im a relative new beginner in python and i'm interested in making mlb game score predictor. would anyone be interested in assisting me with this on github?

simple tapir
#

I've been studying machine learning for a while and nowadays I have a question stuck in my head. I wanted to make money while I'm at university but I'm confused about how to do something with artificial intelligence. I mean, game developers create some game, publish it and they make money but machine learning engineers? You may think of creating a chatbot or something which is pretty common or some prediction apps...

serene scaffold
simple tapir
#

Well, look at mark zuckerberg, he created facebook as a student. I wanted to create something while I'm still at uni but artificial intelligence, which im interested in, doesn't seem to be such a field, or I'm too ignorant about that

#

I may make some computer vision projects et cetera but how would it be useful to people if it runs in a console

serene scaffold
#

cases like mark zuckerberg are one in hundreds of thousands.

#

you can try. and I suppose there's an incredibly small chance that you'd be successful. but in all the cases where you aren't, your time would have been better spent studying and applying for internships.

simple tapir
#

i didn't mean to quit uni. I want to create, produce something while I'm at uni

serene scaffold
serene scaffold
#

hiring managers for internships will be looking at your grades, and a signal that you're a good fit for what their team does. working on ML-related projects could satisfy the latter, but if you're working on those projects chiefly to monetize them, considerations that are more important to those hiring managers might fall by the wayside.

simple tapir
serene scaffold
simple tapir
#

Can I apply for a remote internship without letting my school know?

serene scaffold
simple tapir
#

I see

#

What if i want to create something on my own, would ai be appropriate for such a purpose?

serene scaffold
simple tapir
#

Should i learn any other fields on the other side?

serene scaffold
#

whatever is interesting to you, sure

#

you don't have to pick a lane and stay in a lane

simple tapir
#

I see. I sometimes get burnout due to this issue :/

serene scaffold
#

what's the rush?

simple tapir
#

to be productive and useful

merry ridge
#

I've hired a ton of University undergrads over the years for one project or another. Frankly the ones that got noticed, which led to them being offered some form of a paid position, already loved their topic so much they were doing a LOT of work on it unpaid and developed a reputation for it. The majority of students that were not that passionate could never make it through the most basic hurdles I set in front of them before I could seriously consider them.

simple tapir
#

I see your point, thanks a lot guys. 🙏

sullen sage
merry ridge
#

I don’t know. It depends on the situation. A basic hurdle is usually something like look at the research group website and tell me what interests them. I don’t set complicated hurdles. I may as well just tell them to never talk to me again.

lapis sequoia
lapis sequoia
#

I'm trying to train an AI model to recognize certain images, but I keep getting errors similar to this:

in user code:

    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1338, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1322, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1303, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1080, in train_step
        y_pred = self(x, training=True)
    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\utils\traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\input_spec.py", line 235, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer 'resnet50' (type Functional).

    Input 0 of layer "conv1_pad" is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: (224, 224, 3)

    Call arguments received by layer 'resnet50' (type Functional):
      • inputs=tf.Tensor(shape=(224, 224, 3), dtype=float32)
      • training=True
      • mask=None

and

Data cardinality is ambiguous:
  x sizes: 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224
  y sizes: 201
Make sure all arrays contain the same number of samples.

and i have no idea why

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @lapis sequoia until <t:1690006145:f> (10 minutes) (reason: newlines spam - sent 107 newlines).

The <@&831776746206265384> have been alerted for review.

merry oak
#

!unmute 1000729109720219778 use the pastebin

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: pardoned infraction timeout for @lapis sequoia.

hollow elbow
#

excuse me, i want to ask something, Im currently doing a logistic regression in python and i have some independent variable as a string data type in my dataset, the string data are all categorical like (Short, Medium, Tall) etc. can i just change those values as like number (1,2,3) so that the sklearn can take it as a integer data type or i should not do that?

lapis sequoia
#

ive tried several variations, searched stackoverflow and reddit

#

even asked chatgpt

mild dirge
#

It's called "one hot encoding"

#

Converting it to integers 1 2 3 can also be good, but then you implicitly say that short is closer to medium than tall f.e. which in this case is true. But you also say that the distance between these terms is the same (short is 1 away from medium, and 2 away from tall).
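
A small sketch of the two encodings side by side, in plain NumPy rather than sklearn's encoders. The category order and the helper names `ordinal_encode` / `one_hot_encode` are assumptions made for illustration:

```python
import numpy as np

categories = ["Short", "Medium", "Tall"]          # assumed ordering
index = {name: i for i, name in enumerate(categories)}

def ordinal_encode(values):
    """Map each category to 1, 2, 3: keeps the order, implies equal spacing."""
    return np.array([index[v] + 1 for v in values])

def one_hot_encode(values):
    """One column per category: no order and no distances implied."""
    out = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        out[row, index[v]] = 1
    return out

heights = ["Tall", "Short", "Medium"]
ordinal = ordinal_encode(heights)
one_hot = one_hot_encode(heights)
```

For a feature like height the ordinal version is defensible; for something unordered (say, a colour) the one-hot version avoids inventing an order.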

hollow elbow
sleek harbor
#

this is really handy, can't believe I only found out about it now:
%%script echo wubalubadudub ᘇᘏᗢ、

vapid sentinel
#

guys how we use data science in Microbiology

outer flare
#

are there any projects done already on data science?

#

like a project bin consisting of like 15+ projects

onyx vale
#

anyone familiar with deploying machine learning code

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @rough schooner until <t:1690027633:f> (10 minutes) (reason: links spam - sent 90 links).

The <@&831776746206265384> have been alerted for review.

nocturne spruce
#

guys do u think that trying to write my own ocr as a first machine learning project might be a good challenge or its too hard for beginner?

past meteor
hard rover
#

sir, my code stopped working cuz they changed sites

strong sluice
#

I ended up going down a rabbit hole and made an OCR program, to automatically click on genshin impact menus to get details of equipped items LOL

#

training the dataset was absolute hell because cleaning, but ill take the 82% accuracy

#

(and then a month later genshin makes a website that retrieves data from the game anyway)

earnest anchor
jaunty geyser
#

when you transpose a tensor, are the original tensor and the transposed tensor the same?

tidal bough
#

what do you mean "the same"?

#

they certainly aren't equal in the general case - if swapping a pair of axes doesn't change the tensor, it's symmetric by these two axes.
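
A quick illustration with NumPy arrays standing in for tensors (an assumption; the question may have been about another framework):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
a_t = a.T

# Different shapes, so they cannot be equal element for element
shapes_match = a.shape == a_t.shape

# A matrix that is symmetric in its two axes does equal its transpose
s = np.array([[1, 2], [2, 3]])
symmetric_equal = np.array_equal(s, s.T)

# In NumPy the transpose is a view on the same memory, not a copy
shares_memory = np.shares_memory(a, a_t)
```

So "the same" has at least two readings: same values (only for symmetric tensors) and same underlying storage (true for NumPy's `.T`).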

mint palm
#

does CNN with GRU sound weird?
I was thinking of using it in my research
I have heard a lot about CNN with LSTM but how about this, why does it sound so weird lol

quasi sparrow
#

How can I orchestrate machine learning pipelines for devices on the edge? I know Apache Beam is used for large models, but I don't need that complexity.
Is there something equivalent for small models?

serene scaffold
quasi sparrow
past meteor
#

We don't use Apache Beam or anything of the like. I think Beam is more for distributed processing and not necessarily inference?

simple flame
#

Hello, does anyone here have experience building and training a MAE model? I want to create a simple MAE model and train it on the MNIST dataset. I wonder if it is possible.

sullen sage
#

Bro do machine learning jobs really only consist of training models? there's no way

hasty spear
#

well, thats kinda the job, to sum it up, more or less

#

loads of people experimenting with applications too im sure

#

but i'd imagine most novel applications require training and data and shit to even just try

mint palm
iron valve
#

Is there any books u guys recommend to learn SQL as complete beginner for DA/ML?

nocturne spruce
#

Guys, are most of the models hosted in Google Workspace?

young granite
young granite
simple flame
#

No...the masked auto encoder model

young granite
young granite
# simple flame No...the masked auto encoder model
simple flame
#

Thanks. I will take a look tomorrow

brisk vapor
#

!warn 712970859400265789 our server is not an ad board. Asking for retweets and likes here is against our rules. Don't do this again unless you wish to be removed from the server.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @plucky meadow.

true scaffold
#

Is it possible to use tools without using agents in langchain? If yes, then how? The issue is, while using agent, it tends to hallucinate a lot, plus, it takes more time as the computations increases a lot because it needs to think and for that it again uses the LLM, so I’m trying to avoid using agent but i need to use the Tools in order for my chatbot to search net, do maths and other stuff. With agent, inference : 2-3 minutes + hallucinations, without agent, inference: 10-15 seconds + very well generated output. Some help?

quasi sparrow
past meteor
#

Will you use multiple models on edge or just 1?

quasi sparrow
#

It’s just one model. I have 3 drilling units and want to run simple anomaly detection with VARs and predicted power efficiency with XGBoost in each one of the drills.

#

I’m not doing anything distributed or an inference that requires a lot of computational power, but still, I want to automate my pipelines as much as possible because once the model is deployed on-site, it’s hard to go online to monitor.

past meteor
#

Are you sure it needs to be on edge? Can you call an API somewhere that has the model? That drastically makes things easier.

past meteor
quasi sparrow
#

But what about data preprocessing and monitoring performance such as data drifting and retraining models. Do any of the tools normally used for large models still apply to applications like mine?

past meteor
#

Do you need to do predictions in real time?

sturdy canyon
quasi sparrow
#

I thought about using TFX and airflow because it would be easier for me to troubleshoot the system if I get a call about the system not working as expected.

#

The locations are remote and far from where I live :/

past meteor
#

Your edge device could feasibly just have a container running your model(s). You SSH to it from anywhere if you need to trouble shoot it

#

If you're able to persist the data, you can do the following: store the prediction and the data somewhere in a DB/filestore. Upload this in batch to a server you can easily access

#

This means you can do all of your monitoring, retraining, ... with MLflow running somewhere completely different

quasi sparrow
#

That makes a lot of sense! That way I only use the edge device for inference and if the model drifts, I can just push another trained model in a container to the edge device.

#

Thank you 👍🏻👍🏻

hasty mountain
#

Gated CNNs, Dilated CNNs. I think Flowtron uses more Invertible CNNs (since it's a Flow model)

true scaffold
jaunty geyser
#

Can somebody explain to me how a tensor can represent any kind of data?

ancient fractal
#

I want to add a point spread function to my image as noise. Is this function correct for generating a point spread function?
```py
import numpy as np

def generate_PSF(size, sigma):
    """Generates a 2D Gaussian Point Spread Function (PSF)."""
    # Coordinate grid centered on zero
    x, y = np.meshgrid(np.linspace(-size/2, size/2, size), np.linspace(-size/2, size/2, size))
    d = np.sqrt(x*x + y*y)
    psf = np.exp(-d**2 / (2*sigma**2))
    psf /= np.sum(psf)  # Normalize the PSF
    return psf
```

rancid salmon
coral field
#

when using tensorflow's "tf.keras.Sequential()" for data augmentation, are the data & its corresponding labels duplicated or altered in place?

serene scaffold
coral field
#

i don't have enough training data, so i want to use data augmentation to try and increase it

serene scaffold
#

Sequential is just for putting however many layers together, whether you're using it for data augmentation or whatever else.

coral field
#

then how do you recommend i increase the amount of training images without downloading new files?

odd meteor
coral field
#

but does calling the sequential on the data to augment it create more copies of the data?

serene scaffold
#

it's more that you're passing data through the Sequential model. and putting a tensor through a model doesn't modify the tensor that you put into it--it returns a new one

odd meteor
serene scaffold
#

also, models that you make with Sequential aren't necessarily for data augmentation. Sequential is just for general purpose building of (relatively simple) neural networks.

coral field
#

alright

odd meteor
#

What kind of data are you working with? Image? text? tabular data?

coral field
#

Image

odd meteor
tacit basin
#

Looking for working script / code to instruction fine tune llama2 on 1xA100. Tried Abhishek's autotrain advanced in 4bit but so far can't get loss to go in expected direction 😅. If you successfully fine tuned llama2 and willing to share how I would appreciate that 💚

deft skiff
#

Do you know any free books about AI for gaming?

small wedge
deft skiff
small wedge
deft skiff
#

Thank you...

tepid tartan
#

What type of SQL do I need to know for data science?

tiny python
tepid tartan
slate kraken
#

Hi, I am building a text classification model. I wanted to know if similar sentences (not exact duplicates) might affect model performance. For example:

I hate watching Netflix series
I hate watching Amazon Prime series

Or should I just keep one instance of the sentence like just keeping either of I hate watching Netflix series or I hate watching Amazon Prime series

cold osprey
#

i dont understand

#

what are the other categories?

wanton sentinel
#

Anyone know why pandas would still be throwing A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead warnings even though I am doing exactly that?

#

Is it because the referenced DF is already just a filter of another DF (meaning I should copy(deep=True) it)?
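
That guess is the usual cause in pandas versions that emit this warning: the frame being assigned to was itself produced by filtering another frame. A minimal sketch with hypothetical data, taking an explicit copy of the filtered frame before assigning:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"score": [1, 5, 9], "flag": [0, 0, 0]})

# The filtered result is derived from df; .copy() (deep by default)
# makes it an independent DataFrame before any assignment.
high = df[df["score"] > 3].copy()
high.loc[:, "flag"] = 1

# The original frame is left untouched
original_flags = df["flag"].tolist()
```

If the goal is instead to modify the original, assigning through it directly (`df.loc[mask, "flag"] = 1`) avoids the intermediate frame altogether.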

quartz ivy
#

is anyone interested in medium account sharing?

hollow kettle
# slate kraken Please anyone can help?

what are the categories you are trying to classify the sentences into? can you explain your project a bit further? people might be able to help with this info

#

In general, you do want to summarize similar sentences into one category (in this example: "I hate things" or "emotions") for your model to be able to recognize those variations of sentences belonging to one category

slate kraken
#

It's a sentiment analysis model, a multi-class sentiment analysis. So the categories are negative, positive, neutral.

slate kraken
hollow kettle
#

I think I understand your point now, so the sentences should have a great variation of for example negative verbs or adjectives, to prevent overfitting to one specific verb I'd recommend not to use the same verb in many training sentences. But using the same construction twice or 3 times should not be a problem.

slate kraken
#

Got it, and my data is pretty big with different variations of negative reviews, these are just a few instances where the construction is the same

#

Thanks a ton @hollow kettle

hollow kettle
#

Then it shouldn't be a problem, glad I could help

native umbra
#

Hi, I do not know how to express this, but I am a bit lost. I'm uncertain about which roadmap to follow or what field to choose. I recently completed the course "Supervised Machine Learning: Regression and Classification" by Andrew NG, and everything went well. However, when I started the second course, "Advanced Learning Algorithms," which covers neural networks using TensorFlow & Keras, I found it quite challenging to understand. I would say I only grasped about 20% of the material, and it feels overwhelming. Also, I could not understand the syntax well, which adds to my confusion.

I have completed some basic projects like house price prediction, diabetes prediction, and two other logistic regression projects. Despite that, I'm still confused and struggling to grasp the information, unlike my experience with computer science. Can someone please advise me on what I should do next?

lapis sequoia
# native umbra Hi, I do not know how to express this, but I am a bit lost. I'm uncertain about ...

that's normal, it takes time to understand and grasp everything. What i did was take those courses more than one time while trying things in between, and each time i would understand more. So try to see what you don't understand. Is it maths? go take a look at some concepts you need like gradient descent or linear algebra. Is it python/matlab? go take a look at the syntax and what we're doing.
What I would advise is to try to implement a perceptron on your own with python, don't use any library other than numpy, and try to understand what is happening. Print everything and try to work an example by hand to see what is happening. You can try to fit the XOR function and understand what is happening under the hood, then take that course another time

sullen sage
#

Is there anything that currently exists that can get me something like 10k price charts, or book summaries of data in a csv in 5-10 seconds?

hollow elbow
#

Hi, i want to ask, if u did a logistic regression and then you get the coefficients of the model, can u predict the dependent variable by using those coefficients?

Ex
Coef_ age = 1.23
Coef_tall = -0.61
Coef_Single = 0.45

and i want to find the dependent variable (1 or 0)
with data: age = 20, tall = 1, single = 1
can i count the probability manually?

cold osprey
#

yes thats how a logistic regression works

#

are u familiar with the formula?
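
For reference, a small sketch of that manual calculation using the coefficients from the question. The intercept is not given in the message, so it is set to 0.0 here purely as a placeholder; with a fitted model you would plug in its actual intercept. The 0.5 threshold is also an assumption (it is the usual default):

```python
import math

# Coefficients and inputs as quoted in the question
coef = {"age": 1.23, "tall": -0.61, "single": 0.45}
x = {"age": 20, "tall": 1, "single": 1}
intercept = 0.0  # ASSUMPTION: not given in the question

# Linear part (log-odds): z = b0 + sum(coef_i * x_i)
z = intercept + sum(coef[name] * x[name] for name in coef)

# Logistic (sigmoid) function turns log-odds into a probability
p = 1.0 / (1.0 + math.exp(-z))

# Predicted class with a 0.5 threshold
label = int(p >= 0.5)

print(f"z = {z:.2f}, p = {p:.6f}, predicted class = {label}")
```

With these numbers the log-odds are dominated by `1.23 * 20`, so the probability comes out very close to 1.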

cinder schooner
#

Hello, I have a baseline model that does segmentation + postprocessing to achieve object detection and classification (the postprocessing finds the bounding boxes from the masks). This model achieves 93% on the metric. I'm trying to use the same backbone but replace the segmentation head with linear layers that directly predict, for each class, whether it is present plus the 4 bounding box coordinates (so 5 * number of classes neurons in the last layer). The model is good but not good enough: 91% on the metric, and I need to at least match the baseline to switch, since the new model is much lighter and the postprocessing was a bottleneck.
What can I do to tune my model? Would knowledge distillation from the segmentation model work to teach the second model?

terse ledge
#

hello, i have a database of approx. 300k in-game character names (no personal info) that have been previously reviewed by moderators after being requested for approval. each row in the data is labeled as "approved" or "rejected", when it broke a name rule (i.e. bad words).

what would the best approach be to train a model to help provide an extra data metric to names pending review as "system leans reject" or "system leans approve"?

main solar
#

where can I talk regarding Machine learning?

main solar
#

Actually I am just a beginner

#

But took this ML course based on python

#

and I am learning it rn

main solar
#

I know C and C++ for now

modest hatch
#

Hi people... Newbie here

main solar
#

learned it years ago

#

yeah I am new here too

lapis sequoia
#

I have 25 classes and a bounding box for each class with a batch size of 32, so the prediction has shape (32, 25, 4). I compute the IoU loss as the mean of the IoUs over all predicted/label box pairs. I want to find a way to give more weight to the boxes predicted for some classes, since the model predicts poorly on those
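
One way this could look, sketched in NumPy so the arithmetic is easy to follow (the same operations exist in the usual deep learning frameworks). The box format `(x1, y1, x2, y2)`, the loss definition `1 - IoU`, and the function names are assumptions for illustration, not something stated in the question:

```python
import numpy as np


def iou(pred, target, eps=1e-7):
    """Element-wise IoU for boxes in (x1, y1, x2, y2) format; inputs shaped (..., 4)."""
    x1 = np.maximum(pred[..., 0], target[..., 0])
    y1 = np.maximum(pred[..., 1], target[..., 1])
    x2 = np.minimum(pred[..., 2], target[..., 2])
    y2 = np.minimum(pred[..., 3], target[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    return inter / (area_p + area_t - inter + eps)


def weighted_iou_loss(pred, target, class_weights):
    """Mean over the batch of a per-class weighted average of (1 - IoU)."""
    loss = 1.0 - iou(pred, target)                 # (batch, classes)
    w = class_weights / class_weights.sum()        # normalise so weights sum to 1
    return float((loss * w).sum(axis=1).mean())


# Toy data: every box is perfect except class 3, which misses completely
target = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (32, 25, 1))   # (32, 25, 4)
pred = target.copy()
pred[:, 3] = [2.0, 2.0, 3.0, 3.0]

uniform = weighted_iou_loss(pred, target, np.ones(25))
weights = np.ones(25)
weights[3] = 5.0                                   # up-weight the hard class
heavy = weighted_iou_loss(pred, target, weights)
print(uniform, heavy)
```

Normalising the weights keeps the loss on the same scale as the unweighted mean, so the learning rate does not need to change just because the weights did.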

oblique quarry
#

anybody familiar with convolutional layers?

grand quarry
#

Hey, I would like to create a Pandas DataFrame in which one of the columns is a list.
I have one list like this:
[1,2,3]
and another like this:
[[x,y,z],[x,y,z],[x,y,z]]
I want this:
col1:[1,2,3],col2:[[x,y,z],[x,y,z],[x,y,z]]

#

I get error: Per-column arrays must each be 1-dimensional

agile cobalt
#

you should just about never put lists inside of a pandas dataframe

grand quarry
#

How about numpy arrays?

agile cobalt
#

even worse

#

why are you trying to put lists in a dataframe?

grand quarry
#

I basically have names and categorical data to those names. I would like to save it so I can decode the actual name later

agile cobalt
#

do you have multiple names or multiple pieces of data per name?

grand quarry
#

john:[0 0 0 0 1] something like this

tidal bough
#

!e it just works for me:

import pandas as pd
df = pd.DataFrame({"a":[1,2,3],"b":[[1,2,3],[2,3,4],[4,5,6]]})
print(df.dtypes)
print(df)
arctic wedgeBOT
#

@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | a     int64
002 | b    object
003 | dtype: object
004 |    a          b
005 | 0  1  [1, 2, 3]
006 | 1  2  [2, 3, 4]
007 | 2  3  [4, 5, 6]
tidal bough
#

it's probably a bad idea since, well, object dtype, but it works.

grand quarry
#

What is the proper way to save categorical data to decode later?

tidal bough
#

What do you mean by categorical data?

grand quarry
#

Cause the neural network will output [0 1 0 0 0] and I won't know what name it is referring to

tidal bough
#

oh, you mean one-hot encoding

grand quarry
#

yeah

tidal bough
#

perhaps just map it to indices? like, [0 0 0 0 1] -> 4.

grand quarry
#

Yea and I want to save it into a dataframe

#

But as you say its not the proper way to do it right?

tidal bough
#

Do you have the array of names?

grand quarry
#

yep

tidal bough
#

then yeah, map these before putting them into the dataframe.

#

whatever ML library you're using might have a function for that; otherwise you could do, uhh

tidal bough
#

no, I think mine is right, the first array returned by where should just be an arange(N) if there's one nonzero per row, and hence boring.

agile cobalt
#

!e ```py
import numpy as np
print(np.where([0, 0, 1, 0]))
```

#

[0][0] actually

tidal bough
#

that's a (k,) array, not an (N,k) one
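
A short sketch of the mapping being discussed, for an `(N, k)` one-hot array. The names and the example rows are made up for illustration; `argmax` and `np.where(...)[1]` give the same indices as long as each row has exactly one nonzero entry:

```python
import numpy as np

# Hypothetical lookup table: position i in this array is the name for class i
names = np.array(["anna", "bob", "carla", "dave", "john"])

# (N, k) one-hot rows, e.g. network outputs after thresholding
onehot = np.array([
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
])

# Option 1: index of the largest entry in each row
idx_argmax = onehot.argmax(axis=1)

# Option 2: np.where on a 2-D array returns (row indices, column indices);
# with one nonzero per row, the row indices are just 0..N-1
rows, idx_where = np.where(onehot)

decoded = names[idx_argmax]
print(idx_argmax, rows, idx_where, decoded)
```

`argmax` is the safer of the two for raw network outputs, since it still returns one index per row when the values are probabilities rather than exact zeros and ones.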

agile cobalt
#

oops

grand quarry
#

I think im just gonna save it as a dictionary, much simpler

serene scaffold
#

though it's often the right one. pandas pandas pandas.

summer halo
#

I have two pandas dataframes in Python. Each dataframe gathers data from a different sensor. Let's say one takes 1,000 samples per second and the other takes 1 per second. If I merge both dataframes, I have two possibilities to deal with this difference:

  • Using sparse structures: this way most of the fields would be None, which saves lots of memory
  • Filling all empty spaces with repeated values. In this case, is there a way to do it without blowing up the memory? Something like categorical data, where heavy repetition of values is not penalized
civic elm
#

Hi I got a question about statistics, what is the relationship of probability and a standard deviation? My understanding is that if a feature has high std deviation then it is easier to predict?

tidal bough
#

(duckdb can probably do it too)

mellow fox
#

any roadmap for data science and ml?

left tartan
#

Secondly, use an ‘as_of’ join to combine that type of data: where data is updated on different intervals.

left tartan
#

Read the link for as_of’s, they’re powerful but that’s a better explanation (and as_of is supported by a variety of systems).
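
As a rough illustration of an as-of join in pandas, here is a sketch using `pd.merge_asof`. The column names, the integer millisecond timestamps and the sample rates are invented for the example (4 samples per second against 1 per second, to keep the output small):

```python
import numpy as np
import pandas as pd

# Fast sensor: one reading every 250 ms
fast = pd.DataFrame({
    "t_ms": np.arange(0, 3000, 250, dtype="int64"),
    "fast_value": np.arange(12, dtype=float),
})

# Slow sensor: one reading every 1000 ms
slow = pd.DataFrame({
    "t_ms": np.array([0, 1000, 2000], dtype="int64"),
    "slow_value": [10.0, 20.0, 30.0],
})

# For each fast row, take the most recent slow row at or before its timestamp.
# Both frames must be sorted by the key column.
merged = pd.merge_asof(fast, slow, on="t_ms", direction="backward")
print(merged)
```

The result has one row per fast sample, with the latest known slow value repeated alongside it, so the repeated values do take up memory in the merged frame.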

summer halo
#

This function seems really useful. I would have ended up wondering how to do this as well

summer halo
#

Let's assume storing the data is not a problem

#

I saw that the function you told me about tends to repeat all this data when merging the two dataframes, so I would have a memory problem when merging as well. I will check the flags that the function accepts

left tartan
#

If you're using plain vanilla pandas, then yah, it'd probably be a concern if you're storing a large number of values.

#

(but there's often a memory / performance tradeoff here)

grand quarry
#

Hey, I have a NumPy array X with shape (1000,10).
How do I reshape it into (1000,640)?
I have one-hot encoded data with 64 values. The actual numpy array should be (1000,10,64) but it isn't?

I hope the question makes sense...

serene scaffold
#

you can reshape (1000,640) to (1000, 10, 64), however.
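
To make the shapes concrete: a `(1000, 10)` array has 10,000 elements, so it cannot be reshaped into `(1000, 640)`, which needs 640,000. A sketch of one way to get there, assuming `X` holds integer category ids in the range 0..63 (that is a guess about the data, which the thread does not show):

```python
import numpy as np

rng = np.random.default_rng(0)
# ASSUMPTION: X holds integer class ids, one per position
X = rng.integers(0, 64, size=(1000, 10))

# One-hot encode by indexing the rows of an identity matrix
onehot = np.eye(64, dtype=np.float32)[X]       # (1000, 10, 64)

# Flatten the last two axes for a dense layer, and back again
flat = onehot.reshape(1000, 640)               # (1000, 640)
back = flat.reshape(1000, 10, 64)              # (1000, 10, 64)

print(onehot.shape, flat.shape, back.shape)
```

Reshaping never changes the number of elements, so going between `(1000, 10, 64)` and `(1000, 640)` is always safe, while the encoding step is what adds the extra axis.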

grand quarry
#

yeah true, I meant to say my array is in a different form than I would like.

serene scaffold
#

so do you actually have a (1000, 10) shape array, or (1000, 640)?

grand quarry
#

(1000,10)

serene scaffold
#

and you have not yet one hot encoded it?

grand quarry
#

print(X[0][0]) returns part of the hot encoding, instead of print(X[0]) returning all of it

serene scaffold
#

I don't understand that sentence.

can you show the code that includes the part that performs one-hot encoding?

#

@grand quarry please ping me when you show the code, and if I'm able to look at it at that time, I will.

void veldt
#

anyone here use Scipy and LMFIT before?

void veldt
#

since the code is a bit on the longer side, I posted in SO and am posting the link to the question here. In short, I get different answers for LMFIT vs. Scipy

merry ridge
#

Depending on the nature of the problem, it's not unusual for very small differences in the convergence criteria or in the implementation of an algorithm to make the two methods find very different local minima. You should probably start by trying both methods on something easier and convex to see how much the solutions differ in that case.

merry ridge
#

I'm not sure what point you are trying to get across. If both methods found a local minimum, there could always be another method out there that finds an even better one you haven't tried yet. You should at least perturb your initial conditions and get some vague sense of how prone your function is to ending up in very different local extrema.
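
A sketch of what perturbing the initial conditions can look like with `scipy.optimize.minimize`. The objective here is a made-up one-dimensional function with two local minima, not the model from the linked question, and the grid of starting points is arbitrary:

```python
import numpy as np
from scipy.optimize import minimize


def f(params):
    """Toy non-convex objective with two local minima (illustration only)."""
    x = params[0]
    return x**4 - 3 * x**2 + x


# Run the same local optimizer from several starting points
starts = np.linspace(-2.0, 2.0, 9)
results = [minimize(f, x0=[s]) for s in starts]

solutions = np.array([r.x[0] for r in results])
values = np.array([r.fun for r in results])

for s, x, v in zip(starts, solutions, values):
    print(f"start {s:+.2f} -> x = {x:+.4f}, f(x) = {v:+.4f}")

best = solutions[values.argmin()]
print("best solution found:", best)
```

If the printed solutions fall into clearly separate groups, the fit is sensitive to the starting point, and two libraries disagreeing is then not surprising.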

short path
#

a table in JupyterLab is showing médio instead of "médio"

#

How could I change it? Isn't Jupyter's default encoding UTF-8?
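
A common cause of this kind of symptom is that the data was saved as UTF-8 but read with a different encoding (or the other way round), which is independent of what Jupyter itself uses. The sketch below only reproduces that effect in plain Python; whether it matches this particular table is an assumption, since the thread does not show how the data was loaded:

```python
original = "médio"

# UTF-8 bytes misread as Latin-1: the two bytes of "é" become two characters
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the mistake recovers the text
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled), repr(repaired))
```

If the table comes from a file, passing the right encoding when reading it, for example `pd.read_csv(path, encoding="utf-8")`, is usually the cleaner fix than repairing strings afterwards.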

flint shoal
#

hi, question about a classification machine learning model

flint shoal
#

i have perforations in the ground, some positive and some negative, and i want to use the pixels of raster maps such as hydrological, geological and topographic maps, etc

#

my problem is: okay, i can use a machine learning model such as a random forest, SVM, etc. to classify a map; nevertheless, the perforation data is not all at the same height, so the data is not homogeneous

#

so i want to find out whether a perforation is negative (it does not find water) or positive (it finds water), but where, or at what distance, i will find water is also important, because then i will have a kind of clustered classification map

#

u.u

#

the other problem that i have is related to the amount of data, because if i start to cluster the data by depth i will lose data and quantity

#

then maybe i can develop a machine learning model for 50 to 100 meters, because i have more resources there than for 200 to 300 meters, or for about 400 meters and more

#

that's taking into account that topography will also affect your axis

cursive drift
#

hi, is it possible to learn to develop AI without deep math knowledge, and freelance with it?

agile cobalt
#

for extremely simple things you can find literally hundreds of tutorials and examples of online? yes

developing actually new stuff? no

visual sleet
#

But it still does involve math

visual sleet
#

However, in life, if you want to get good at something, you should spend a lot of time thoroughly learning it from the bottom to the top

visual sleet
#

Machine learning is more mathematics than it is coding

cursive drift
#

but i'm very interested in AI

#

and i love programming very much

cursive drift
#

but i only know scraping and creating bots, and now i need to earn money freelancing

visual sleet
#

ML/AI is quite complex and takes time to learn. If you’re trying to make quick money but don’t care about the field itself, I'm not sure you want to go down this path

visual sleet
#

However, it is your choice. I do agree that machine learning and artificial intelligence are very interesting topics, but I would recommend learning them if you genuinely want to spend time making something out of them instead of trying to make quick money

visual sleet
#

You could become a game developer on a platform like roblox and commission

visual sleet
#

Forgive me please

visual sleet
#

My ultimate recommendation for anyone is to program because you like and enjoy it

#

Money is a byproduct of hard work and success

visual sleet
#

That goes for anything in life as well: do something because you like it and perhaps great money could be made from it, but never do something solely for the purpose of making money while only somewhat liking it

agile cobalt
#

idk overall I'd probably recommend just looking for a normal (non-programming) job and looking for ways to use your programming skills in it

visual sleet
#

Because what will happen is that you won’t work harder than your competition because you don’t have the passion or drive like your competitors do

visual sleet
#

What is your first language?

cursive drift
#

python 😆

#

and its last