#data-science-and-ml
now rewriting it in rust
oof, it's about as fast as the python solution. which, like, makes sense because it uses python lists, I suppose
I've followed this tutorial but it doesn't work. https://medium.com/@gpj/making-your-ai-sound-like-you-a-guide-to-creating-custom-text-to-speech-8b595d5cf259 It gives me this error message: ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-0f6e1f002713> in <cell line: 9>()
7 import IPython
8
----> 9 from tortoise.api import TextToSpeech
10 from tortoise.utils.audio import load_audio, load_voice, load_voices
11
2 frames
/content/tortoise-tts/tortoise/models/transformer.py in <module>
4 import torch.nn.functional as F
5 from einops import rearrange
----> 6 from rotary_embedding_torch import RotaryEmbedding, broadcat
7 from torch import nn
8
ModuleNotFoundError: No module named 'rotary_embedding_torch'
In this tutorial we are going to generate speech from text with our own voice.
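The traceback means the `rotary_embedding_torch` package (published on PyPI as `rotary-embedding-torch`) isn't installed in the Colab runtime. Below is a minimal pre-flight check. It assumes you run it in the same notebook before the tortoise import. The module list is illustrative, taken from the traceback, and is not tortoise's full requirements list:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative subset of what tortoise imports (see the traceback), not its full dependency list.
missing = missing_modules(["torch", "einops", "rotary_embedding_torch"])
if missing:
    # PyPI names use dashes where import names use underscores, e.g. rotary-embedding-torch.
    print("Missing, try in a cell: !pip install " + " ".join(m.replace("_", "-") for m in missing))
```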
Hello, is any data science big brain able to explain to me why my precision/recall curve drops so suddenly?
The task is object detection.
I've used YOLOX trained on COCO dataset and pycocotools for evaluation, but for some reason my precision and recall drop to 0 after certain threshold (depending on a class).
Here is the code snippet:
https://colab.research.google.com/drive/1A7JC2hNxFDLVLJzRUK-jELcYb4BqLwdF?usp=sharing
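One common cause applies if the curve is read from COCOeval's accumulated `precision` array. COCO-style evaluation samples precision at 101 fixed recall thresholds (0, 0.01, ..., 1). Any threshold above the highest recall the detector actually reaches is left at 0. The sketch below re-implements that sampling for a single class. It is a simplification, not pycocotools itself, and it shows the cliff appearing right after the maximum recall:

```python
import numpy as np


def sampled_precision(tp_flags, num_gt, rec_thrs=np.linspace(0.0, 1.0, 101)):
    """COCO-style precision at fixed recall thresholds for one class.

    tp_flags: 1 for a true positive, 0 for a false positive, sorted by descending score.
    num_gt:   number of ground-truth boxes of this class.
    """
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / num_gt
    precision = tp / np.maximum(tp + fp, np.finfo(float).eps)
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # First detection whose recall reaches each threshold.
    idx = np.searchsorted(recall, rec_thrs, side="left")
    out = np.zeros_like(rec_thrs)  # unreachable recall levels stay at 0
    reachable = idx < len(recall)
    out[reachable] = precision[idx[reachable]]
    return out
```

With 3 true positives out of 5 ground-truth boxes, for example, every threshold above recall 0.6 comes out as 0.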
What distro do u use? And have u tested GPU support (like for tensorflow, xgboost and stuff)?
ubuntu. i don't use tensorflow, but jax works well with gpu for me
Would anyone be able to explain to me exactly how backpropagation through time works in an LSTM? I’ve been trying to understand, but can’t seem to grasp how it actually works.
Preferably with an example like stock market data, using RMSE.
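Backpropagation through time is ordinary backprop applied to the network unrolled across time steps. The gradient reaching the hidden state at step t gets one contribution from that step's output and another from everything downstream. An LSTM adds gates and a cell state, so each step has more local derivatives, but the backward walk is the same.

Below is a minimal sketch using a scalar vanilla RNN and an RMSE loss. It is deliberately not an LSTM, to keep the derivatives readable. The sequence is made-up toy data, not real prices:

```python
import numpy as np


def forward(w, u, v, xs):
    """Unroll h_t = tanh(w * h_{t-1} + u * x_t), prediction y_t = v * h_t, with h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    ys = [v * h for h in hs[1:]]
    return hs, ys


def rmse(ys, targets):
    return float(np.sqrt(np.mean((np.asarray(ys) - np.asarray(targets)) ** 2)))


def bptt(w, u, v, xs, targets):
    """Return RMSE and its gradients w.r.t. (w, u, v), walking the unrolled graph backwards."""
    hs, ys = forward(w, u, v, xs)
    T = len(xs)
    err = np.asarray(ys) - np.asarray(targets)
    loss = float(np.sqrt(np.mean(err ** 2)))
    dL_dy = err / (T * loss)  # d RMSE / d y_t (assumes loss > 0)
    dw = du = dv = 0.0
    dh_future = 0.0  # gradient reaching h_t from steps t+1, t+2, ...
    for t in reversed(range(T)):
        h, h_prev = hs[t + 1], hs[t]
        dv += dL_dy[t] * h
        dh = dL_dy[t] * v + dh_future  # output at step t + everything downstream
        dz = dh * (1.0 - h ** 2)       # back through tanh
        dw += dz * h_prev              # the same w is reused at every step, so contributions add up
        du += dz * xs[t]
        dh_future = dz * w             # hand the gradient to h_{t-1}
    return loss, (dw, du, dv)


# Toy sequence (made up): predict the next value from the current one.
xs = [0.5, -0.1, 0.3, 0.8]
targets = [-0.1, 0.3, 0.8, 0.2]
loss, grads = bptt(0.7, 1.2, 0.9, xs, targets)
```

Comparing `grads` against finite differences of `rmse` is a quick way to convince yourself the backward loop is right.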
maybe:
```py
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def separate_elems_numba1(cloud: np.ndarray, k: int, indices: np.ndarray):
    gc = [np.sum(indices == i) for i in range(k)]
    lists = [np.empty((gc[j], cloud.shape[1])) for j in range(k)]
    # Track the next index to insert into the lists[i] array, to avoid append
    last_idxs = np.zeros(k, dtype=np.int32)
    for i in nb.prange(len(indices)):
        idx = indices[i]
        last_idx = last_idxs[idx]
        lists[idx][last_idx] = cloud[i]
        last_idxs[idx] += 1
    return lists
```
hmm, does this work correctly? these parallel operations on the same variable concern me.
Correctly wasn’t part of the requirements 😉
Yah, there’s (most likely) a race on last_idxs where two iterations get the same last_idx.
Maybe rewrite to compute insert position for each i (a cumulative count), eliminate last_index, such that for each i, we just need the index and cumcount
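The cumulative-count idea can be sketched in plain NumPy. Each row's slot, (group, rank within group), is computed up front, so no two iterations ever write the same location. The final loop therefore has nothing to race on.

This is written as ordinary Python for readability. Moving the scatter loop into an `@nb.njit(parallel=True)` function with `nb.prange` is an untested assumption here:

```python
import numpy as np


def separate_elems_cumcount(cloud, k, indices):
    counts = np.bincount(indices, minlength=k)               # group sizes, 0 for empty groups
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))   # where each group begins in sorted order
    order = np.argsort(indices, kind="stable")               # rows grouped by label, original order kept
    # Rank of each row inside its own group (the "cumcount").
    cumcount = np.empty(len(indices), dtype=np.int64)
    cumcount[order] = np.arange(len(indices)) - starts[indices[order]]
    lists = [np.empty((c, cloud.shape[1]), dtype=cloud.dtype) for c in counts]
    # Every (indices[i], cumcount[i]) pair is unique, so iterations never write the same slot.
    for i in range(len(indices)):
        lists[indices[i]][cumcount[i]] = cloud[i]
    return lists
```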
if correctness isn't needed, then we could just [] lol
`gc = [np.sum(indices == i) for i in range(k)]` can be a `np.unique` btw, with handling for the case where there are no occurrences of one of the groups
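Here is a sketch of the `np.unique` version with the empty-group handling. Groups that never occur don't appear in `np.unique`'s output, so the counts are scattered into a zero-filled array of length `k`. For non-negative integer labels, `np.bincount(indices, minlength=k)` gives the same result in one call:

```python
import numpy as np


def group_counts(indices, k):
    labels, counts = np.unique(indices, return_counts=True)
    gc = np.zeros(k, dtype=np.int64)
    gc[labels] = counts  # groups with no occurrences stay at 0
    return gc


indices = np.array([0, 2, 2, 3, 0, 2])
gc = group_counts(indices, k=5)  # groups 1 and 4 never occur
```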
can anyone tell me the 2 types of ai and what is the difference
i'm gonna write this as a C extension tomorrow
The lists could be arrays
Fedora. Great distro, was trivial to make my NVIDIA GPU etc. work
nice
is it possible to make a NOR gate by combining two perceptrons?
because I tried this
```py
# Assumes module-level `weights` (a list of 6 floats), `bias` and `learning_rate` are defined elsewhere.
def modified_perceptron(input1, input2, input3, expected_output):
    output_perceptron = input1 * weights[0] + bias * weights[3] + (input2 * weights[1] + input3 * weights[2] + bias * weights[3]) * weights[4] + bias * weights[5]
    if output_perceptron > 0:
        output_perceptron = 1
    else:
        output_perceptron = 0
    error = expected_output - output_perceptron
    weights[0] += error * input1 * learning_rate
    weights[1] += error * input2 * learning_rate
    weights[2] += error * input3 * learning_rate
    weights[3] += error * bias * learning_rate
    weights[4] += error * (input2 * weights[1] + input3 * weights[2] + bias * weights[3]) * learning_rate
    weights[5] += error * bias * learning_rate
    return input1, input2, input3, output_perceptron, weights
```
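For what it's worth, NOR is linearly separable, so a single perceptron can learn it. Stacking perceptrons only becomes necessary for functions like XOR.

Below is a minimal sketch of the standard perceptron learning rule on a 3-input NOR. It uses one weight per input plus one bias, unlike the six shared weights above. The learning rate and epoch cap are arbitrary choices:

```python
import itertools


def step(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, lr=1.0, max_epochs=100):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - step(x, w, b)
            if error:
                mistakes += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:  # converged: every sample classified correctly
            break
    return w, b


# 3-input NOR: output 1 only when every input is 0.
nor_samples = [(x, int(not any(x))) for x in itertools.product([0, 1], repeat=3)]
w, b = train_perceptron(nor_samples)
```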
Specifically fedora in WSL, or are u using actual Fedora? I've always really wanted to try OpenSUSE, but.. honestly I think those might be too advanced for me (not exactly a Linux pro), especially when using WSL
Work: Windows + Ubuntu SSH or WSL (Ubuntu)
Home: daily driver laptop Fedora. Desktop Windows.
I learnt a lot of Linux through WSL on my laptop and work because all of my development is on remote so when I switched it was really painless / easy.
The main advantage for me was that I always get tripped up when using CMD/PowerShell, some things are easier to install on Linux and some things don't even run (e.g., gunicorn)
Yeah, I was mostly interested in WSL. As actual Linux distros I'm interested in OpenSUSE (vs GeckoLinux, Fedora), Rocky Linux, and recently have been seeing a lot about NixOS. But for now I'm just interested in WSL working with as few bugs as possible (and I've encountered a few problems with Debian, which I tried before Ubuntu - I don't know if it was due to my inexperience or actual bugs)
Because of the terminal, or things don't run cus of WSL? I thought it supported graphics.. Try the Alacritty terminal - it's nice
can OpenFace (github) classify facial emotions, or can it only detect where a face is on an image/video?
What is the difference between ai software developer and machine learning engineer
it ultimately depends on the company--there aren't universally accepted sets of job titles with well-defined responsibilities
So in some companies it's the same, in others not
I've never actually heard "AI software developer" as a title.
there's your first mistake.
What are the others
I might be able to answer this later
Can you give me a specific time? When?
No. I am a volunteer and make no guarantees about my availability
Ok thanks
smh stelercus, this is what my tax money is going to
Where is it going to?
(i'm joking)
My actual job is taxpayer funded
@uneven bronze what is your reason for asking? To decide on a career path?
I am currently in my first year of CS and I'm figuring out what I will work as, so, for example, if I want to be an AI developer or AI programmer, I will focus more on it or go deeper into it
Oh ok
There's no difference between "programmer" and "developer"
Anyway, if you're interested in AI, be sure to take as many AI related courses as an undergraduate as you can.
Look for internships that are related to AI. And be prepared to probably do grad school
Ok thanks, I thought they were used in different cases
Like being a software developer: I want to be a software developer, but with an interest in AI, somewhere in the middle
Then what I said applies.
Ok, and is Python a good language for that? Because it's the only one I know. I am trying to learn C++ and Java
Languages other than python are not likely to matter.
So only Python. And I have a question: what is the job title that you would expect based on the stuff that I gave you?
Why do you ask? To look on Glassdoor?
Because if you want to look at salaries on Glassdoor, you should search a few different job titles.
There is no standard ‘title’ for anything. Companies don’t use the same titles, and when they do, they can mean very different things. So, don’t read too much into titles.
The description of the job ?
Yeah maybe
For WSL I'm fine with just running Ubuntu. Idt I'd switch my WSL on my work computer to match my personal computer's distro (Fedora)
I'm not a guru in this stuff either. I just need something that works and both do for me 🤣
I even think that the windows + WSL set-up is perfect for data science. I wanted to try something else for fun. After switching idk if it's demonstrably better or not than just doing that. You're fine either way 🤷
hey anyone know if there's a vectorized way to broadcast but for every combination on an axis with numpy:
```py
a = np.array([[1,2,3,4], [5,6,7,8]])
b = np.array([[11,12,13,14],[15,16,17,18], [21,22,23,24]])
# broadcast but for every combination of rows
b - a
# Desired result:
[[10,10,10,10],[6,6,6,6],[14,14,14,14],[10,10,10,10],[20,20,20,20],[16,16,16,16]]
```
!e There's a common trick for this sort of thing - making the two arrays "point" along different axes, e.g. here I go from (n,k) and (m,k) to (1,n,k) and (m,1,k), which together broadcast to (m,n,k).
```py
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8]])
b = np.array([[11,12,13,14],[15,16,17,18], [21,22,23,24]])
print(b[None,...] - a[:,None,:])
```
@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.
```
[[[10 10 10 10]
  [14 14 14 14]
  [20 20 20 20]]

 [[ 6  6  6  6]
  [10 10 10 10]
  [16 16 16 16]]]
```
ah perfect thanks!
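The printed result is indexed `[a_row, b_row]`, while the desired output in the question puts the `b` rows in the outer position. Swapping which array gets the new axis, then flattening the first two axes, reproduces that exact order:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
b = np.array([[11, 12, 13, 14], [15, 16, 17, 18], [21, 22, 23, 24]])

diff = b[:, None, :] - a[None, :, :]  # shape (3, 2, 4): diff[i, j] = b[i] - a[j]
flat = diff.reshape(-1, a.shape[1])   # shape (6, 4), b-row-major
```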
Now with 50% fewer race conditions (edit: updated to initialize the typed Dict, edit2: slightly different dict construction using comprehension):
```py
import numpy as np
import numba as nb

k = 16
d = 3
N = 10**7
cloud = np.random.random((N, d))
indices = np.random.randint(0, k, N)

def separate_elems_numpy(cloud: np.ndarray, k: int, indices: np.ndarray):
    return [cloud[indices==i] for i in range(k)]

@nb.njit(parallel=True)
def separate_elems_numba(cloud: np.ndarray, k: int, indices: np.ndarray):
    e = np.empty((0, cloud.shape[1]), dtype=np.float64)
    result = {
        i: e
        for i in range(k)
    }
    for i in nb.prange(k):
        mask = indices == i
        group = cloud[mask]
        result[i] = group
    return result

%timeit separate_elems_numba(cloud, k, indices)
%timeit separate_elems_numpy(cloud, k, indices)

vnumba = separate_elems_numba(cloud, k, indices)
vnumpy = separate_elems_numpy(cloud, k, indices)
for i in range(k):
    print(np.all(vnumba[i] == vnumpy[i]))
```
Does anyone know of a book club community for Python data science similar to the R for Data Science https://www.rfordatasci.com/ community? Basically these are small reading groups that meet weekly on Zoom to work through a particular data science / R programming book. They use Slack to coordinate, with several book clubs going concurrently. I have searched but have not been able to locate a Python-focused community like it! (If you are interested in this, here is a link to the Slack: http://r4ds.io/join)
Hmm i found something sort of adjacent: https://datatalks.club/
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
There's an interactive pandas tutorial in here
thanks
the official pandas tutorials and user guides contain a lot of useful information. it's not that easy to follow and sometimes it's missing some details, but it's worth working through it.
Aws
Are there any private ML notebooks with Google Colab-like functionality that don't expose your data?
Make a website with React and deploy it on an AWS server in the cloud. You will have to pay for the cloud though
does anyone know anything about CNN's?
https://keras.io/keras_core/announcement/
Keras 3.0 is here, with multi-backend again, for tensorflow, pytorch and jax!
Keras Core documentation
Seems exciting
I have question why python AI really long
#1035199133436354600 if you fancy
I'm working on a problem statement in which I need to predict the amount of bets one has made using poker chips.
An idea is to first treat it as an object detection problem statement where I localize and classify the color of the chip stack (chips may be vertically stacked).
Each detected region of interest will contain [1-N] number of chips of the same color.
Once that is done, I can create two ML models in post-processing. The first model will be a regression model to predict the number of chips in the bounding box using the (width, height, mid_x, mid_y) coordinates. The second model will be a classification model that determines which player has thrown the chips into the pot, using polar coordinates from a fixed point.
I need suggestions on the camera position and camera's quality.
PS: Sorry for the lengthy description of the approach. I would also like suggestions on the approach as well.
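For the second model, a rule-based baseline may be worth trying before training a classifier. Convert each stack's box center to polar coordinates around the fixed reference point, then assign it to the nearest seat by angle. The seat angles, the reference point and the image coordinate convention are all hypothetical here. The same (r, θ) values could also serve as the features for the classifier you describe:

```python
import math


def to_polar(mid_x, mid_y, ref_x, ref_y):
    """(r, theta) of a box center relative to a fixed reference point; theta in [0, 2*pi)."""
    dx, dy = mid_x - ref_x, mid_y - ref_y
    return math.hypot(dx, dy), math.atan2(dy, dx) % (2 * math.pi)


def angular_distance(a, b):
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def nearest_seat(theta, seat_angles):
    """Index of the seat whose angle is closest to theta (wrapping around 2*pi)."""
    return min(range(len(seat_angles)), key=lambda i: angular_distance(theta, seat_angles[i]))


# Hypothetical table: 4 seats at 0, 90, 180 and 270 degrees around the reference point.
seats = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
```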
Still depends on TF. Somehow having both TF and Torch in a single venv is very off-putting to me
Disk space bottleneck
hmmm
but well if it has to support all of them, they will need to be dependencies right
and rn the dependency list is rather short
just tf, torch and jax basically
Sure, but not having to install TensorFlow is in the works. In the future it'll probably be something like `pip install keras[Torch]`, which won't require 2GB of TF
Hadn't heard of it 😮
i want to train my own tts with this code https://paste.pythondiscord.com/YHWQ (i want to use css10 in French) but it gives me this error message: Traceback (most recent call last):
File "/home/cecilien/python/TTS je pense.py", line 58, in <module>
train_samples, eval_samples = load_tts_samples(
File "/home/cecilien/miniconda3/envs/tf/lib/python3.9/site-packages/TTS/tts/datasets/__init__.py", line 123, in load_tts_samples
meta_data_train = add_extra_keys(meta_data_train, language, dataset_name)
File "/home/cecilien/miniconda3/envs/tf/lib/python3.9/site-packages/TTS/tts/datasets/__init__.py", line 64, in add_extra_keys
relfilepath = os.path.splitext(os.path.relpath(item["audio_file"], item["root_path"]))[0]
KeyError: 'root_path'
Visualizing correlation between hyper-parameters and metrics for Neural Network
I am working with a neural network and I want to investigate how different settings affect the loss and standard deviation of the network. I can change various parameters such as the loss function, learning rate, epochs, batch size, threshold value for loss calculation, number of hidden layers, and specific parameters like beta for SmoothL1Loss or num_harmonic for HarmonicFunctionLoss. I have a CSV file for each loss function I use, where the columns correspond to the settings mentioned above. I want to use matplotlib to represent how these settings influence the standard deviation and loss. However, if I fix one parameter, there is a chance that others may change between different trainings. Therefore, I am looking for a good representation to see if there is any correlation between the settings I input and the metrics I use to define the precision of my network. What is the best way to visualize this relationship?
For a little bit of context, I am working on a network using pytorch to recognize line orientation in images. And I am trying to find the optimal parameters to improve the performance of the network. The standard deviation helps me get a much more precise idea on the performance.
I thought about doing something like this
Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets.
To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of...
Sounds interesting !
I think you'd need something like this because each of your parameters is correlated in a potentially non-linear way (so standard correlations probably make no sense here).
The analytics folks at work like this type of chart. Personally I don't like it whatsoever because it takes too long to figure out what it's conveying so YMMV
parallel coordinate plots are what are used in MLflow, Optuna and co. for this type of thing.
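If you'd rather stay in matplotlib, a parallel coordinates plot is short to hand-roll. Scale every hyperparameter column to [0, 1], draw one polyline per training run, and color it by the metric (loss or standard deviation).

The sketch below assumes your CSV is already loaded into a numeric 2-D array. Categorical settings such as the loss function name would need encoding first, and log-scaling the learning rate often helps. `pandas.plotting.parallel_coordinates` also exists, but it colors by a class column rather than by a continuous metric:

```python
import numpy as np


def normalize_columns(values):
    """Scale each column to [0, 1]; constant columns map to 0 instead of dividing by zero."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(axis=0), values.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (values - lo) / span


def parallel_coordinates(values, names, metric, ax=None, cmap="viridis"):
    """One polyline per run across the hyperparameter axes, colored by the metric."""
    import matplotlib.pyplot as plt

    scaled = normalize_columns(values)
    shade = normalize_columns(np.asarray(metric, dtype=float)[:, None])[:, 0]
    if ax is None:
        _, ax = plt.subplots(figsize=(2 + 1.5 * len(names), 4))
    colors = plt.get_cmap(cmap)
    xs = np.arange(len(names))
    for row, s in zip(scaled, shade):
        ax.plot(xs, row, color=colors(s), alpha=0.7)
    ax.set_xticks(xs)
    ax.set_xticklabels(names, rotation=30)
    ax.set_yticks([])  # every axis has its own range, so a shared y scale would mislead
    return ax
```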
got a mail from them for feedback/thoughts lol
cold mail ig
intriguing stuff though
what module did you use for this?
I'm just not sure what problem it solves
I rarely wanted to transpile TF to Torch or vice versa
If anything, transpiling MXNet to TF/Torch would be valuable lol. Maybe the deployment stuff is killer? Idk, for now I'm happy with what I have, but that could also just be classical programmer Stockholm syndrome
Running into a weird issue with NumPy. I'm running a small little simulation, and at one point in computing the energy I compute this non-linear term using a little lambda function:
```python
N = (
    lambda u, v: (u + v)
    * (
        1
        - (np.linalg.norm(u, axis=0) ** 2 + np.linalg.norm(v, axis=0) ** 2) / 2
    )
    / (2 * self.ε**2)
)
```
but when I actually run my simulation I get a warning:
```
simulation.py:266: RuntimeWarning: overflow encountered in multiply
  lambda u, v: (u + v)
```
It's not really clear to me how or why this would be causing an overflow, since I'm passing two arrays of shape `(3, 275, 275)`, every element of which has magnitude less than 10. It's also an array of floats, so the whole "overflow" thing *really* doesn't make sense to me, since floats don't tend to... well... do that.
can you post a more complete snippet please?
something that allow us to replicate your error in a potentially realistic setting would be perfect.
one use is integrating codebases written with different frameworks, especially in implementations of research papers
I'll put MXNet in the feedback if I do reply xd
Here's an attempt to trim it down as much as I can: https://paste.rs/rSPCt.py3.
The lambda function which causes the overflow is on line 238.
Have you had this problem in practice?
for research implementations, yes
- you seem to have pasted twice in your example
- don't use bare exceptions (your `raise Exception` and `try: ... except: ...` are often frowned upon, and in fact they made my life much harder since i didn't realise you used them)
- assuming you fixed the `Exception` usage, you can use `np.seterr(all='raise')` to make it raise an exception upon hitting that overflow
- assuming you made it raise an error instead, you can use `pdb` to inspect exactly what is overflowing.
i don't know what you are trying to do so i can't comment more.
all i can tell you is here are the combination that caused your overflow https://paste.pythondiscord.com/U6WQ
let me know if you have a specific question.
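Here is a sketch of the `np.seterr` suggestion in scoped form. `np.errstate(over="raise")` turns the overflow warning into a `FloatingPointError` only inside the `with` block. That lets you catch it right where the non-linear term is computed and report (or `pdb`-inspect) the offending inputs.

The function mirrors the lambda from the question, with `self.ε` passed in as a plain argument; everything else is an assumption. It also shows that float64 genuinely can overflow: squaring anything above roughly 1.3e154 exceeds the largest double (~1.8e308).

```python
import numpy as np


def nonlinear_term(u, v, eps):
    """Same expression as the lambda in the question, with self.ε passed as eps."""
    return (u + v) * (1 - (np.linalg.norm(u, axis=0) ** 2 + np.linalg.norm(v, axis=0) ** 2) / 2) / (2 * eps**2)


def checked_nonlinear_term(u, v, eps):
    """Raise FloatingPointError on overflow and report the size of the inputs."""
    with np.errstate(over="raise"):
        try:
            return nonlinear_term(u, v, eps)
        except FloatingPointError:
            # In an interactive session, `import pdb; pdb.post_mortem()` here lets you inspect u and v.
            print(f"overflow: max |u| = {np.abs(u).max():.3e}, max |v| = {np.abs(v).max():.3e}")
            raise
```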
Hi, I'm getting an error about Series.append, I've read a StackOverflow post
https://stackoverflow.com/questions/76102473/how-to-fix-attributeerror-series-object-has-no-attribute-append
that said it was deprecated, so I used concat. Concat throws the same error AttributeError: 'Series' object has no attribute 'concat'
I asked here, but the help section also suggested I ask in chat:
https://discord.com/channels/267624335836053506/1130588399758233690
```py
series = pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5})
#Adding
series['f'] = 6
new_series = series.concat(pd.Series({'g':7,'h':8,'i':9})) #AttributeError: 'Series' object has no attribute 'concat'
print(series)
print(new_series)
```
you probably meant pd.concat instead of series.concat
!e
```py
import pandas as pd
series = pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5})
#Adding
series['f'] = 6
new_series = pd.concat([series, pd.Series({'g':7,'h':8,'i':9})])
print(series)
print(new_series)
```
@boreal gale :white_check_mark: Your 3.11 eval job has completed with return code 0.
```
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
a    1
b    2
c    3
d    4
... (truncated - too many lines)
```
Full output: https://paste.pythondiscord.com/47PRVYNTCI4K77HUCULV6XPUTU
awesome
Is the correct way to do exceptions to define something like a ConvergenceFailedException and then raise that?
I understand now. Thank you so much
I was using concat wrong, like you suggested
that would be one of the ways yes, and in fact that is my preferred way.
some people argue there is no need to define new exception classes when the built-in ValueError and similar built-ins work, but i find that not very clear.
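Here is a minimal sketch of that pattern. Subclass a fitting built-in (here `RuntimeError`) so callers can still catch the broad category, and attach the details as attributes. The fixed-point loop is a generic stand-in, not the simulation's actual `fixed_point` method:

```python
class ConvergenceFailedException(RuntimeError):
    """Raised when an iterative method does not converge within its iteration budget."""

    def __init__(self, iterations, residual):
        super().__init__(f"no convergence after {iterations} iterations (residual={residual:.3e})")
        self.iterations = iterations
        self.residual = residual


def fixed_point(f, x0, tol=1e-10, max_iter=100):
    """Iterate x <- f(x) until successive values differ by less than tol."""
    x, residual = x0, float("inf")
    for _ in range(max_iter):
        x_new = f(x)
        residual = abs(x_new - x)
        if residual < tol:
            return x_new
        x = x_new
    raise ConvergenceFailedException(max_iter, residual)
```

Callers can then write `except ConvergenceFailedException as exc:` and read `exc.residual`, instead of parsing a message string.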
okay, so fixing all of that gets me this:
```
Traceback (most recent call last):
  File "simulation.py", line 338, in <module>
    simulation.run()
  File "simulation.py", line 52, in run
    new_m = self.fixed_point_method.fixed_point(self.m, self.Δt)
  File "simulation.py", line 301, in fixed_point
    np.fft.fft2(Δt * N(previous_candidate, m) + Δt * self.Z),
  File "simulation.py", line 271, in <lambda>
    lambda u, v: (u + v)
FloatingPointError: overflow encountered in multiply
```
are you then recommending that I use pdb?
yes, so that you can inspect exactly what is the u and v passed into the lambda
Is there some way to make the exception a breakpoint? I've never used PDB before
i just usually use the pdb magic in an ipython session
Ah, it looks like I want a post-mortem.
Requirement already satisfied: kaggle>=1.3.9 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (1.5.15)
Requirement already satisfied: oauth2client in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.1.3)
Requirement already satisfied: opencv-python-headless in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.8.0.74)
Requirement already satisfied: psutil>=5.4.3 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (5.9.5)
Requirement already satisfied: py-cpuinfo>=3.3.0 in /usr/local/lib/python3.10/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (9.0.0)
Collecting pyyaml<6.0,>=5.1 (from tf-models-official>=2.5.1->object-detection==0.1)
Using cached PyYAML-5.4.1.tar.gz (175 kB)
Installing build dependencies ... done
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
or that!
Ah... u is somehow exceptionally large. On the order of 10^127
that seems problematic
well its beautiful
can i volunteer for what you're doing?
Wdym by that
This is a very rough sketch lol
i find it attractive lol
🎷🦐 I actually do better graphic design stuff lmao
i mean, can i join you in what you're working on?
i'm no expert but i want to improve my skills so maybe working with someone might help
nice
Well this is part of my internship project and I go back to Europe by the end of next week lol so it's almost over haha
that's nice
what exactly did you do for your internship?
data science, computer vision?
Worked on quantum dot calibration using machine learning algorithms. My job was to improve the detection of charge stability zones by reworking how the angles of the lines in the diagrams were calculated
Very mysterious and weird title I know haha
boy that's dope... no idea what quantum dots are but that's dope
i like it lol... it sounds interesting
i will do my research on quantum dots
this is really nice
hey, can i send you a friend request?
Sure
thanks bro, please accept
how much should someone be paying for a trained model?
it really depends on what the model is trained for, the quality of data of the model, how large is the model, and a bunch of other things
how much should I ask for an object detector? it's the 320x320 mobilenet ssd version
bro it REALLY depends on your market, what price u want, what a fair price is, the current market, how much u paid (if u paid) to train the model on a server, and if it's even worth paying for your model when i can get one for free from huggingface
how do I get a model for free on HuggingFace?
go to huggingface. click models. u can download the model, clone it, deploy it, or use in diffusers
how much should I be paying someone for an object detection model for a drone target? i only have 150 images, i just can't do it myself
it's not custom for my targets though
idk then
anyone here have experience with numerical analysis? i'm having an issue with some code i'm writing for base 10 rounding, and i don't really know if the problem is with the code or my understanding of the techniques behind it - if anyone has any recommended resources that'd be great
can you share the code? this is the kind of question where it might be hard to know if it's within one's ability to answer without more details
i don't think we have too many users specifically with numerical analysis experience
anyone able to help with scikit-learn KDE?
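For anyone landing here with the same question, here is a minimal `sklearn.neighbors.KernelDensity` example on synthetic 1-D data. Two things commonly trip people up: the estimator expects 2-D input `(n_samples, n_features)`, and `score_samples` returns the *log* density. The bandwidth of 0.3 is an arbitrary choice:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=(500, 1))  # synthetic data; note the 2-D shape

kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(samples)
grid = np.linspace(-5, 5, 1001)[:, None]                 # evaluation points, also 2-D
density = np.exp(kde.score_samples(grid))                # score_samples returns log-density
```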
Hi guys, I am practicing visualization with plotly and ran into a problem. If you zoom in, you will find that when I hover on the graph it shows the name of the country and **(trace 3)**. I want to remove this trace label but cannot find the way. If anyone knows, please help. Here is the code that creates this plot:
```py
rows = 8
fig = make_subplots(
    rows=rows, cols=1,
    subplot_titles=("AFC Asian Cup", "African Cup of Nations", "Confederations Cup", "Copa América",'FIFA World Cup',"King's Cup",'UEFA Nations League','UEFA Euro')
)
cups = ["AFC Asian Cup", "African Cup of Nations", "Confederations Cup", "Copa América",'FIFA World Cup',"King's Cup",'UEFA Nations League','UEFA Euro']
# plot the top 10 teams that won the most matches in a tournament
for idx,cup in enumerate(cups):
    fig.add_trace(go.Bar(
        x=winning_home_teams.loc[cup].sort_values('total_matches_won',ascending=False)[:10].index, y=winning_home_teams.loc[cup].sort_values('total_matches_won',ascending=False)[:10].total_matches_won),
        row=idx+1, col=1)
fig.update_layout(height=1100, width=1300,
                  title_text="Top 10 Home Teams winning most matches in particular tournament",showlegend=False)
fig.show()
```
Hi everyone!
"I have watched a few videos on YouTube, but all of them use Jupyter Notebook as the IDE. Is it necessary to learn NumPy in Jupyter? Do you know of any NumPy tutorial that uses VS Code or PyCharm?"
It's not necessary, but an ipython environment is very convenient for the kind of quick experimentation you'd want to do when learning something like this.
And you can use IPython notebooks in VSCode as well, that's what I usually do!
Thanks DUDE!
Anyone do any work with scikit-learn KDE?
Hello everyone
Has anyone encountered a problem in pandas where the csv data is well structured and pandas can print the data fully, but when sorting values pandas doesn't work and throws errors:
in sort_values
k = self._get_label_or_level_values(by, axis=axis)
_get_label_or_level_values
raise KeyError(key)
KeyError: 'general_average'
The 'general_average' column contains floats.
check whether you can do df['general_average'] - my first thought is that you're mistaken about what the column name is, like maybe it has an extra space somewhere.
if you can, then this is strange indeed.
I copy pasted the column name from myDataFrame.columns
Can i send you the data and test ?
here's the code overview along with errors after running
in case you haven't already figured it out, please note the whitespace.
omg I haven't noticed this whitespace at all, now it works.
thanks a lot @tidal bough and @boreal gale
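To avoid this class of bug up front, you can normalize the column labels right after loading. This sketch assumes stray leading/trailing whitespace is the only problem, and the in-memory CSV stands in for your file. Passing `skipinitialspace=True` to `read_csv` handles the leading-space case at read time:

```python
import io

import pandas as pd

csv = io.StringIO("name, general_average\nAnn,15.5\nBob,12.0\n")  # note the space after the comma
df = pd.read_csv(csv)
print(df.columns.tolist())           # the second label still carries the leading space
df.columns = df.columns.str.strip()  # remove leading/trailing whitespace from every label
ranked = df.sort_values("general_average", ascending=False)
```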
How does the .apply() function work in the context of df.apply(lambda x: x**2)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
The lambda function lambda x: x**2 is applied to the first column, which is represented by the Series x.
The elements in the first column, let's say column "A", are squared. For example, if the column "A" has values [1, 2, 3], squaring each element results in [1, 4, 9].
The resulting Series represents the squared values of the first column.
The lambda function lambda x: x**2 is applied to the second column, which is represented by the Series x.
The elements in the second column, let's say column "B", are squared. For example, if the column "B" has values [4, 5, 6], squaring each element results in [16, 25, 36].
The resulting Series represents the squared values of the second column.
The resulting Series from each column operation is concatenated back together to form a new DataFrame.
The resulting DataFrame has the same shape as the original DataFrame, where each element has been squared.
Is this intuition about how the apply function works correct?
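One quick way to check which way `apply` iterates is to return each Series' `.name`. With the default `axis=0` the function receives one column at a time; with `axis=1` it receives one row at a time. This uses the same `df` as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

per_column = df.apply(lambda s: s.name)       # default axis=0: s is a column
per_row = df.apply(lambda s: s.name, axis=1)  # axis=1: s is a row
squared = df.apply(lambda x: x**2)            # column-wise, results stitched back into a DataFrame
```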
can someone help me ?
How do I only keep parts of a dataframe in Dash? I have a dataframe in which one column is either True or False. In the Dash app, I want the user to be able to see only the rows with True, only the rows with False, or both. I tried radio items and a dropdown in Dash, but I'm lost when it comes to the callback.
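Here is a sketch of one way to wire this up. It assumes Dash 2.x and a column that holds real booleans (not the strings "True"/"False"); the column name and component ids are placeholders. The filtering lives in a plain function so it can be tested without running the app, and the callback just maps the radio value to a filtered copy:

```python
import pandas as pd


def filter_by_flag(df, column, choice):
    """choice is "true", "false" or "both"."""
    mask = df[column].astype(bool)
    if choice == "true":
        return df[mask]
    if choice == "false":
        return df[~mask]
    return df


def build_app(df, column):
    from dash import Dash, Input, Output, dash_table, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.RadioItems(
            id="flag-filter",
            options=[{"label": "True", "value": "true"},
                     {"label": "False", "value": "false"},
                     {"label": "Both", "value": "both"}],
            value="both",
        ),
        dash_table.DataTable(
            id="table",
            columns=[{"name": c, "id": c} for c in df.columns],
            data=df.to_dict("records"),
        ),
    ])

    @app.callback(Output("table", "data"), Input("flag-filter", "value"))
    def update_table(choice):
        return filter_by_flag(df, column, choice).to_dict("records")

    return app
```

Start it with `build_app(your_df, "your_flag_column").run(debug=True)`; older Dash versions use `run_server` instead of `run`.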
I have some bicycling GPS and power meter data, and I am researching the relationship between speed (m/s) and power (watts). There is a data point every 1 sec. I am smoothing the data with a rolling(30)
If I plot x=speed, y=power, the slope is mostly negative, increasing power==decreasing speed, unexpected!
- I would like to view only the data where the slope is positive, but not filter out the negative-slope data. More like, make the negative-slope points invisible. How can I do this?
- When I try to plot the positive only slope, I still get mostly negative. I think my problem is mostly in the way I am plotting.
Here is what I have.
df[['distance', 'speed', 'power']].head(10)
2022-02-20 17:42:22+00:00,0.00,NaN,NaN
2022-02-20 17:42:24+00:00,3.89,5.440,155.0
2022-02-20 17:42:25+00:00,6.88,5.440,156.0
2022-02-20 17:42:26+00:00,10.39,5.813,397.0
2022-02-20 17:42:27+00:00,18.97,5.813,271.0
2022-02-20 17:42:28+00:00,28.99,5.934,271.0
df.rolling(30).mean().plot(x="speed", y="power")
plot 1
df['slope'] = (df['power'].rolling(30).mean() / df['speed'].rolling(30).mean()).diff()
df[df['slope'] > 0].rolling(30).mean().plot(x="speed", y="power")
plot 2
I don't understand the goal. Removing the negative slope points means filtering, but you don't want to filter? And the images you sent didn't embed.
The default behavior of apply (axis=0) is to go column by column; axis=1 goes by the rows. That means that in the function lambda x: x**2, x was a different column (a Series) each time.
Keep in mind this only worked because all of the elements in the dataframe were numbers. If one of the elements was, say, a string, you would get an error, because you can't square a string.
Thanks for the reply.
How do I do a line plot with all the data, but where the points are invisible when the slope is less than 0?
plot-1 all points, plot-2 only positive slope
I know these plots look like a mess. What's unexpected is that there is a lot of negative slope, which implies an inverse relationship between power and speed. The data is sorted by timestamp. I should maybe use a plot that colors the points by time, so that the direction of time along the line is clearer.
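Here is a sketch of the "invisible points" idea, assuming the columns are named `speed` and `power` as in the snippet. Smooth first, then compute the slope as the change in power over the change in speed between consecutive smoothed samples. `Series.where` then turns points with slope <= 0 into NaN.

matplotlib doesn't draw NaN points, and the index and time order are preserved. By contrast, filtering before `rolling(30)` (as in the second plot) averages together samples that weren't adjacent in time. Coloring by sample index makes the time direction visible:

```python
import numpy as np
import pandas as pd


def smoothed_slope(df, window=30):
    """Rolling means of speed/power and d(power)/d(speed) between consecutive samples."""
    smooth = df[["speed", "power"]].rolling(window).mean()
    # A flat speed step gives +/-inf or NaN here; NaN compares as False below.
    slope = smooth["power"].diff() / smooth["speed"].diff()
    return smooth, slope


def plot_positive_slope(df, window=30, ax=None):
    import matplotlib.pyplot as plt

    smooth, slope = smoothed_slope(df, window)
    visible_power = smooth["power"].where(slope > 0)  # NaN (not drawn) where slope <= 0
    if ax is None:
        _, ax = plt.subplots()
    points = ax.scatter(smooth["speed"], visible_power, c=np.arange(len(smooth)), cmap="viridis", s=4)
    ax.figure.colorbar(points, ax=ax, label="sample index (time order)")
    ax.set_xlabel("speed (m/s)")
    ax.set_ylabel("power (W)")
    return ax
```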
how did you get the power in watts?
could you provide a sample of your data?
is
df['slope'] = (df['power'].rolling(30).mean() / df['speed'].rolling(30).mean()).diff()
df[df['slope'] > 0].rolling(30).mean().plot(x="speed", y="power")
all you have done thus far?
meta just open sourced llamav2
👀
https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
https://ai.meta.com/llama/
Hi all, i was trying to run the falcon-7b-instruct LLM on my local machine (RTX 3060 12GB VRAM, 16GB RAM), but the model itself is around ~13GB, so i'm getting the famous `torch.cuda.OutOfMemoryError` for a small amount of space. Is there any way to run this model locally with my current config? I've heard of the accelerate & bitsandbytes libraries; can they help me achieve this?
If this is not possible, can you at least point me to some other open-source LLMs, with like 3-5B params?
Thanks.
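Here is a back-of-the-envelope check of why it doesn't fit. It assumes roughly 7e9 parameters and counts the weights only; activations, the KV cache and the CUDA context come on top. In fp16 the weights alone are about 13 GiB, more than the 3060's 12 GB. 8-bit or 4-bit loading (which bitsandbytes provides through transformers) cuts that to roughly a half or a quarter:

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(7e9, bits):5.1f} GiB")
```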
Hello. A few days ago I asked about using an ML algo for predicting text. The code executed with 90% accuracy. To set the context, a sample query looks like this:
$startdate='20230301';$starttime='06:40:13';$verb='retrieve';$version='20230105';$application='mars';$class='od';$type='an';$stream='oper';$expver='0001';$retdate='20230228';$age='1';$nbdates='1';$reqno='6';$fields='4';$database='fdb';$bytes='41218640';$written='6014840';$interpolated='4';$writetarget='0';$cpu='0';$elapsed='0';$status='ok';$stopdate='20230301';$stoptime='06:40:14';$user='e487dfc54c';$category='basic|boundary_conditions|esa|valid_forecast';$account='b892ca6621';$abc='b892ca6621';$environment='batch';$date='20230228';$time='0000|0600|1200|1800';$step='00';$domain='g';$fieldset='sst';$resol='auto';$grid='0.25|0.25';
However the predict text looked like this:
43 228011 228012 228013 228014 243 244 245 229 230 231 232 213 212 8 9 228089 228090 228001 260121 260123 003020 228029 228251 228216 228217 228218 228219 228220 228221 260015 228050 151132' date '20201217' time '0000' step '126 127 128 129 130 131 132' anoffset '9' domain 'g' password '0860048d32' rdatabase 'fdb' startdate '20230301' starttime '00 00 20' verb 'retrieve' version '20230206' application 'mars' class 'rd' type 'an' stream 'oper' expver 'hy2w' retdate '20221222' age '69' nbdates '1' reqno '1' interpolated '0' cpu '0' elapsed '0' status 'fail' reason 'expected 288 got 0 request failed' stopdate '20230301' stoptime '00
It produces somewhat relevant output, but most of the content is gibberish. With 90% accuracy, why is the ML algo producing this output?
I'd appreciate your help in dissecting this.
I used an RNN (the LSTM variant).
https://huggingface.co/TheBloke/LLaMA-7b-GPTQ have you looked into quantized models?
GPTQ is a recently released algorithm for quantizing general large language models (storing weights at a much lower precision, like 8, 4, 3, or 2 bits, to conserve memory with a minimal loss in performance), so you could just google any model on this list with "GPTQ" and you're pretty likely to find something: https://github.com/eugeneyan/open-llms
I even saw one for falcon-7b-instruct that was quantized to 4 bits, but the Hugging Face repo said it was experimental and very slow
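To make "lower precision to conserve memory" concrete, here is a minimal NumPy sketch of plain round-to-nearest symmetric int8 quantization of one weight matrix. This is NOT GPTQ, which chooses the quantized values to minimize each layer's output error. It only shows the basic storage trade-off. The function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric, per-tensor, round-to-nearest int8 quantization (not GPTQ)."""
    scale = np.abs(w).max() / 127.0                      # largest |weight| maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale) -> np.ndarray:
    """Approximate reconstruction of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)      # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, q.nbytes)   # 262144 65536: int8 storage is 4x smaller than fp32
# Round-to-nearest never moves a weight by more than half a quantization step.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-5)
```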
can you share your code for this?
there are a lot of moving parts, and extra context is needed to come up with an educated answer (e.g. are you doing character prediction or token prediction? have you removed any characters/tokens like $ and = from the model's vocabulary/dataset labels? what's the architecture? etc.)
hey, this is really petty, but i'm curious: how do i make pyplot return images with a perfectly square resolution? i can set the aspect ratio to equal, but there's a slight difference between the left and right sides if i add axis labels, and it's driving me crazy
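A minimal matplotlib sketch, assuming "square resolution" means the saved image's pixel dimensions. A square `figsize` at a fixed `dpi` gives a square canvas. `ax.set_box_aspect(1)` (matplotlib 3.3+) keeps the axes box itself square. Saving with `bbox_inches="tight"` crops around the labels, which is a common reason the output stops being square, so that option is left out here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 6 x 6 inches at 100 dpi -> a 600 x 600 pixel canvas.
fig, ax = plt.subplots(figsize=(6, 6), dpi=100)
x = np.linspace(0, 1, 50)
ax.plot(x, x**2)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_box_aspect(1)  # the axes box itself stays square, whatever the labels

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)  # no bbox_inches="tight": keep the full canvas
buf.seek(0)
img = plt.imread(buf)
print(img.shape)  # (600, 600, 4): RGBA, square
```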
I'm entry level in my data science position, so forgive my basic programming skills. I am in the process of cleaning data in a CSV file. One of the date columns is in DayHourMinTimeZoneMonthYear format, and I need to put the time first, followed by DayMonthYear. I’ve tried writing plenty of scripts myself for something so basic, and I cannot for the life of me get this working. Does anyone out there have a sample script I could use to accomplish this task? I don’t have a mentor to lean on here and I cannot figure this out. Thank you to anyone who can help or mentor.
are you using pandas? make sure that your columns are using the actual built-in datetime type (and not something else like strings) so you can use all the built-in datetime operations
see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
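Building on the suggestion above to convert the column to a real datetime type, here is a minimal sketch. The sample strings are hypothetical: it assumes values look like `011200ZJUL23` (day 01, 12:00, zone "Z" meaning UTC, July 2023). Adjust the `format=` string to match the real data.

```python
import pandas as pd

# Hypothetical DayHourMinTimeZoneMonthYear strings, e.g. "011200ZJUL23".
raw = pd.Series(["011200ZJUL23", "150930ZAUG23"])

# Parse into a real datetime column; "Z" is matched as a literal character
# and we assume every value is in UTC.
parsed = pd.to_datetime(raw, format="%d%H%MZ%b%y").dt.tz_localize("UTC")

# With a real datetime, reordering is just formatting: time first, then day/month/year.
reordered = parsed.dt.strftime("%H:%M %d/%m/%Y")
print(reordered.tolist())  # ['12:00 01/07/2023', '09:30 15/08/2023']
```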
use an ai data analyst
gpt-3.5 is actually pretty good at basic pandas idioms
code interpreter
it's not a bad learning tool
Using pandas, and you know what, I didn’t check the format of my columns. I bet you that's the problem
I can't use GPT at work, unfortunately. They blocked it
pandas work is much, much easier when you exert precise control over the columns that you read. see the dtype=, parse_dates=, and infer_datetime_format= parameters in https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html as well as other related options (infer_datetime_format= is deprecated as of pandas 2.0, where a consistent format is inferred by default)
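A small illustration of the `dtype=` / `parse_dates=` advice above, using a hypothetical in-memory CSV. With no hints, the timestamp column comes back as plain strings. If you declare the types up front, you get a real datetime column.

```python
import io

import pandas as pd

csv_text = """timestamp,site,reading
2023-07-01 12:00,A,3
2023-07-02 06:30,B,5
"""

# No hints: the timestamp column is just text.
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.dtypes)

# Explicit control: dtype= for ordinary columns, parse_dates= for date columns.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"site": "string", "reading": "int64"},
    parse_dates=["timestamp"],
)
print(df.dtypes)                         # timestamp is now a datetime64 column
print(df["timestamp"].dt.hour.tolist())  # [12, 6]: datetime accessors now work
```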
plus privacy issues
I bet my column is a string and that’s why it won’t recognize the date time format
it's very likely
Crap, I feel like an idiot
@sullen sage it's a good habit to learn how to create representative minimal examples anyway. so you can pass those to chatgpt instead of your actual work data
welcome to being a beginner. this is how you learn
Thanks, y’all. I just left work; I’ll play with it tomorrow. This Discord has helped so much with the minor details. I definitely appreciate all the advice.
Hello everyone, my model seems to have difficulty recognizing seizure frequency patterns. I trained it on 10-second EEG recordings (2500 data points of raw EEG amplitude over 20 seconds)
Is there anything I can do to improve the results?
It seems to be able to recognize amplitude, but in terms of frequency over time (seizures have very high and rapid frequency with consistent amplitude over time), it seems to have trouble
Should I consider training it on power spectrum graphs rather than raw EEG recordings?
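For the power-spectrum idea above, here is a minimal NumPy sketch that turns one raw EEG window into a one-sided power spectrum (a simple FFT periodogram). The message gives both 10 s and 20 s for the 2500 samples. The sketch assumes 10 s, i.e. a 250 Hz sampling rate, and uses a synthetic 10 Hz signal in place of real EEG.

```python
import numpy as np

FS = 250.0   # assumed sampling rate: 2500 samples over 10 s
N = 2500

def power_spectrum(x: np.ndarray, fs: float):
    """One-sided power spectrum of a real signal (simple periodogram via the FFT)."""
    x = x - x.mean()                              # remove the DC offset
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # frequency of each bin, in Hz
    power = np.abs(spectrum) ** 2 / len(x)
    return freqs, power

# Synthetic stand-in for one EEG window: a 10 Hz rhythm plus noise.
t = np.arange(N) / FS
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10.0 * t) + 0.3 * rng.normal(size=N)

freqs, power = power_spectrum(x, FS)
print(freqs[np.argmax(power)])  # ~10.0 Hz: the dominant rhythm shows up as one peak
```

The trade-off: a single spectrum per window shows *which* frequencies are present but discards *when* they occur within the window.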
I hate strings that are formatted as timestamps (as opposed to actual timestamp types)
all my hombies hate strings that are formatted as timestamps (as opposed to actual timestamp types)
unix timestamp or cry
Hey, just a simple question: I'm wondering whether a neural network will learn the order from best to worst (will it extrapolate well?)
If I have the following player vs player data points:
player1 vs player2 -> player1 wins
player2 vs player3 -> player2 wins
So by logic player1 is better than player3, but will a simple neural network learn that correctly?
player1 vs player3 -> player1 wins
I had 3 data sources at work that used different methods of encoding time => road to crying
i hope that hombies aren't homie zombies (or is it zombie homies🤔)
you don't need a neural network for this. you can use something like topological sort.
I understand, but this is just a simplified model to explain the problem. I will have like 50 vs 50 players in the actual scenario.
no matter how many pairs of "A beats B" pairs you have, the problem will never be so complicated that a neural network would be a good solution.
if the problem is actually "(A, B) beats (B, C)", "(C, B, D) beats (A)", that is a different problem.
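For the simple "A beats B" case, here is a sketch of the topological-sort suggestion above, using the standard library's `graphlib` (Python 3.9+) on the example pairs. If the results contain a cycle (A beats B, B beats C, C beats A), `static_order()` raises `graphlib.CycleError`. In that case there is no consistent ranking at all.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each pair is (winner, loser), from the example above.
results = [("player1", "player2"), ("player2", "player3")]

# TopologicalSorter takes node -> set of predecessors ("who must rank above me").
graph: dict[str, set[str]] = {}
for winner, loser in results:
    graph.setdefault(loser, set()).add(winner)
    graph.setdefault(winner, set())

ranking = list(TopologicalSorter(graph).static_order())
print(ranking)  # ['player1', 'player2', 'player3']
```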
Anyone here have any experience doing first-order logic with Python? Also: can I use Python for all the same tasks one uses Prolog for?
do you mean stuff like this?
what exactly do you want to do? generate truth tables?
Im taking a class called Knowledge Representation and Reasoning
It's like: (A,B,D,G,S,T,R,E,G) beats (D,S,F,E,R,G,F,D,P); does (S,D,V,C,G,E,D,G,W) beat (W,E,G,E,T,D,F,T,Y)?
But with more letters, and I have a lot of data to train the model
curious if anyone knows of a good data engineering server?
I see. If you simplify that to "A beats B" (where there's always one on each side), it no longer encapsulates the actual problem.
Stuff like this:
This tutorial is an introduction to first-order logic in Python. I am going to show you how to create a knowledge base, how to make inferences with forward ...
Which is done in python
Just wondering if using Python for this is better/as good as using Prolog
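To show what "forward chaining" from the tutorial above means, here is a toy sketch in plain Python. It hand-codes two Prolog-style rules and applies them to ground facts until nothing new can be derived (a fixpoint). It is not a general first-order logic engine: there is no unification and no rule parsing. That missing machinery is what Prolog, or a dedicated library, provides. The facts are hypothetical.

```python
# Ground facts: parent(X, Y) means X is a parent of Y (hypothetical data).
parent = {("tom", "bob"), ("bob", "ann"), ("ann", "joe")}

def forward_chain_ancestors(parent_facts):
    """Naive forward chaining to a fixpoint for the rules:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent_facts)            # first rule
    while True:
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2                      # join on the shared variable Y
        }
        new = derived - ancestor
        if not new:                         # nothing new: fixpoint reached
            return ancestor
        ancestor |= new

ancestors = forward_chain_ancestors(parent)
print(("tom", "joe") in ancestors)  # True
```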
I've never heard of prolog.
Prolog is a logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily as a declarative programming language: the program logic is expressed in terms of relations, represented ...
I won't be able to tell you if Python is better suited for this than Prolog if I haven't used Prolog.
is your instructor allowing you to pick?
"Prolog is a logic programming language associated with ... computational linguistics"
I am a computational linguist, and I've never heard of it, so learning Python would be a better investment of your time.
Idk, the course hasn’t started yet (it starts mid-August). Just trying to get ahead. I've read 3 chapters; the book mentions Prolog, which makes me guess we’re gonna use it. But I love Python and would rather learn more Python than a new, niche language
you could use a neural network for this, yes. is it always 9 v 9?
yes
if you're wanting to spend time learning outside of class, your time would be best spent learning python. and once the course starts, you can just learn the bare minimum amount of prolog to get by.
does the order of letters on either side matter in any way?
yeah it does
why is that? do different positions represent a role that the player has on the team?
yeah they are different roles, so like in the first place you have usually either A, D or S and rarely other letters
But i got it now, you helped me already, thank you
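Since the order of letters matters (each slot is a role), here is a minimal sketch of one way to turn a 9 v 9 match into a fixed-size input for a classifier. It one-hot encodes every (slot, letter) pair for both teams, so position information is preserved. It assumes teams are written as 9-letter strings (the example above with the commas removed) and that the label is 1 when the first team wins. The function names are hypothetical.

```python
import numpy as np

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SLOTS = 9  # one player per role position

def encode_team(team: str) -> np.ndarray:
    """One-hot encode each (position, letter) pair so role information is kept."""
    assert len(team) == SLOTS
    x = np.zeros((SLOTS, len(LETTERS)), dtype=np.float32)
    for pos, letter in enumerate(team):
        x[pos, LETTERS.index(letter)] = 1.0
    return x.ravel()

def encode_match(team_a: str, team_b: str) -> np.ndarray:
    """Input for 'does team_a beat team_b?': both teams side by side."""
    return np.concatenate([encode_team(team_a), encode_team(team_b)])

features = encode_match("ABDGSTREG", "DSFERGFDP")
print(features.shape)  # (468,) = 2 teams * 9 slots * 26 letters
```

Adding each match a second time with the teams swapped and the label flipped is a cheap way to keep the model from favouring whichever side happens to be listed first.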
prolog is definitely better at logic programming than python
however there might be a python library that implements a prolog-like logic programming system
i found this, which is old but maybe still works https://pypi.org/project/FLiP/
you're almost certainly better off using prolog for logic programming compared to python
if you really want to use python, i'm not sure that such an engine exists as a python library specifically. however there might be C or C++ libraries that implement first-order logic which you can call from python somewhat easily. or there might be an embeddable prolog implementation that you can use inside python.
actually it looks like SWI-Prolog is embeddable as a C library https://www.swi-prolog.org/pldoc/man?section=embedded
that should be usable from ctypes, cffi, or cython
Never heard of any of these things…
i'm sure that a set of high-level python bindings to swipl would be an interesting open-source contribution. but that would also be a lot more work than you would want to take on for a school assignment, unless it's a master's thesis
- ctypes is a built-in way for python to access functions in C libraries as well as work with C data types
- cffi is a 3rd-party library that does the same, but in a nicer interface
- cython is a superset of python that compiles to C code, and the resulting compiled library can be imported as a python module
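A tiny taste of the ctypes option from the list above. It calls a C function from Python by declaring the argument and return types. As a stand-in for SWI-Prolog's embedding API, which is not shown here, it calls the C standard library's `strlen`. `ctypes.CDLL(None)` opens the running process's own symbols, so this works on Linux/macOS but not on Windows.

```python
import ctypes

# POSIX-only: CDLL(None) gives access to symbols already loaded into the process,
# which include the C standard library.
libc = ctypes.CDLL(None)

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # const char *
strlen.restype = ctypes.c_size_t      # size_t

print(strlen(b"prolog"))  # 6
```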
SWIG would be another way to call C or C++ libraries from python. it generates C and python code that allows you to call the library
cffi has a good "goals" doc that describes its own goals + mentions some of these alternatives https://cffi.readthedocs.io/en/latest/goals.html#goals
but anyway this is quite a big yak to shave if you just want to do your school assignments in python instead of swi or scryer or whatever
Okok, gotcha (i think…)
So, one conclusion to draw from this:
Prolog is not old and useless; it’s actually a good language to learn?
it's not something you're likely to use at a typical programming job. but if you're interested in the craft of computer programming and/or logical reasoning, it will expand your horizons and provide a lot of interesting opportunities for hobby work, and maybe a couple of niche job opportunities.
if you want to learn it, check out https://www.metalevel.at/ and specifically their book https://www.metalevel.at/prolog + its associated youtube channel https://www.youtube.com/@ThePowerOfProlog
as for ai and nlp specifically, prolog has indeed turned out to be kind of a dead end for the time being
one of the big complaints about prolog in practice is that you end up having to worry about the implementation details of how the solution search algorithm works
some people don't mind that. but it's not a magic tool.
if you're interested even more in logic programming, check out minikanren, which is a different approach to logic programming. it's mostly a research toy though, it's not something people actually use that i know of.
what might be interesting is if you can get an llm to write and interact with prolog code, the way you can get them to use web searches and things like that
so the model itself doesn't need to be great at logical reasoning, it can write and execute its own prolog scripts
Very interesting!
Thanks again for loads of interesting info!!
I just used the bitsandbytes lib to convert it to 4-bit precision, and it takes forever; I waited for 1.5 hours and still got no output
Now I might go with 8-bit quantization. Do you think it will work on my machine? (16GB RAM, 12GB VRAM RTX 3060, falcon-7b-instruct ~13GB)
I assume the regular falcon-7b-instruct model is 16-bit fp, so yeah, 8-bit quantization should cut that in half.
Agree. I know (some) Prolog. Markus Triska (The Power of Prolog) is the most up-to-date resource for learning modern Prolog
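The "cut that in half" estimate above, worked out as arithmetic. This assumes roughly 7 billion parameters and counts only the weights. Activations, the KV cache, CUDA context, and quantization overhead all come on top, so treat these numbers as lower bounds.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB (ignores activations, KV cache, overhead)."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 7e9  # assumption: treat falcon-7b-instruct as ~7 billion parameters
for bits in (16, 8, 4):
    print(bits, round(weight_memory_gib(N_PARAMS, bits), 2))
# 16 13.04   <- matches the "~13GB" in the original question
# 8 6.52
# 4 3.26
```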
hey zestar, I have a df w/ just 2 columns of dates & int values and I'm trying to use Panel's Trend component but can't get the plot to display. I've created another df w/ similar data and it shows fine. Any ideas?
I'm trying to find the average position of all positive points in a binary image using cv2, what is the most efficient way to do this?
numpy should work just fine
!e
```py
import numpy as np
img = np.random.randint(2, size=(8, 8))
print(img)
pts = np.argwhere(img)
print(pts)
avg = pts.mean(axis=0)
print(avg)
```
@hasty grail :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | [[0 1 1 1 1 1 1 0]
002 | [0 0 1 1 1 1 1 1]
003 | [1 1 1 0 1 0 0 1]
004 | [1 1 0 1 0 0 0 1]
005 | [0 0 0 0 0 0 0 0]
006 | [0 1 1 1 1 1 0 0]
007 | [0 0 1 0 0 0 0 1]
008 | [0 1 0 0 1 1 1 0]]
009 | [[0 1]
010 | [0 2]
011 | [0 3]
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/JO2NVMSE2AW4VUKKYZQKKCRTSE
i was stalked, robbed, "assaulted" and harassed by several people... i tried telling the police but it seems their attention is inclined to arresting people who lack mommy and daddy's political connections. i was thinking of looking into worm gpt/poison gpt for ethical purposes.
where may i download worm gpt repository?
without paying a monthly subscription?
<@&831776746206265384>
rules 5 and 8
using the word "ethical" isn't a magic spell to make something ethical. It sounds like you're looking for something that will help you commit crimes.
what do you suggest doing then?
police / government organisations / law enforcement etc
possibly somewhat out of scope, are there any efforts to selectively navigate/search/scrape similar contents on different websites?
like, for example, i want to be able to search for lamps and then scrape the title, description, and price for each result
tailoring such a scraper for one specific website is easy enough but i’d like something i can just feed a list of websites into and avoid manually finding the correct selectors for each site
Hello! I am doing my first challenge for class and can't seem to get my CSV file to open in VS Code so that I can write a script showing my different values. Can somebody assist?
someone can probably assist
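A minimal sketch of reading a CSV from a script. It cannot diagnose the actual problem from the question. One frequent cause is that a relative path like `"data.csv"` resolves against the terminal's current working directory rather than the script's folder, so `load_rows` builds the path from the script's own location. `"data.csv"` is a placeholder name.

```python
import csv
import io
from pathlib import Path


def parse_rows(f):
    """Parse an open CSV file (or any text stream) into a list of dicts keyed by the header."""
    return list(csv.DictReader(f))


def load_rows(filename):
    """Open a CSV that sits next to this script, regardless of the working directory."""
    path = Path(__file__).resolve().parent / filename
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)


# In-memory demo; with a real file you would call load_rows("data.csv") instead.
sample = io.StringIO("name,score\nana,3\nben,5\n")
print(parse_rows(sample))  # [{'name': 'ana', 'score': '3'}, {'name': 'ben', 'score': '5'}]
```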
Sure. You would need a way to determine similarity between products based on the features you have extracted about them
i think this is very much an "AI" problem, but it's not easy and anyone who has figured it out reliably is probably not sharing how it works
it's like object recognition in images but for chunks of html
not really though.
You can look at the maker, model, serial numbers, UPC codes, etc.
i interpreted their question to be asking about automating web scraping by learning page structure rather than personally figuring out each page structure and writing a bespoke scraper for each page type
if we want to take it that far, they may want to look into entity extraction and stuff.
But tbh, it's faster to make sure metadata is extracted
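One concrete form of the "extract the metadata" route above. Many shops embed schema.org `Product` data as JSON-LD in a `<script type="application/ld+json">` tag. When a site does this, title, description, and price can be read without any site-specific CSS selectors. This standard-library sketch assumes that markup is present. It does not handle `@graph` wrappers or list-valued `@type`, and sites without JSON-LD still need another approach.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def products_from_html(html):
    """Return (name, description, price) for each schema.org Product found in JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup: skip this block
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, list):  # several offers: take the first one
                offers = offers[0] if offers else {}
            found.append((item.get("name"), item.get("description"), offers.get("price")))
    return found


page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Desk lamp",
 "description": "Adjustable LED lamp", "offers": {"@type": "Offer", "price": "29.99"}}
</script></head><body>...</body></html>"""
print(products_from_html(page))  # [('Desk lamp', 'Adjustable LED lamp', '29.99')]
```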
I have been learning data science in Python for some time now. Recently, I started exploring the NumPy library through YouTube tutorials. However, I have never used Jupyter Notebook before, and most of the tutorials available online are demonstrated in Jupyter Notebook.
I find the method of writing and printing code in Jupyter Notebook different from what I am used to. As a result, I am having some trouble understanding it.
Is there any place or resource where I can learn NumPy specifically using the PyCharm IDE? I believe learning it in my familiar IDE will make the process smoother and help me grasp the concepts better.
you can use the same "interactive" workflow using the interactive console built into pycharm
the code is the same of course. you might also be interested in Jupytext which creates "cells" in plain .py files by delimiting them with specially formatted comments, but can still be rendered to HTML https://jupytext.readthedocs.io/en/latest/
heavily inspired by R Markdown in the R language, which itself comes from a long lineage of literate-programming systems dating back many years, such as Sweave and Knuth's WEB (with its weave and tangle tools)
numpy is just a python library. you don't need to use jupyter to use it.
you can think of jupyter notebooks as one continuous python program, and every time you run a cell (even if you've run it before), it appends all the lines of code in that cell to the end of the program.
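A small example of the Jupytext "percent" format mentioned above. Each `# %%` line starts a cell, but the file is still an ordinary `.py` script that runs top to bottom with plain `python` or in any IDE. Some editors also recognize these markers and let you run cells one at a time. The comments point out the "printing" difference from notebooks.

```python
# %% [markdown]
# Jupytext "percent" format: every "# %%" line starts a new cell,
# but this is still a normal Python script.

# %%
import numpy as np

a = np.arange(6).reshape(2, 3)
print(a)

# %%
# A notebook echoes the last expression of a cell automatically;
# in a script you print explicitly.
col_sums = a.sum(axis=0)
print(col_sums)  # [3 5 7]
```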
Thanks
I fear that AI will replace us in the next 20 years. Not now, it's not that good yet, but in 20 years it's definitely replacing us. Any thoughts?
Hello all, I’m having trouble configuring my training data, specifically choosing the best data type to feed into my model, which does binary classification of seizure vs. no seizure on EEG recordings. What are the pros and cons of feeding my model samples of raw EEG amplitude vs. time, power spectra of the samples, and mel spectrograms of the samples?
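A middle ground between raw amplitude and a single power spectrum is a short-time spectrogram, which keeps both time and frequency information. Here is a minimal NumPy sketch of a Hann-windowed STFT. A mel spectrogram would additionally map these frequency bins onto a mel filterbank (e.g. with an audio library); that step is not shown. The 250 Hz sampling rate and the 1 s window / 0.5 s hop are assumptions.

```python
import numpy as np

def spectrogram(x: np.ndarray, fs: float, win_len: int = 250, hop: int = 125):
    """Power of a Hann-windowed STFT: rows are time frames, columns are frequency bins."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs   # centre of each frame, in s
    return times, freqs, power

FS = 250.0  # assumed sampling rate
t = np.arange(2500) / FS
# Synthetic signal whose frequency jumps from 5 Hz to 20 Hz halfway through.
x = np.where(t < 5.0, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))

times, freqs, power = spectrogram(x, FS)
print(power.shape)                                          # (19, 126)
print(freqs[power[0].argmax()], freqs[power[-1].argmax()])  # 5.0 20.0
```

A single spectrum of this whole signal would show both peaks with no indication of when each occurred. The spectrogram keeps that ordering.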
who is "us" in this context? and replace us in what way?
because "AI replaces humans", without any other specification, means "there being no more humans, and there instead being AIs"
like, no more humans in the world.
i'm facing a problem with my Q-values being way too weird. i'm doing DQN, and one action always has a greater value, even though there are negative rewards to train on, so i'm confused 😐
guys, can anyone guide me a bit, based on their experience, on how to jump into the AI/ML field? I've read that I need a solid math foundation, so now I'm taking lessons on Khan Academy. Should I be getting acquainted with Python libraries like PyTorch, scikit-learn, etc., or just focus on math first? If anyone can give me advice, I would be glad
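The exact bug can't be determined from this message. One thing worth double-checking in any DQN is how the TD target is built: it uses the target network's max over next actions, and it drops the bootstrap term on terminal transitions. Then only the Q-value of the action actually taken is regressed toward it. Here is a minimal NumPy sketch of that standard target. The function and argument names are hypothetical.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN TD targets: y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    rewards:       (batch,) rewards r
    next_q_target: (batch, n_actions) Q-values of s' from the *target* network
    dones:         (batch,) 1.0 if s' is terminal, else 0.0 (no bootstrapping past the end)
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

rewards = np.array([1.0, -1.0])
next_q = np.array([[0.5, 2.0], [3.0, 1.0]])
dones = np.array([0.0, 1.0])
targets = dqn_targets(rewards, next_q, dones, gamma=0.9)
print(targets)  # targets 2.8 and -1.0: the terminal step keeps just its (negative) reward
```

If terminal transitions are not masked, the negative rewards get swamped by bootstrapped values, and one action can end up looking best everywhere.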
keep in mind that no matter how much self-study you do, you still need a university degree that is relevant to AI to be taken seriously by employers in this space.
as far as the math goes, I would start with prob/stat and linear algebra.
while being able to use the different python libraries is important, you do not want to learn in terms of the libraries. if you are "learning pytorch" or "learning sklearn", you are doing it wrong. you want to learn how to do different things like "train a logistic regression classifier on housing data", which could involve learning and using specific parts of several libraries.
Software developers and programmers
Not likely. Being a developer/programmer is about way more than just writing code.
How?
I don't have time to go into all of it, but it's also about deciding what to code.
Can you give a summary, or even half of it?
yeah, I'll be starting my bachelor's degree in October, but it's a Computer Science Automation Systems major. Do you think that will be enough, or might I need to finish a master's with a strictly AI major?
what is "computer science automation systems"?
is "automation systems" a concentration within computer science?
Man I bet these jobs are super difficult to do once you get hired with all these requirements.
So AI will replace the coding part in the next 20 years, but not the other stuff that programmers and software developers do?
And what is this other stuff?
What does a Lead NLP Data Scientist do?
I read the description and it still doesn't make sense. It basically just says PhD. It pays a lot though, I want that job
yeah, it's 70% computer science major and maybe like 30% microcontrollers etc.
sounds like a late-career position for someone who has many years of experience developing models that use natural language.
Correct
Hii, with regression, is it correct that it predicts values of continuous data?
Oh I just learned about that man I forget though
I need answers
I think it's a very simple model and it isn't continuous
@hot hazel this channel isn't a place to engage in doomerist speculation. ultimately, no one knows what the demand for developers will be in 20 years.
What are other stuff that programmers and software developers do
yes
ask in #career-advice
Do you think I might be able to land my first AI job/internship with just a bachelor's degree if in the meantime I focus on my own portfolio? Or is a master's/PhD currently required?
I'm not sure how helpful knowing about microcontrollers would be. maybe if you want to go into robotics. but I would talk to your academic advisor about how you can position yourself to do an MS in computer science that focuses on AI
Master phd required wtf
you can get AI related internships as an undergrad (my department has some right now). but you can only get an AI job with just a bachelors if your experience as an undergrad is very closely related to AI (which is rare)
at what age did you start learning machine learning stuff anyone
AI development is basically always research, so AI has some of the highest education requirements of any kind of development.
23
niceee
also you were looking at a lead position. not an entry level one.
I like to start big and work down
well that's not going to work for job seeking.
I was going to use it for a case study for my certification project
are there NumPy-like libraries in other languages?
yeah, got it, man. I highly appreciate your response. I'm getting into the AI field and it really intrigues me; let's see how long it lasts. In my first year of university I will be focusing on learning math and ML frameworks and trying to build some simple models to link on my GitHub. But I read some articles where people were saying a PhD was required, and I'm not sure I wanna spend around 8 years studying while others might be starting to make money and develop their career path in IT. But if you say it's possible after a bachelor's, I will be grinding hard 😄
you should brace yourself for needing to get an MS. I'm pursuing one now while working full time.
if you want to start working with only a bachelors, probably the best way to do that would be to intern with an AI company that will let you transition to working for them full time once you graduate.
There is a whole gradient of useful applied topics 0 to AI. It's great to have an end goal, but I would caution against how practical it is to learn that much mathematics and computer science in only the span of a bachelors.
An MS degree has no time limit, right? I can take it over several years?
I don't have much resources
mostly time and money
Oh okay, great, so you're pursuing an MS while working full time, that's great. I just had this vision in my mind: I don't wanna end up being a 29-year-old man who studies for a PhD, lives with his mom, and is broke 😂 while most people have the majority of things set, like a job, a car, etc. I think that's the key, as you mentioned: working in this field while pursuing a master's degree
Any reasonable institution is going to have a time limit. It'll be flexible but you are not going to be allowed to take 10 years barring something exceptional
or maybe I'll get 1 year into ms and start applying as an undergrad?
if you take too long, and they've updated the requirements since you started, they might force you to go by the new requirements, which could invalidate courses you've already taken
By requirements, do you mean the units taken?
"units" is not a term used by universities in America, so idk what you're referring to.
could it be credits?
more like, they might remove a course that you've taken from the requirements, and add a course that you didn't take
"course" might be what you call "module"
for SQL, try #databases
Quick question: is the Google data science/data analytics certification worth taking?
so do you mean that for courses with prerequisites, when they update the requirements I might have to re-take them?
not re-take a course you already took, but take a new course.
No
What do you recommend for getting into the data science/analysis field, given that I haven't taken any data courses yet and have 1.5 years until I graduate?
so you are currently pursuing a bachelors degree in computer science, and will graduate in 1.5 years?
How come when I trained an LSTM with 1 layer and a dense output layer to predict the BTC price 1 day ahead on 45k samples of BTC price history, it only took 30 seconds to train? Shouldn't it take MUCH longer? While training, each epoch showed the loss per step (if I said that right)
Yes. Getting my bachelor's in a year and a half
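Some back-of-the-envelope arithmetic on why this can be fast. It assumes univariate input (one price per time step), a hypothetical 50-unit LSTM, and Keras' default `batch_size` of 32 for `model.fit`. The progress bar counts steps, i.e. batches, so an epoch is only ~1.4k small weight updates on a model with ~10k parameters.

```python
import math

def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights and one bias vector (Keras-style; PyTorch keeps two biases)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(in_dim: int, out_dim: int) -> int:
    return in_dim * out_dim + out_dim

UNITS = 50          # hypothetical layer size
N_SAMPLES = 45_000
BATCH_SIZE = 32     # Keras' default batch_size for model.fit

total = lstm_params(1, UNITS) + dense_params(UNITS, 1)
steps_per_epoch = math.ceil(N_SAMPLES / BATCH_SIZE)
print(total, steps_per_epoch)  # 10451 1407
```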
I would focus on taking as many AI/ML related courses as you can
got it. I do have an ML course in my major that will most likely be taken in the spring semester. I can post what I've got in my electives.
what about the math though?
I was thinking of getting a certifications in data. Then start project and also easily pass those courses as well
What math will you be taking?
I'm thinking of taking Tensors and Calculus
will the MS have lots of tensor math and calculus?
guys have any of you finished Machine Learning Specialization course by andrew ng?
yes, currently i'm on week 4
What do you mean by taking tensors? It’s a topic usually introduced in a more advanced math course; it’s not an area of math like calculus.
can you recommend it, G?
Very much, though I have to look things up on Khan Academy
Yeah, probably a lot of math, isn't it? It's like $49 monthly on Coursera?
what I am thinking is that there will be advanced mathematics like tensors on the MS that I might not be familiar with
yeah the math blows me away, like eigenvalues
my question is more like how do I prepare for the math before taking MS courses
look at pv bro
I don’t know what kind of answer you are expecting other than that you prepare for mathematics by studying mathematics. In my opinion you should at least get to the level of linear algebra fluency that eigenvalues are not a topic that blows you away.
yeah, so eigenvalues took me quite a while to be comfortable with, which is why I'm a bit scared of what lies ahead
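For anyone following along with the linear algebra point above, here is a minimal NumPy check of what an eigenvalue is. For each eigenpair (λ, v) of a matrix A, multiplying by A only scales v by λ. The matrix is an arbitrary symmetric example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is for symmetric matrices; it returns eigenvalues in ascending order
# and the matching eigenvectors as the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)  # approximately [1. 3.]

# Defining property: A @ v == lambda * v for every eigenvector column v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```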
I am messing around with the Fourier Transform and some data from a network device (bits per second sampled on a consistent interval). My sample spans 2 weeks, and if I am interpreting this correctly, the largest component frequency of the DFT is 336 hours (aka 2 weeks). Is the fundamental frequency typically the same period as the sample, or have I messed up?
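A note on the DFT question above. Bin k of an N-point DFT sampled every Δt corresponds to frequency k/(N·Δt). So bin 1, the lowest non-zero bin, always has a period equal to the whole record, whatever the data. A large value there often means a slow trend or drift rather than a real two-week cycle. The sketch assumes hourly samples (the interval isn't stated) and uses synthetic data with a daily cycle on a rising trend.

```python
import numpy as np

DT_HOURS = 1.0        # assumed sampling interval: one sample per hour
N = 14 * 24           # two weeks of samples = 336

freqs = np.fft.rfftfreq(N, d=DT_HOURS)   # cycles per hour; bin k is k / (N * DT_HOURS)
print(1 / freqs[1])                      # ~336 hours: bin 1's period is the whole record

# Synthetic stand-in for the traffic data: a daily cycle on top of a slow upward trend.
t = np.arange(N) * DT_HOURS
x = 100 + 0.2 * t + 10 * np.sin(2 * np.pi * t / 24)

def dominant_period(signal):
    """Period (hours) of the strongest non-DC frequency bin."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    k = np.argmax(power[1:]) + 1          # skip bin 0 (the mean)
    return 1 / freqs[k]

print(dominant_period(x))                                # ~336: the trend leaks into bin 1
detrended = x - np.polyval(np.polyfit(t, x, 1), t)       # subtract a best-fit straight line
print(dominant_period(detrended))                        # ~24: the daily cycle
```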
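```py
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# `features` and `target` come from earlier steps not shown here
poly_features = PolynomialFeatures(degree=3)  # You can adjust the degree as needed
X_poly = poly_features.fit_transform(features)
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_poly, target, test_size=0.2, random_state=42)
# Step 5: Model Training and Evaluation
model = LinearRegression()
model.fit(X_train, y_train)
```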
can someone please explain why we are using LinearRegression() here, even though I intend to use polynomial regression?
A polynomial regression model is linear in the polynomial's coefficients. PolynomialFeatures essentially converts the data into a format you can use linear regression on.
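The point above, in a few lines of NumPy. After expanding x into the columns [1, x, x², x³] (what `PolynomialFeatures(degree=3)` produces for a single feature), the model is y = X_poly @ w. That is linear in the coefficients w, so ordinary linear least squares recovers the cubic. The data is a hypothetical noise-free cubic.

```python
import numpy as np

# Noise-free data from a known cubic: y = 2 - x + 0.5 x^2 + 3 x^3 (hypothetical example).
x = np.linspace(-2, 2, 30)
y = 2 - x + 0.5 * x**2 + 3 * x**3

# Polynomial features for one input: columns [1, x, x^2, x^3].
X_poly = np.vander(x, N=4, increasing=True)

# y = X_poly @ w is linear in w, so plain linear least squares fits it.
w, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(w)  # approximately [2, -1, 0.5, 3]
```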
so it takes the coefficients and then uses linear regression to do the rest of the calculations?
Sure I guess? It's just the same linear algebra after a certain point. It's just over the vector space of polynomials
@merry ridge do you mind if i dm you?
I'm not really a good person to talk to. 95% of my work is in mathematics at this point. I haven't even pushed anything to my GitHub in almost 15 months.
I just have a few questions about the mathematics of polynomial regression
then ask them here
```py
X = health.iloc[:,:-1]
y = health.iloc[:,-1]
```
Could somebody explain how iloc works with the code in the brackets? I assume the code is a slice, but I don't understand it
You have two axes, rows and columns. The first is your row-axis slice, and the second is your column-axis slice
Do you know how to slice two dimensional numpy arrays?
Going by memory, you use loc for assignments. I get the feeling your code is off: generally loc/iloc is on the LHS of the assignment. If you search GeeksforGeeks, they have lots of examples. Note: it's been a long time, so my memory could be off.
you can use loc and iloc for both assignment and just getting slices of the dataframe. and I don't recommend geeksforgeeks.
Yeah, I was confused for some reason, but it's just selecting all rows and all columns except the last one for X, and all rows and only the last column for y
Thanks
but iloc selects by integer position, while loc selects by label
For a HIGH-CARDINALITY DATASET:
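A small demo of the position-vs-label difference, on a hypothetical frame whose index labels are not 0, 1, 2, which is exactly when the two behave differently.

```python
import pandas as pd

# Index labels deliberately not 0, 1, 2 (hypothetical data).
health = pd.DataFrame(
    {"age": [34, 51, 29], "bmi": [22.1, 27.4, 24.0], "sick": [0, 1, 0]},
    index=[10, 20, 30],
)

X = health.iloc[:, :-1]   # all rows, every column except the last (by position)
y = health.iloc[:, -1]    # all rows, only the last column (by position)
print(list(X.columns), y.name)  # ['age', 'bmi'] sick

print(health.iloc[0, 0])        # 34: first row and first column, by position
print(health.loc[20, "age"])    # 51: the row whose index *label* is 20
# health.loc[0, "age"] would raise a KeyError: there is no row labelled 0.
```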
When does it make sense to use Hashing Encoder followed by dense layer as a replacement of embedding layer before GRU/LSTM?
or is it even justified?
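One fact that frames this question: a dense layer applied to a one-hot (or hashed one-hot) vector is mathematically the same as looking up one row of its weight matrix. So "hashing encoder + dense" behaves like an embedding indexed by hash bucket, not by vocabulary id. The trade-off is mainly fixed memory and free handling of unseen categories, against hash collisions. Here is a minimal NumPy sketch of the hashing trick. The bucket count and embedding size are hypothetical.

```python
import zlib

import numpy as np

N_BUCKETS = 2**12   # hypothetical; fewer collisions as this grows
EMB_DIM = 8         # hypothetical embedding size

def hash_bucket(category: str, n_buckets: int = N_BUCKETS) -> int:
    """Stable hash -> bucket id. zlib.crc32 is used because Python's built-in
    hash() of str is randomized per process and would change ids between runs."""
    return zlib.crc32(category.encode("utf-8")) % n_buckets

rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.01, size=(N_BUCKETS, EMB_DIM))  # trainable in practice

categories = ["user_8391", "user_17", "never_seen_before"]
ids = np.array([hash_bucket(c) for c in categories])
vectors = embedding_table[ids]          # (3, EMB_DIM), one vector per category for the GRU/LSTM input
print(ids.shape, vectors.shape)         # (3,) (3, 8)

# One-hot @ weights selects a single row: a dense layer on a one-hot is an embedding lookup.
one_hot = np.zeros(N_BUCKETS)
one_hot[ids[0]] = 1.0
assert np.allclose(one_hot @ embedding_table, embedding_table[ids[0]])
```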
Hello everyone, I'm working on a multitask model that's good at predicting whether something is there or not (classification) and regressing the bounding box of that class. I have 25 classes and 5 neurons in my last layer for each class (1 for classification and 4 for the bounding box), so 125 neurons. I'm trying to add an auxiliary task to predict an angle (between different objects, calculated from their centers, etc.). I added another neuron that predicts the angle. The target angle is in radians. Which loss function should I use for this task? It doesn't make sense to use MSE, since it doesn't take the circular property into account, so 360° would be far from -1°. I tried 1 - torch.cos(angle - target_angle).mean(), but I'm not getting good results, and the loss is always too small for some reason.
Thank you in advance for helping me.
min(360 - (true-pred), (true-pred))
how about this
true-pred without abs?
i can try, i don't have the intuition behind this
thank you very much
do your predictions contain angles like -30°, -x°, etc.?
it's in radians; the true values are between -pi/2 and pi/2
and my predictions, since i didn't clip them, can be any value
if its in radians, and also negative, then object 1 and object 2 might be distinct, not similar right?
i don't know which objects you are referring to, i have multiple bounding boxes and i take the center of 4 of them and calculate the angle between the lines created by (center1 center 2) and (center3 center4)
and the model needs to predict that angle
if the lines were not distinct, both would be 30 degrees/radians, whatever
because the lines are different colors, the second angle becomes negative, as the angle is measured from red to blue
I think you should consider this
what i mean is, if instead of identifying 4 areas in image as c1, c2,c3,c4 respectively,
if your model identifies the same areas as c2, c1, c4, c3, will it be a problem?
no
but then angle would change:
heres an example
because i'm not calculating the angle from what it identifies: i'm asking it to give me the value directly, and i compare that to the true value calculated from the correct c1 c2 c3 c4. the predicted angle is not calculated from the predicted c1 c2 c3 c4
in the actual dataset, what are those 4 points?
i mean, can you give them names? like TV, edge, etc.
yes, so i have images of a heart and the heart has some keypoints and we use that to calculate the angle the heart is making
then there is a high probability that c1, c2, c3, c4 will be identified in the correct order.
angle should be given by
m1 and m2 are the slopes (each can be calculated from the 2 points we have for that line)
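The slope formula itself isn't in the text. The usual one is tan θ = (m2 − m1) / (1 + m1·m2). Here is a sketch that computes the same angle from the four centers using a cross and dot product with `atan2`. This is a swapped-in technique: it avoids dividing by slopes, so vertical lines work. The result is folded into (-π/2, π/2], the range of the true values mentioned above. It is hypothetical with respect to how the actual targets were computed. Note that with image coordinates (y pointing down) the sign flips.

```python
import math

def line_angle(c1, c2, c3, c4):
    """Signed angle (radians) from line c1-c2 to line c3-c4, folded into (-pi/2, pi/2].
    Same value as arctan((m2 - m1) / (1 + m1 * m2)) when both slopes exist."""
    d1x, d1y = c2[0] - c1[0], c2[1] - c1[1]
    d2x, d2y = c4[0] - c3[0], c4[1] - c3[1]
    cross = d1x * d2y - d1y * d2x   # proportional to sin(angle)
    dot = d1x * d2x + d1y * d2y     # proportional to cos(angle)
    theta = math.atan2(cross, dot)  # angle between the direction vectors, in (-pi, pi]
    # Lines have no direction: theta and theta +/- pi describe the same pair of lines.
    if theta > math.pi / 2:
        theta -= math.pi
    elif theta <= -math.pi / 2:
        theta += math.pi
    return theta

print(line_angle((0, 0), (1, 0), (0, 0), (1, 1)))  # ~0.785 (pi/4)
```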
for the true angle, i'm using c1 c2 c3 c4 in the right order
for the model prediction, the model is outputting a float value directly, where does the order of c1 c2 c3 c4 come here?
i dont know, maybe someone else can help
i think this is beside the point: imagine i have a true angle and a predicted angle, what loss function would you use?
i would need to see dataset, i am unable to conclude some things
thank you anyway
i think mae/mse is good
reason: if the model outputs 359 instead of 0, it didn't go around the circle twice, and it doesn't mean it was calculating the outside angle (360 - theta). if you try to compensate for that, at test time you may not know which one to choose. you should only compensate if the ground truth marks both theta and 360 - theta as correct predictions
1 - torch.cos(angle - target_angle).mean() which you're using would be my first idea, too. It's even equivalent to MSE for a small difference in angles.
there's also the fancy distance from https://stats.stackexchange.com/a/555464:
which goes to infinity when the angles are different by π
I would avoid the angle altogether and instead predict cos(Ang) and sin(Ang), then normalize so that sin² + cos² = 1.
This avoids the problem of the angle looping around. I also think it will be easier for the network to train.
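The (cos, sin) suggestion above, as a minimal NumPy sketch: two raw outputs are normalized to unit length and compared with the true direction by squared distance. In PyTorch the same idea would be `torch.nn.functional.normalize` on the two-output head followed by an MSE-style loss. The function names here are hypothetical.

```python
import numpy as np

def angle_to_unit(theta):
    """Encode angles as points on the unit circle: (cos, sin)."""
    theta = np.asarray(theta, dtype=float)
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def unit_to_angle(v):
    """Decode a (cos, sin) pair back to an angle in (-pi, pi]."""
    return np.arctan2(v[..., 1], v[..., 0])

def direction_loss(pred_raw, theta_true, eps=1e-8):
    """Normalise the 2 raw network outputs to unit length (so cos^2 + sin^2 = 1),
    then average the squared distance to the true direction over the batch."""
    norm = np.maximum(np.linalg.norm(pred_raw, axis=-1, keepdims=True), eps)
    pred = pred_raw / norm
    return np.mean(np.sum((pred - angle_to_unit(theta_true)) ** 2, axis=-1))

# 2*pi and 0 are the same direction, so the loss is ~0 (no wrap-around problem),
# and the raw outputs don't need unit length beforehand (here they have length 5).
print(direction_loss(5.0 * angle_to_unit([2 * np.pi]), [0.0]))
```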
Using the mean-squared-error on the direction vector is actually the same thing as using the cosine distance on the angle:
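The identity behind that statement, for unit vectors $u = (\cos a, \sin a)$ and $v = (\cos b, \sin b)$:

$$
\lVert u - v \rVert^2 = (\cos a - \cos b)^2 + (\sin a - \sin b)^2 = 2 - 2(\cos a \cos b + \sin a \sin b) = 2\bigl(1 - \cos(a - b)\bigr)
$$

For a small difference $\delta = a - b$, $1 - \cos\delta \approx \delta^2 / 2$, which is why the cosine loss behaves like MSE on the angle near the optimum. It also explains why that loss looks "small": it is bounded by 2 and shrinks quadratically as predictions improve.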
thank you @tidal bough @mighty patio
Getting this
AttributeError: module 'torch.nn' has no attribute 'view'
on this line in Colab
model = Autoencoder()
Can anyone help?
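```py
from torch import nn


class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Assumes 1-channel (grayscale) 32x32 input, e.g. CIFAR-10 converted to gray;
        # use 3 input channels instead if the gray image is repeated across RGB.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 128, kernel_size=3, stride=2, padding=1),   # -> 128 x 16 x 16
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, stride=2, padding=1),  # -> 64 x 8 x 8
            nn.ReLU(),
            nn.Flatten(),                                            # -> 64 * 8 * 8 = 4096
        )
        self.decoder = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128 * 8 * 8),
            nn.ReLU(),
            # Every entry in nn.Sequential must be an nn.Module, so torch.view can't go here;
            # nn.Unflatten reshapes (N, 8192) back to (N, 128, 8, 8) instead.
            nn.Unflatten(1, (128, 8, 8)),
            # output_padding=1 makes each transposed conv exactly double the spatial size
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),  # -> 64 x 16 x 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),    # -> 3 x 32 x 32
            nn.Tanh(),
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
```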
why do you have torch.view in the sequential?
is it not supposed to be there? where should it go?
i don't know what you want to achieve with it, but everything that's in sequential needs to be an instance of nn.Module
can LLaMA adapt/learn during usage?
how do i refactor my code to not use nn.Sequential?
can you tell me what exactly you want to do with the torch.view?
i meant nn view not torch view
this autoencoder is supposed to take a grayscale image (CIFAR10) and convert it to colour
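A sketch of one way to fix it, assuming the goal from the last message: 1-channel 32×32 grayscale input (CIFAR-10 converted to grayscale) and 3-channel 32×32 colour output. `nn.Unflatten` is the `nn.Module` counterpart of `view`, so it can live inside `nn.Sequential`. The original shapes also didn't line up: two stride-2 convs turn 32×32 into 8×8, so `Flatten` produces 64·8·8 = 4096 features rather than the 64 that `nn.Linear(64, ...)` expects, and a `ConvTranspose2d` with kernel 3, stride 2, padding 1 maps 7 to 13 instead of doubling, hence `output_padding=1` below. The channel counts and the 256-dim bottleneck are arbitrary choices:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Grayscale (1x32x32) -> colour (3x32x32) autoencoder for CIFAR-10-sized images."""

    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),    # 1x32x32 -> 64x16x16
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # -> 128x8x8
            nn.ReLU(),
            nn.Flatten(),                                            # -> 8192 features
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),  # module version of x.view(-1, 128, 8, 8)
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1,
                               output_padding=1),                   # -> 64x16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1,
                               output_padding=1),                   # -> 3x32x32
            nn.Tanh(),  # outputs in [-1, 1]; scale the colour targets to match
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
print(model(torch.randn(8, 1, 32, 32)).shape)  # torch.Size([8, 3, 32, 32])
```

If you'd rather not put the reshape in `nn.Sequential`, drop the `Unflatten` layer and reshape in `forward()` instead, e.g. `z = z.view(z.size(0), 128, 8, 8)` between the linear part and the transposed convs; `view` is a tensor method, so it belongs there rather than in the layer list.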
No. What the model sees (its context) changes as the conversation goes on, but its weights don't. The model doesn't train during inference, if that's what you mean.
How much programming do I need in order to get into data science?
a lot
"data science" is essentially the intersection of math/statistics/analysis and programming. you'll spend most of your time cleaning, munging, transforming, and otherwise wrangling data. but you get paid for the analysis. both are done by programming.
What type of course is that
I just started my bachelor's degree in Computer Science and I'm currently doing an SQL database course and shit college algebra. I've got 3 data classes, a bunch of programming with Java, and 2 maths
if you haven't taken probability and statistics, do so. it's essentially required for "data science". two semesters would be even better.
a systems modelling class, if offered at your university, would also be a plus
How does one actually clean data?
that depends on context. if the data you want is what you need when you need it, then in all appropriateness you have clean data.
This is what I need to do to complete my degree
I plan on doing prob/stats in the spring and discrete math in September, because I need to do that in order to take other computer classes
I know they use spreadsheets and a little Python
well, there you go then
data dumps will often contain everything from malformed text, misplaced decimal points, incorrect data, etc.
if you can imagine something going wrong, some data set somewhere will have it
I also might do the IBM or Google data science certificate during my off time to get started before I take those courses, if you recommend it
let me give you an example, I once had to deal with stock price data from the London Stock Exchange... we noticed that one stock had the price increase by 100x at some point
turns out that the LSE prices in both GBP and pence. and switches for some stocks at random times. the data set was supposed to have already been adjusted for this. but it was not.
another was a data set of convertible bonds. many of the prices were not adjusted for corporate actions (splits, etc). I've seen data sets of temperatures where some were in Fahrenheit and some in Celsius... but nothing to indicate which was which.
data sets where missing data is noted by various things, ranging from nothing to 0 to -1 to "NULL".
the worst is when the different values are used in the same data set... sigh
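A minimal sketch of normalizing those markers with the standard-library `csv` module, assuming you already know which markers each column uses. The column names, the sample data and the per-column marker sets are made up; with pandas, `read_csv(..., na_values=...)` also accepts a per-column dict for the same purpose:

```python
import csv
import io
from typing import Dict, List, Optional, Set

# Hypothetical per-column missing-value markers. Choosing these is the real work:
# 0 and -1 can only mean "missing" in a column where they can't be genuine values.
MISSING: Dict[str, Set[str]] = {
    "price": {"", "0", "-1", "NULL"},
    "volume": {"", "NULL"},  # 0 is a legitimate volume, so it is NOT a marker here
}

def parse_optional_float(raw: str, markers: Set[str]) -> Optional[float]:
    """Return None if the raw cell is a missing-value marker, else parse it as a float."""
    value = raw.strip()
    return None if value in markers else float(value)

def load(text: str) -> List[Dict[str, Optional[float]]]:
    """Parse CSV text, mapping each column's markers to None."""
    return [
        {col: parse_optional_float(row[col], MISSING[col]) for col in MISSING}
        for row in csv.DictReader(io.StringIO(text))
    ]

sample = "price,volume\n10.5,100\nNULL,0\n-1,\n12.0,NULL\n"
print(load(sample))
# [{'price': 10.5, 'volume': 100.0}, {'price': None, 'volume': 0.0},
#  {'price': None, 'volume': None}, {'price': 12.0, 'volume': None}]
```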
I once had to parse a csv file full of floats, where the column separator was , and the decimal separator was, guess what, also ,.
lol, damn germans
oh, I once had to deal with a data set with fixed-width columns... but sometimes, if the data wouldn't fit, it would spill over into the next column. <grrr>
Sounds like data science has lots of grindy, uncool work
well, you don't get paid to do the grunt work, it's just part of the job. you get paid for the analysis, conclusions and presentation
but you can't effectively clean the data if you don't know what the data should look like. so...
And i assume coding neural nets is like least priority?
I've completed Statistics, Probability, Linear Algebra and Calculus lectures now I am going to learn Linear Regression but I've not spent time with data cleaning exploring stuff. What should I do?
there's no direct connection between building AI models and "data science" except insofar as both require data prep pipelines
have you covered various ML techniques like k-means clustering yet?
Havent touched any single algorithm
How much effort does it take to migrate CUDA training code to AMD or Intel?
Is it reliable at all?
I've good exp in Python so I skipped the Data Analysis part
ah, then look into that. some of the clustering heuristics are quite interesting. you've got the linear algebra already so you should be able to understand most of them
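For reference, a minimal sketch of k-means (Lloyd's algorithm) on 2-D points in plain Python: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The naive initialization (first k points) and the iteration cap are simplifications; scikit-learn's `KMeans`, for example, defaults to k-means++ initialization:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm. Returns (centroids, cluster label for each point)."""
    centroids = list(points[:k])  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments won't change any more
            break
        centroids = new_centroids
    return centroids, labels

pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(pts, 2)[1])  # [0, 1, 0, 0, 1, 1]
```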
Okay but I want to start with ML and DL asap
Not with the data cleaning and exploring stuff. Is that a good approach for a beginner?
data cleaning can technically be done by any random programmer. the issue is knowing what good data looks like vs bad data
to wit, it's not writing the code that takes a long time, it's looking at the data to find problems and figuring out if it can be fixed or not
Yeah it makes sense in real world or jobs. But for exploring and understanding algorithms Can I skip those steps?
probably
unless "exploring" covers things like SQL queries
you might want to do that in a class if your SQL is weak
Yeah I dont like SQL I have used ORMs
have you read Celko's "SQL for Smarties"?
SQL is a full, turing complete, programming language. one of the few declarative languages that is in widespread use.
you can compress pages of python into a few lines of SQL. and vice versa 🙂 it's a great tool for some jobs and a shit tool for others.
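A small illustration of the declarative-vs-imperative point using the standard-library `sqlite3` module and an in-memory database; the `trades` table and its columns are made up. Both halves compute the average price and trade count per ticker:

```python
import sqlite3
from collections import defaultdict

rows = [("AAPL", 190.0), ("MSFT", 330.0), ("AAPL", 192.0), ("MSFT", 334.0), ("AAPL", 194.0)]

# SQL: declare *what* you want; the engine decides how to compute it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (ticker TEXT, price REAL)")
con.executemany("INSERT INTO trades VALUES (?, ?)", rows)
sql_result = con.execute(
    "SELECT ticker, AVG(price), COUNT(*) FROM trades GROUP BY ticker ORDER BY ticker"
).fetchall()
con.close()

# Python: spell out *how* to compute it, step by step.
totals = defaultdict(lambda: [0.0, 0])  # ticker -> [sum of prices, count]
for ticker, price in rows:
    totals[ticker][0] += price
    totals[ticker][1] += 1
py_result = [(t, s / n, n) for t, (s, n) in sorted(totals.items())]

print(sql_result)  # [('AAPL', 192.0, 3), ('MSFT', 332.0, 2)]
print(py_result)   # same result
```

The gap widens quickly once joins, filters and window functions get involved.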
But why are machine learning and data science always together?
No, I haven't read that, but now I'm taking SQL more seriously, since everyone says it's crucial in data science. I didn't find raw queries that useful in software dev.
Like in books, courses, etc.
mostly because of the various clustering, optimization, feature reduction, etc algos that are considered part of "machine learning"
Not an expert on this, but I guess it's essential to understand the data you are feeding your machine learning algorithm
Should I focus more on Stats or Calculus?
you know what my answer will be 🙂
Both?
yup. or hell, take a stochastic calc class. that's calculus where the variables are random processes 🙂
mind bendy stuff
lol, no
that's stats 101
you know in normal calc, you might have an integral for x and y over some range?
well, instead of being simple scalar variables, in stochastic calc they are random variables evolving over time (stochastic processes)
oh okay, sounds great. I'd better start with ML first; then it'll make more sense to learn it after that.
probably. I doubt you'd need stochastic calc for most DS jobs. but if you knew it, you could make everyone else feel inferior 🙂
I found a great weapon, I'll definitely consider it in the future.
I am currently taking an SQL course and it's slightly random, because every week is something new, and sometimes it won't go over the old material
I can't do anything about your instructor or lesson plans
Personally, that's the reason I want to do online certifications and projects, to get a better understanding of data science
This is a sideview of some terrain. You can see the ground, and in the middle some trees. What would be a good way to fit a line or a set of points to the ground curve. I don't think taking the minimum y value pixel at each x value would be good as there might be some holes in the ground in my data.
that's a pretty steep hill
looks like a least squares regression homework problem
This might be a good example of where the image isn't perfect (occluded on the left)
That would be affected by the trees I would think.
so remove the trees
They are also not really points, but just pixels
how?
pixels are just points
by hand, of course. the first step to any data analysis is manual scrubbing of the data
...
The demo data I work with has 128 of these, and the data I will work with later maybe few ten thousands
Not an option
if you really are against doing that because of classwork "rules", then just take the min y value for every x
Read the original message, and take this image as an example.
It's not classwork, just personal project for now
I fail to see why min y value of every x wouldn't work
being afraid to do manual work is a bad sign for "data scientists"
but whatever floats your boat
If anyone has any suggestion, feel free to ping 🙂
yes, and? then run a least squares regression on the red data
guess you could smooth it first with a moving average to remove transient spikes, but I doubt the results would be that different
I ordered an ESP32 for my data collection project. I dunno what to collect yet but maybe I'll make a plant classifier
Kinda excited about it lol
anyone here used a siamese neural network before?
are there any good datasets for training a model to detect classes of cars? by this i mean something similar to Stanford's car dataset, but with much, much more than 44 images per car.
this is a great idea but let's not be derisive if someone didn't have a good idea that you had
i don't think i'd have thought of it either
lowest pixel in each column + loess/lowess would probably do the job of moving average + linear regression
that way it can actually be nonlinear
Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression.
Its most common methods, initially developed for scatterplot smoothing, are LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing).
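A simplified, single-pass LOWESS in plain Python, as a sketch of what "local regression" means: for each x, fit a straight line to its nearest neighbours with tricube weights that fall off with distance, and evaluate that line at x. It omits the robustness iterations of full LOWESS, which would down-weight outliers such as tree tops; `statsmodels.nonparametric.smoothers_lowess.lowess(endog, exog, frac=...)` implements the full algorithm, and the `frac` name here is borrowed from it:

```python
import math
from typing import List

def lowess(xs: List[float], ys: List[float], frac: float = 0.3) -> List[float]:
    """Single-pass LOWESS: a tricube-weighted straight-line fit around every x.

    Simplification: no robustness iterations, so outliers are smoothed, not down-weighted.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbourhood size for each local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself included).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]  # tricube, 0 at/after h
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # evaluate the local line at x0
    return fitted
```

Run on the per-column lowest points, this gives a curve that can bend with the terrain, which a single global line can't.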
I have, though I'm not the most knowledgeable on them. What's up?
I don't know how to articulate this, but as a preface, tl;dr: I made a script that turns a documentation website into a really naïve knowledge graph for actually useful embeddings. I'm trying to figure out a generic way to get the selector for the main documentation content of most websites. Does this seem like a good idea?
```js
document.querySelector('h1').parentElement.parentElement.children.length
// 32
document.querySelector('h1').parentElement.children.length
// 1
document.querySelector('h1').parentElement.parentElement.parentElement.children.length
// 3
```
My goal is automating my own version of those "Chat with your documentation with sources included!!!!" and this is the last bit to making it pretty much insert a URL and you'll have a somewhat useful embedding to chat with.
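A Python sketch of the heuristic being described: start at the first `<h1>` and walk up until an ancestor has at least a few children, on the assumption that the main content is the first "wide" container around the title. It uses `xml.etree.ElementTree` only to stay dependency-free, so it needs well-formed (X)HTML; real pages usually need a lenient parser such as `lxml.html` or BeautifulSoup. The `min_children=3` threshold is an arbitrary guess (on the page above it would stop at the grandparent with 32 children), and `content_container` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Optional

def content_container(root: ET.Element, min_children: int = 3) -> Optional[ET.Element]:
    """Return the first ancestor of the first <h1> that has >= min_children children."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parent = {child: p for p in root.iter() for child in p}
    node = root.find(".//h1")
    if node is None:
        return None
    while node in parent:
        node = parent[node]
        if len(node) >= min_children:  # len() of an Element = number of direct children
            return node
    return None

page = ("<html><body><nav><a>Home</a></nav><main><div id='doc'>"
        "<header><h1>Title</h1></header><p>a</p><p>b</p><pre>c</pre>"
        "</div></main></body></html>")
print(content_container(ET.fromstring(page)).get("id"))  # doc
```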
Ibm or google certification? Which is better or there is better route?
if anybody is good with PyArrow and has any feedback, I would appreciate it:
#1131861713910763580
most of the time you will spend will be on that, not on building deep learning models. And when you get a job, in most companies the model would be already there in some cases. You don't always spend your days only building models. So if you want to do this, you need to learn the not so cool stuff too.
what do you want to achieve exactly? where do you see yourself and why do you need a certification?
I'm confused regarding the difference between neural networks and ML. Take a linear regression model (an ML model) designed to identify whether something is a windmill or not: it would be trained with a bunch of pictures of windmills, where it'll autonomously create its own patterns (and weight them!) during the training phase.
This trained model can now tell us whether an image is a windmill or not (to a questionable accuracy).
Through my research, this is not considered the use of a neural network, which confuses me, as it seems like a lot of this is done "under the hood", especially the creation and identification of patterns.
To get experience and knowledge, and to start before I graduate or take those courses at the university
Build my portfolio as well if need be
I did the same, but I tried to do it horizontally, not vertically, since it's a really broad field. I tried to study a bit of everything to get a sense of it and choose what I liked the most. So I would recommend you take a look at most things. Start by having a strong foundation in the needed mathematics (linear algebra, probability, statistics, etc., maybe even Monte Carlo methods) and in Python (you can do a project for your portfolio with Python, maybe a web-scraping thing that you can build on afterwards). After Python, try to take courses in data visualization and learn the basics; you can add a dataviz project to your portfolio with the data you scraped in the Python project. Then try to take a machine learning course (Andrew Ng has a really good one) and get a sense of most machine learning algorithms. You can take a small Kaggle competition and try to use machine learning to find a solution and test most of the algorithms. You can also take a course in data analysis, exploratory methods and things like clustering, PCA, MCA, etc. Then jump into deep learning and try to learn the foundations. Try a project for your portfolio along the way. After that, as I said, you can try things horizontally: some computer vision, some NLP, some graph neural networks, some reinforcement learning, etc., and find what you like the most. You can easily find projects for your portfolio along the way. I'm a recent graduate and this is what I did to learn. For NLP, I tried to generate Arabic poetry. For computer vision, I tried a sign language classification competition. For reinforcement learning, I tried to beat mini blackjack (from the gym library).
Another thing: to land a job you will sometimes have coding interviews, so you need to get good at those too; try LeetCode. This website is really good: https://interviews.school/
What I also did is that I created a git repo where I keep track of things I read or learn from. This is my repo if it may help, you can do something like this: https://github.com/ahmedbelgacem/awesome-datascience
Appreciate it. I have taken Python but I'm still trash at it, and I'm still wondering how I got a B+ in it. I'm currently taking college algebra and really hate it because it feels like it has no use. I do have discrete and stats classes. I'll focus on those when I get there.
Is linear algebra the same as college algebra? @lapis sequoia
you don't really need to think about grades, try to make a project with it and you will learn quickly. Try to do something you're interested in. For example, if you're looking for a new flat to rent, you can make a script that scrapes websites for flats and sends you an SMS when one is available
i don't really know what college algebra is
Ok. College algebra is basically that course everyone needs to take if they transfer
@lapis sequoia
Computer Sciences
i was asking about the name of the uni
Franklin University
Its a online school
The downside personally is a 12-week course
The best option that I would like to work on is Python, as it relates to data science @lapis sequoia
Guys, i already finished my linear algebra course on Khan Academy. Which one should i take next in your opinion? (i wanna have a good foundation for ml/ai):
-Multivariable Calculus
-Differential Equations
-Statistics Probability
I told you already 😛
All of the above?
oh fr mate sorry i forgot
i've started with linear algebra already and im nearly finished with it, so the next one you recommend is prob/statistics, so let's go for that. After i finish it, do you think the order of these 2 plays a role?
-Multivariable Calculus
-Differential Equations
Or it wont matter much
after probstats, do multivariate calculus
Thanks man a lot, do u think these 4 fields of math will give me solid foundation to start ml/ai? Is there anything i should be acquainted with to understand ai/ml better on the beginning?
I regret not studying enough of derivatives
Is it even a good idea to stick just to the math at the beginning, and only start ml/ai after i've learnt those 4 fields of math?
i want to emphasize that i know python at a good level ig
you just want to learn enough so you have intuition when facing an equation
it seems to be more reasonable to learn all this math first then start to build models
rather than doing it in the opposite order
you might be right, try to check logistic regression after every course you take and see if it makes sense
check especially the gradient descent; if it doesn't make sense, study more calculus
so additionally check logistic regression and gradient descent, right?
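If it helps with that check, here's a minimal logistic regression trained by batch gradient descent in plain Python. It's also a one-neuron neural network (weighted sum followed by a sigmoid), which is one way to see how "ML" and "neural networks" connect. The learning rate, epoch count and toy data are arbitrary choices:

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs: List[List[float]], ys: List[int], lr: float = 0.3,
                   epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean binary cross-entropy; returns (weights, bias)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - y  # d(loss)/dz for sigmoid + cross-entropy
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # Gradient descent step: move against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print([predict(w, b, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

The only calculus needed is the derivative in the `err` line: for a sigmoid output with cross-entropy loss, the gradient with respect to z simplifies to p - y.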
just make sure you don’t fall into the trap of spending unreasonable amount of time in just the math learning phase and not transition to actually doing projects. Get a foundation and each project will hint at you to learn which math
Also there are tons of pre-built libraries out there which abstract a lot of the math. Many things you’ll find to have been “nice to learn” but not super requirements.
Get a foundation > Do actual projects > Let project demands and curiosity drive future learning.
`df_agent = df[df['agent_i'] == agent]`
is this multi-indexing?
yeah thanks, but do u think it's necessary to finish all these 4 courses before doing projects?
-linear algebra
-probability/statistics
-multivariable calculus
-differential equations
or i should be trying to build first projects learn libraries in the meantime?
because i don't know how to approach it
There are sort of 2 parts to ML, there's the theory and the math. Knowledge of the theory is all you need to use high level libraries like pytorch and keras. The math helps you with diagnosing problems, coming up with new ways to use your models, and understanding why certain results happen/quantifying performance. I agree with ashe that you can start on projects before being knee deep in all these math topics provided you have a working knowledge of theory.
yeah understanding underlying math will let me be more efficient in creating my models
What IDEs are u using for ml/ai? I've seen multiple people using Jupyter Notebook
Just whatever you like, there's not really any benefit to any IDE for ML specifically
I use vscode
i like jetbrain's ides
Pycharm?
I have walked middle school classes through ML projects where they were able to create their own classifiers. Unless you're specifically trying to build or modify your own model architectures, I don't really think any of these things are necessary to start projects in ML. In fact, I'd recommend at least taking a few out of the box/open source models and seeing what you can do with them, perhaps even before taking a lot of these classes so you have a frame of reference for how these topics are utilized in the real world and can start to form questions about how they work. Seeking out and learning the answers to those questions will often be a far more valuable and robust way of learning than trying to learn all of the theory first and then trying to utilize it in the real world after the fact.
The only caveat that I will add is there will be some trial and error doing it this way, which you'll have to understand and accept. Especially if you're self-guided, I wouldn't expect to actually complete the first projects you start. You should try to do as much as you can until you hit a wall, then take a break, learn some more, and either come back to it or use your experience + what you learned to start a new project that you're interested in
anyone here willing to teach me machine learning in python? (just trying to make some projects so that it may increase my chances of getting into a university)
(also i like python so yea)
only you can teach yourself
There are plenty of YouTube channels devoted to intro ML concepts. I'd recommend looking up something you'd be interested in building and trying to find a model that can achieve that. From there, look for tutorials on it/something similar. Alternatively, you can just shop around for tutorials until you find one that clicks with you on a topic you enjoy. The earlier you just start trying (and more importantly failing) things, the better off you'll be!
hm k
no u dont need to take a single class to learn ML
find out WHY you want to learn ML/AI
need the math foundation though so the papers can make sense
I'm not sure you need to look at papers if you're brand new to ML. What scenario were you thinking of where you'd need to?
No not for someone new to ML but for the years that will carry forward. For purposes of plain research and exploration
Ah alright, I reasonably agree then! I think the math is very important once you've got a solid understanding of things and are looking to get into the weeds of optimization or experimentation. Also not saying it's bad to have the background when starting out, but it's by no means necessary
What could be the reason that my model is reporting very poor accuracy during training but giving me perfect recall/precision when I test it after?
You're testing with data used in training?
yes, the goal was to attempt to overfit a model just to ensure that I could reach high accuracy on a small subset of the dataset
it's giving me around 0.07 accuracy with near 0 loss, so I wanted to investigate further, and I found that the model is actually performing perfectly
I suppose I'm a bit confused. You trained your model on N samples, and that model had accuracy reporting every X epochs. Then, you took a subset of N and used it in a validation test and it reported both perfect precision and recall? What data is being used in the accuracy calculation?
yup i was confused too, i just found the issue though
basically i was using a subset sampler for this training, and the problem was to calculate accuracy i was using len(dataloader.dataset)
but the problem was, that returned the full dataset whereas the sampler would actually only return a small portion of it
do you know if there's a better way to get the number of samples in a dataloader as opposed to just tracking the number of samples that the code receives each epoch?
Better way as opposed to what? len(dataloader.dataset)?
yeah, because the subset sampler means that it's not using the full dataset
that was my initial code which caused the problem
I imagine the data used in training is being passed or accessed somewhere no?
yeah, i could just track it bit by bit
but i think len(dataloader.sampler) is working for now
as long as I have some type of sampler
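A small sketch of the difference, assuming PyTorch's `SubsetRandomSampler` (the message only says "subset sampler"): `len(loader.dataset)` is the size of the whole dataset, `len(loader.sampler)` is how many indices the sampler yields per epoch, and counting `y.size(0)` per batch avoids relying on either. The toy dataset and the stand-in "predictions" are made up:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset: 100 samples, but the sampler only ever yields the first 10.
dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=4, sampler=SubsetRandomSampler(list(range(10))))

correct, seen = 0, 0
for x, y in loader:
    preds = y.clone()  # stand-in for model(x).argmax(dim=1): a "perfect" model
    correct += (preds == y).sum().item()
    seen += y.size(0)  # count what the loader actually yielded

print(len(loader.dataset))             # 100: the full dataset, the source of the bug
print(len(loader.sampler))             # 10: samples per epoch for this sampler
print(correct / len(loader.dataset))   # 0.1: wrong denominator
print(correct / seen)                  # 1.0
```

Counting per batch also stays correct with `drop_last=True`, where the last partial batch is skipped and neither length matches what was actually evaluated.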
I find it interesting and intriguing asf and wanna build some models for myself
Hey, I just recently learned the basics of Python and now I want to learn AI. Can someone who's already in this field help?
I'd read my messages from a couple hours ago replying to someone with a similar question. Maybe you can find someone who would be willing to guide you, but you can start whenever you want if you guide yourself
Has someone been able to make a PDF to TEX converter that works reliably? Possibly making use of object detection and the TF API?
models for what
thats a really shallow answer
Hii, can someone explain this for me? In decision tree expressiveness there's something called N-XOR. I know that XOR is basically: when one input is true, the output is true (out of, for example, two attributes). But is N-XOR the theoretical output with n attributes, applying the XOR quality that only one is true?
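For n inputs, "XOR" in the decision-tree setting usually means the parity function: true when an odd number of inputs are true. For two inputs that coincides with "exactly one is true", but from three inputs on they differ (all three true has odd parity). Parity is the classic hard case for decision trees: no single attribute tells you anything on its own, so every path has to test all n attributes, giving 2^n leaves. A quick check in plain Python; the helper names are illustrative, and `count_leaves` builds a naive tree that splits on the attributes in a fixed order until each leaf is pure:

```python
from itertools import product

def parity(bits) -> bool:
    """n-ary XOR: True when an odd number of inputs are 1."""
    return sum(bits) % 2 == 1

def exactly_one(bits) -> bool:
    """The "only one input is true" reading."""
    return sum(bits) == 1

def count_leaves(rows, target, depth=0):
    """Leaves of a tree that splits on attribute 0, then 1, ... until each leaf is pure."""
    if len({target(r) for r in rows}) == 1 or depth == len(rows[0]):
        return 1
    return sum(count_leaves([r for r in rows if r[depth] == v], target, depth + 1)
               for v in (0, 1))

rows3 = list(product([0, 1], repeat=3))
print([r for r in rows3 if parity(r) != exactly_one(r)])  # [(1, 1, 1)]
print([count_leaves(list(product([0, 1], repeat=n)), parity) for n in range(1, 6)])
# [2, 4, 8, 16, 32]: parity never becomes pure early, so every path tests all n attributes
```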
for recognizing letters in photos firstly the next one maybe for captcha recognition
im a relatively new beginner in python and i'm interested in making an MLB game score predictor. would anyone be interested in assisting me with this on github?
I've been studying machine learning for a while and lately I've had a question stuck in my head. I want to make money while I'm at university, but I'm confused about how to do something with artificial intelligence. I mean, game developers create a game, publish it and make money, but machine learning engineers? You may think of creating a chatbot or something, which is pretty common, or some prediction apps...
the only way you're going to make money doing ML as an undergraduate student are ML-related internships over the summer. trying to attempt it any other way would be a waste of your time.
Well, look at Mark Zuckerberg, he created Facebook as a student. I want to create something while I'm still at uni, but artificial intelligence, which I'm interested in, doesn't seem to be such a field, or I'm too ignorant about that
I may make some computer vision projects et cetera, but how would they be useful to people if they just run in a console?
cases like Mark Zuckerberg are one in hundreds of thousands.
you can try. and I suppose there's an incredibly small chance that you'd be successful. but in all the cases where you aren't, your time would have been better spent studying and applying for internships.
i didn't mean to quit uni. I want to create, produce something while I'm at uni
I didn't think you wanted to quit uni. I'm saying that the chances of you making money with an ML-based product as an undergraduate are so low that your time would be better spent doing things that are more likely to be worthwhile.
like?
studying and applying for internships.
hiring managers for internships will be looking at your grades, and a signal that you're a good fit for what their team does. working on ML-related projects could satisfy the latter, but if you're working on those projects chiefly to monetize them, considerations that are more important to those hiring managers might fall by the wayside.
I've just finished my freshman year, though
that doesn't change how applicable what I said is to your circumstances. it still is.
Can I apply for a remote internship without letting my school know?
internships are usually over the summer, so try to get one for next summer.
I see
What if i want to create something on my own, would ai be appropriate for such a purpose?
you can work on AI projects on your own, yes. but you should do it for learning, not for the hope of income. because it's very, very unlikely that you will create something that anyone wants to pay for.
Should i learn any other fields on the other side?
whatever is interesting to you, sure
you don't have to pick a lane and stay in a lane
I see. I sometimes get burnout due to this issue :/
what's the rush?
to be productive and useful
I've hired a ton of University undergrads over the years for one project or another. Frankly the ones that got noticed and were eventually offered some form of paid position already loved their topic so much they were doing a LOT of work on it unpaid and developed a reputation for it. The majority of students that were not that passionate could never make it through the most basic hurdles I set in front of them before I could seriously consider them.
I see your point, thanks a lot guys. 🙏
Basic hurdles like what? And what would be a very complicated hurdle?
I don’t know. It depends on the situation. A basic hurdle is usually something like look at the research group website and tell me what interests them. I don’t set complicated hurdles. I may as well just tell them to never talk to me again.
I'm trying to train an AI model to recognize certain images, but I keep getting errors similar to this:
```
in user code:
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1338, in train_function *
return step_function(self, iterator)
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1322, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1303, in run_step **
outputs = model.train_step(data)
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\training.py", line 1080, in train_step
y_pred = self(x, training=True)
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\techi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\src\engine\input_spec.py", line 235, in assert_input_compatibility
raise ValueError(
ValueError: Exception encountered when calling layer 'resnet50' (type Functional).
Input 0 of layer "conv1_pad" is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: (224, 224, 3)
Call arguments received by layer 'resnet50' (type Functional):
• inputs=tf.Tensor(shape=(224, 224, 3), dtype=float32)
• training=True
• mask=None
```
and
Data cardinality is ambiguous:
x sizes: 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224
y sizes: 201
Make sure all arrays contain the same number of samples.
and i have no idea why
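Both errors point the same way: ResNet50 expects a batch, i.e. one array of shape `(N, 224, 224, 3)`, and the second message lists one size per array, which suggests `x` was passed as a Python list of separate images rather than one stacked array (I can't confirm that from the excerpt). A minimal sketch, assuming the images are a list of NumPy arrays and the labels hold one entry per image; `to_batch` is a hypothetical helper:

```python
import numpy as np

def to_batch(images, labels=None):
    """Return a float32 array of shape (N, H, W, C) that Keras can consume.

    - a single (H, W, C) image gets a leading batch axis of size 1;
    - a list of N (H, W, C) images is stacked into one array.
    """
    if isinstance(images, np.ndarray) and images.ndim == 3:
        return np.expand_dims(images, axis=0).astype("float32")  # (224, 224, 3) -> (1, 224, 224, 3)
    x = np.stack(images).astype("float32")  # list of N (224, 224, 3) -> (N, 224, 224, 3)
    if labels is not None and len(x) != len(labels):
        raise ValueError(f"{len(x)} images but {len(labels)} labels")
    return x
```

Then something like `model.fit(to_batch(images, labels), np.asarray(labels), ...)` for training and `model.predict(to_batch(img))` for a single image. The length check matters because the log shows 201 labels; the stacked image array must have exactly that many rows.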
:incoming_envelope: :ok_hand: applied timeout to @lapis sequoia until <t:1690006145:f> (10 minutes) (reason: newlines spam - sent 107 newlines).
The <@&831776746206265384> have been alerted for review.
!unmute 1000729109720219778 use the pastebin
:incoming_envelope: :ok_hand: pardoned infraction timeout for @lapis sequoia.
excuse me, i want to ask something. Im currently doing a logistic regression in python and some of my independent variables are strings; the string data are all categorical, like (Short, Medium, Tall) etc. can i just change those values to numbers like (1, 2, 3) so that sklearn can take them as integers, or should i not do that?
ive tried several variations, searched stackoverflow and reddit
even asked chatgpt
You'd convert this one column to three columns
```
category
--------
short
medium
short
tall
medium

----->

short  medium  tall
-------------------
1      0       0
0      1       0
1      0       0
0      0       1
0      1       0
```
It's called "one hot encoding"
Converting it to integers 1 2 3 can also be good, but then you implicitly say that, for example, short is closer to medium than to tall, which in this case is true. But you also say that the distance between these terms is the same (short is 1 away from medium, and 2 away from tall).
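A plain-Python sketch of both encodings from the example above. In practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` / `OrdinalEncoder` do this for you; the helper names here are hypothetical:

```python
def one_hot(values, categories):
    """One 0/1 column per category: 1 where the value matches that category."""
    return [[int(v == c) for c in categories] for v in values]

def ordinal(values, categories):
    """Single integer column: encodes the order AND assumes equal spacing between levels."""
    rank = {c: i + 1 for i, c in enumerate(categories)}
    return [rank[v] for v in values]

heights = ["short", "medium", "short", "tall", "medium"]
levels = ["short", "medium", "tall"]
print(one_hot(heights, levels))  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(ordinal(heights, levels))  # [1, 2, 1, 3, 2]
```

With an unregularized model that has an intercept, you'd usually drop one of the one-hot columns (e.g. `pd.get_dummies(..., drop_first=True)`), since the three columns always sum to 1.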
I see..., okay thanks for answering my questions, really appreciate it 😄
this is really handy, can't believe I only found out about it now:
%%script echo wubalubadudub ᘇᘏᗢ、
guys, how do we use data science in microbiology?
are there any projects done already on data science?
like a project bin consisting of like 15+ projects
anyone familiar with deploying machine learning code?
:incoming_envelope: :ok_hand: applied timeout to @rough schooner until <t:1690027633:f> (10 minutes) (reason: links spam - sent 90 links).
The <@&831776746206265384> have been alerted for review.
guys do u think that trying to write my own OCR as a first machine learning project might be a good challenge, or is it too hard for a beginner?
Depends. If you use existing OCR models/pipelines it's a good beginner project
sir, my code stopped working cuz they changed sites
it's simple enough to be a good beginner project, provided you do the research
I ended up going down a rabbit hole and made an OCR program to automatically click on Genshin Impact menus and get details of equipped items LOL
training was absolute hell because of the data cleaning, but I'll take the 82% accuracy
(and then a month later genshin makes a website that retrieves data from the game anyway)
When you transpose a tensor, are the original tensor and the transposed tensor the same?
what do you mean "the same"?
They certainly aren't equal in the general case. If swapping a pair of axes doesn't change the tensor, it's symmetric with respect to those two axes.
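A small NumPy illustration of "the same" in both senses: the transpose is generally not *equal* to the original, but it is a *view* of the same memory (PyTorch's `transpose` likewise returns a view). The arrays here are arbitrary examples.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
t = a.T                          # shape (3, 2): axes swapped

# The transpose is a view: same underlying memory, different strides
print(t.shape, np.shares_memory(a, t))

# Writing through the view changes the original
t[0, 1] = 100
print(a[1, 0])                   # t[0, 1] and a[1, 0] are the same element

# A matrix equal to its own transpose is symmetric, e.g. a @ a.T
s = a @ a.T
print(np.array_equal(s, s.T))

# For higher-rank tensors, pick which pair of axes to swap
x = np.zeros((4, 2, 3))
print(np.swapaxes(x, 1, 2).shape)
```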
does CNN with GRU sound weird?
I was thinking of using it in my research
I have heard a lot about CNN with LSTM, but how about this? Why does it sound so weird lol
How can I orchestrate machine learning pipelines for devices on the edge? I know Apache Beam is used for large models, but I don't need that complexity.
Is there something equivalent for small models?
What are devices on the edge?
.
I don’t know exactly, I think they mean something like an NVIDIA Jetson Nano running pipelines and inference.
I agree with Stel, edge is a broad thing.
We run a project where we hook up multiple Nvidia machines that communicate with each other in a distributed way / network. They're capable of running decently sized neural nets
We don't use Apache Beam or anything of the like. I think Beam is more for distributed processing and not necessarily inferencing?
Hello, does anyone here have experience building and training an MAE model? I want to create a simple MAE model and train it on the MNIST dataset. I wonder if it is possible.
Bro do machine learning jobs really only consist of training models? there's no way
well, that's kinda the job, to sum it up, more or less
loads of people experimenting with applications too, I'm sure
but i'd imagine most novel applications require training and data and shit to even just try
In practice you will have to learn a lot more than what is required for just training models:
- deployment
- exploratory data analysis
- feature engineering
- monitoring and maintenance
- documentation
which require tools like
- cloud platforms (AWS, GCP)
- data storage and processing (Hadoop, etc.)
- containerization tools (Docker and Kubernetes)
etc.
Are there any books you'd recommend for learning SQL as a complete beginner for DA/ML?
Guys, are most of the models hosted in Google Workspace?
ml image recognition of samples for example?
mean absolute error?
industry has lots and lots of data from past decades that it's currently trying to get value from
No... the masked autoencoder model
makes more sense hahaha
https://www.kaggle.com/code/ritvik1909/masked-autoencoder-vision-transformer
http://proceedings.mlr.press/v37/germain15.html
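The data side of a masked autoencoder is small enough to sketch: split each image into patches, hide a random 75% of them (the ratio used in the MAE paper), feed only the visible patches to the encoder, and compute the reconstruction loss on the hidden ones. A NumPy sketch for one 28×28 MNIST-sized image, assuming a 4×4 patch size (a hypothetical choice that gives 49 patches); the ViT encoder and decoder themselves are left out.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into (num_patches, p*p) flattened p x p patches."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def unpatchify(patches, h, w, p):
    """Inverse of patchify."""
    return patches.reshape(h // p, w // p, p, p).transpose(0, 2, 1, 3).reshape(h, w)

def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches; mask[i] is True where patch i is hidden."""
    n = len(patches)
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx

rng = np.random.default_rng(0)
img = rng.random((28, 28))            # stand-in for one MNIST digit
patches = patchify(img, p=4)          # (49, 16)
visible, mask, keep_idx = random_masking(patches, mask_ratio=0.75, rng=rng)

# The encoder would see only `visible`; the loss compares the decoder's output
# with the original pixels of the masked patches
target = patches[mask]                # (37, 16)
print(patches.shape, visible.shape, target.shape)
```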
Thanks. I will take a look tomorrow
!warn 712970859400265789 our server is not an ad board. Asking for retweets and likes here is against our rules. Don't do this again unless you wish to be removed from the server.
:incoming_envelope: :ok_hand: applied warning to @plucky meadow.
Is it possible to use tools without using agents in LangChain? If yes, then how? The issue is that the agent tends to hallucinate a lot, and it also takes more time because the computation increases a lot: it needs to "think", and for that it calls the LLM again. So I'm trying to avoid using an agent, but I need tools so my chatbot can search the net, do maths and other stuff. With the agent, inference takes 2-3 minutes plus hallucinations; without it, inference takes 10-15 seconds and the output is very well generated. Any help?
But how do you manage pipelines and keep track of the models trained? Can I still use tensorflow extended and airflow for a systems like the one you described?
I don't understand what you're trying to do.
Will you use multiple models on edge or just 1?
It’s just one model. I have 3 drilling units and want to run simple anomaly detection with VARs and predicted power efficiency with XGBoost in each one of the drills.
I’m not doing anything distributed or an inference that requires a lot of computational power, but still, I want to automate my pipelines as much as possible because once the model is deployed on-site, it’s hard to go online to monitor.
Are you sure it needs to be on edge? Can you call an API somewhere that has the model? That drastically makes things easier.
If not, Docker is your friend. You were talking about Jetsons. They don't need any "special treatment" as they are quite beefy.
But what about data preprocessing and monitoring performance such as data drifting and retraining models. Do any of the tools normally used for large models still apply to applications like mine?
What's your architecture? Do you collect data on edge as well? Is your machine connected to the internet?
Do you need to do predictions in real time?
I'm no expert in LLMs, but if you want to hack something together, could you write some form of listener for the LLM's output and if something appears in the output execute X with Y args? Alternatively, you could have another call to the LLM to ask it to judge if it should use the tool or not given the output from the first LLM response?
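A minimal sketch of the "listener" idea in plain Python, independent of LangChain: instruct the LLM in the prompt to emit a line like `TOOL: calculator | 2 * (3 + 4)` when it needs a tool, then parse the finished output and run the matching function yourself. The `TOOL:` line format, `TOOLS` registry and `run_tools` helper are all hypothetical; the calculator walks the `ast` tree instead of calling `eval`, so arbitrary code can't run.

```python
import ast
import operator
import re

# Hypothetical convention: the prompt tells the LLM to write "TOOL: <name> | <argument>"
TOOL_PATTERN = re.compile(r"^TOOL:\s*(\w+)\s*\|\s*(.*)$", re.MULTILINE)

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv, ast.USub: operator.neg}

def safe_calculator(expr):
    """Evaluate +, -, *, / arithmetic only (no names, calls or attributes)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

TOOLS = {"calculator": safe_calculator}   # a web-search function would go here too

def run_tools(llm_output):
    """Find every TOOL line in the LLM's answer and run the matching function."""
    results = []
    for name, arg in TOOL_PATTERN.findall(llm_output):
        tool = TOOLS.get(name)
        results.append((name, tool(arg.strip()) if tool else "unknown tool"))
    return results

print(run_tools("Let me compute that.\nTOOL: calculator | 2 * (3 + 4)\n"))
```

The results can then go back to the LLM in a second call to write the final answer, which is roughly what an agent automates, but with a single extra call instead of an open-ended reasoning loop.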
Yes, data collecting also on edge. Data is coming from sensors, and the predictions need to be real time.
I thought about using TFX and airflow because it would be easier for me to troubleshoot the system if I get a call about the system not working as expected.
The locations are remote and far from where I live :/
I don't (nor need to) know all of your non-functional requirements but I think it can be simple if you want it to be
Your edge device could feasibly just have a container running your model(s). You SSH into it from anywhere if you need to troubleshoot it
If you're able to persist the data, you can do the following: store the prediction and the input data somewhere in a DB/filestore, then upload it in batches to a server you can easily access
This means you can do all of your monitoring, retraining, ... with MLflow running somewhere completely different
That makes a lot of sense! That way I only use the edge device for inference, and if the model drifts, I can just push another trained model in a container to the edge device.
Thank you 👍🏻👍🏻
I remember reading about a model that had gated CNNs. I don't remember exactly which model it was, maybe one of the models used in Tacotron or Flowtron... WaveGlow or something like that...
Gated CNNs, Dilated CNNs. I think Flowtron uses more Invertible CNNs (since it's a Flow model)
this is what an agent does in LangChain; I'm not sure how I would add an event listener while the LLM is generating its output... I will look into it more...
Thanks anyway!
Can somebody explain to me how a tensor can represent any kind of data?
I want to add a point spread function to my image as noise. Is this function correct for generating a point spread function?
def generate_PSF(size, sigma):
    """ Generates a 2D Gaussian Point Spread Function (PSF). """
    x, y = np.meshgrid(np.linspace(-size/2, size/2, size), np.linspace(-size/2, size/2, size))
    d = np.sqrt(x*x + y*y)
    psf = np.exp(-d**2 / (2*sigma**2))
    psf /= np.sum(psf)  # Normalize the PSF
    return psf
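A point spread function is normally applied by convolving it with the image, which blurs it; it is not added like noise, and pixel noise is usually added separately afterwards. The function above looks fine for a Gaussian PSF (symmetric and normalized to sum to 1). A NumPy-only sketch of applying it, assuming a 2D grayscale image; `fft_convolve_same` and `blur_and_add_noise` are hypothetical helpers (the first mimics `scipy.signal.fftconvolve(..., mode="same")`), and the noise level is an arbitrary example.

```python
import numpy as np

def generate_PSF(size, sigma):
    """2D Gaussian PSF, as in the question above."""
    x, y = np.meshgrid(np.linspace(-size / 2, size / 2, size),
                       np.linspace(-size / 2, size / 2, size))
    d = np.sqrt(x * x + y * y)
    psf = np.exp(-d**2 / (2 * sigma**2))
    return psf / np.sum(psf)  # normalize so total intensity is preserved

def fft_convolve_same(image, kernel):
    """Linear 2D convolution via FFT, cropped to the input size (like mode='same')."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    full_shape = (ih + kh - 1, iw + kw - 1)   # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft2(image, s=full_shape) * np.fft.rfft2(kernel, s=full_shape)
    full = np.fft.irfft2(spectrum, s=full_shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]

def blur_and_add_noise(image, psf, noise_std, rng):
    blurred = fft_convolve_same(image, psf)                   # PSF: blur by convolution
    return blurred + rng.normal(0.0, noise_std, image.shape)  # noise: added per pixel

image = np.zeros((64, 64))
image[32, 32] = 1.0                      # a single point source
psf = generate_PSF(size=9, sigma=1.5)
blurred = fft_convolve_same(image, psf)  # the point spreads out into the PSF shape
noisy = blur_and_add_noise(image, psf, noise_std=0.01, rng=np.random.default_rng(0))
```

Blurring a single bright pixel is a quick sanity check: the result around that pixel should reproduce the PSF itself.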
How mathematical are you? Search for "Cauchy stress tensor" for an example of how a box experiencing shear and stress forces can be represented with a tensor. That's a real world example.
When using TensorFlow's tf.keras.Sequential() for data augmentation, are the data and its corresponding labels duplicated, or altered in place?
what is the impetus for this question?
i don't have enough training data, so i want to use data augmentation to try and increase it
Sequential is just for putting however many layers together, whether you're using it for data augmentation or whatever else.
then how do you recommend i increase the amount of training images without downloading new files?
Just like Pope Stelercus mentioned, you use the Sequential() method to add layers to your NN
but does calling the sequential on the data to augment it create more copies of the data?
it's more that you're passing data through the Sequential model. and putting a tensor through a model doesn't modify the tensor that you put into it--it returns a new one
No. There's a difference between building a multilayer perceptron and increasing the sample size of your data.
You can increase the number of layers in your NN that way, but increasing the sample size of your data is an entirely different thing.
also, models that you make with Sequential aren't necessarily for data augmentation. Sequential is just for general purpose building of (relatively simple) neural networks.
alright
What kind of data are you working with? Image? text? tabular data?
Image
You can use some data augmentation techniques like rotating or resizing the images, GANs, etc.
https://www.tensorflow.org/tutorials/images/data_augmentation
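If you want the dataset itself to grow (offline augmentation), you create transformed copies and repeat the labels alongside them; nothing is modified in place. A NumPy sketch, assuming square grayscale images scaled to [0, 1] in an (N, H, W) array; the flip/rotation/noise choices are examples and only make sense when they don't change the label (a rotated 6 can become a 9). Keras preprocessing layers such as `tf.keras.layers.RandomFlip` and `RandomRotation`, used in the linked tutorial, take the other route: they produce a fresh random variant each time a batch passes through during training, so the stored data never grows.

```python
import numpy as np

def augment_offline(images, labels, rng, noise_std=0.05):
    """Return the originals plus flipped, rotated and noisy copies, with matching labels.

    images: (N, H, W) array with H == W, values in [0, 1]; labels: (N,) array.
    """
    flipped = images[:, :, ::-1]                      # horizontal flip (reverse width axis)
    rotated = np.rot90(images, k=1, axes=(1, 2))      # 90-degree rotation of each image
    noisy = np.clip(images + rng.normal(0.0, noise_std, images.shape), 0.0, 1.0)
    aug_images = np.concatenate([images, flipped, rotated, noisy])
    aug_labels = np.concatenate([labels] * 4)         # each copy keeps its original label
    return aug_images, aug_labels

rng = np.random.default_rng(0)
images = rng.random((5, 8, 8))
labels = np.array([0, 1, 2, 1, 0])
original = images.copy()
aug_images, aug_labels = augment_offline(images, labels, rng)
print(aug_images.shape, aug_labels.shape)             # 4x as many samples
```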
Looking for a working script/code to instruction-fine-tune Llama 2 on 1x A100. Tried Abhishek's AutoTrain Advanced in 4-bit, but so far I can't get the loss to go in the expected direction 😅. If you successfully fine-tuned Llama 2 and are willing to share how, I would appreciate that 💚
i see, cool
Do you know any free books about AI for gaming?
can you elaborate a bit more on what AI for gaming would entail? like models that can play games or models that would be used by gamers?
Indeed, I want to know about AI usage in gaming, like how ML models are used, etc. If you have a book about AI and gaming or other docs, please share it with me.
well, that didn't answer my question
but here's a book on reinforcement learning and how it's applied to have models learn to play games https://redirect.cs.umbc.edu/courses/graduate/678/spring17/RL-3.pdf
Thank you...
What type of SQL do I need to know for data science?
Type of SQL? u mean type of database?
Yea
Hi, I am building a text classification model, I wanted to know if similar sentence not duplicates might affect model performance. For example:
I hate watching Netflix series
I hate watching Amazon Prime series
Or should I just keep one instance of the sentence, i.e. keep either "I hate watching Netflix series" or "I hate watching Amazon Prime series"?
Anyone know why pandas would still be throwing "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" warnings even though I am doing exactly that?
Is it because the referenced DF is already just a filter of another DF (meaning I should copy(deep=True) it)?
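A likely cause, yes: a boolean filter like `df[df["a"] > 1]` can still be tracked by pandas as derived from the original frame, so a later `.loc` assignment on it triggers the warning even though you used `.loc`. Taking an explicit copy when you create the subset makes it independent and silences the warning (`copy()` is already deep by default). A minimal sketch with made-up columns:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Explicit copy: `subset` no longer refers back to `df`
subset = df[df["a"] > 1].copy()

# Record warnings to show that none about copies is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset.loc[:, "b"] = 0            # modifies only the copy

print(subset)
print(df)                             # the original is unchanged
```

If you actually wanted to change `df`, apply the `.loc` assignment to `df` itself with the filter as the row indexer, e.g. `df.loc[df["a"] > 1, "b"] = 0`.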
is anyone interested in medium account sharing?
Can anyone please help?
what are the categories you are trying to classify the sentences into? can you explain your project a bit further? people might be able to help with this info
In general, you do want to summarize similar sentences into one category (in this example: "I hate things" or "emotions") for your model to be able to recognize those variations of sentences belonging to one category
It's a sentiment analysis model, a multi-class one. The categories are negative, positive, and neutral.
So if two sentences have exactly the same form, like I hate "something" where something might be a different movie name (e.g. I hate Godzilla and I hate Captain America: Civil War), do you want me to keep just one of them, like just keeping I hate Godzilla?
These are just a few instances in the dataset. I have other reviews that are not exactly the same as I hate something but are reviews with negative sentiment
I think I understand your point now. The sentences should have a wide variety of, for example, negative verbs or adjectives; to prevent overfitting to one specific verb, I'd recommend not using the same verb in many training sentences. But using the same construction two or three times should not be a problem.
Got it, and my data is pretty big with different variations of negative reviews; these are just a few instances where the construction is the same
Thanks a ton @hollow kettle
Then it shouldn't be a problem, glad I could help
Hi, I do not know how to express this, but I am a bit lost. I'm uncertain about which roadmap to follow or what field to choose. I recently completed the course "Supervised Machine Learning: Regression and Classification" by Andrew Ng, and everything went well. However, when I started the second course, "Advanced Learning Algorithms," which covers neural networks using TensorFlow & Keras, I found it quite challenging to understand. I would say I only grasped about 20% of the material, and it feels overwhelming. Also, I could not understand the syntax well, which adds to my confusion.
I have completed some basic projects like house price prediction, diabetes prediction, and two other logistic regression projects. Despite that, I'm still confused and struggling to grasp the information, unlike my experience with computer science. Can someone please advise me on what I should do next?
That's normal, it takes time to understand and grasp everything. What I did was take those courses more than once, trying things in between, and each time I understood more. So try to see what you don't understand. Is it the maths? Go take a look at the concepts you need, like gradient descent or linear algebra. Is it Python/MATLAB? Go take a look at the syntax and what we're doing.
What I would advise is to implement a perceptron on your own in Python, without any library other than NumPy, and try to understand what is happening. Print everything and work through an example by hand to see what is happening. You can then try to fit the XOR function (a single perceptron can't, so you'll need a hidden layer for that) and understand what is happening under the hood, then take that course another time.
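A NumPy-only sketch of that exercise: a perceptron trained on AND, the same perceptron on XOR, and a network with one hidden layer trained by backpropagation on XOR. The perceptron can never get all four XOR cases right because XOR is not linearly separable; the hidden layer fixes that. Hidden size, learning rate, epoch count and seed are arbitrary choices.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic perceptron learning rule with a step activation."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi    # no update when the prediction is right
            b += lr * (yi - pred)
    return lambda X: (X @ w + b > 0).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=1.0, epochs=10000, seed=0):
    """One tanh hidden layer + sigmoid output, full-batch gradient descent on mean BCE."""
    rng = np.random.default_rng(seed)
    y = y.reshape(-1, 1).astype(float)
    W1, b1 = rng.normal(0, 1, (X.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 1, (hidden, 1)), np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)            # forward pass
        p = sigmoid(h @ W2 + b2)
        dz2 = (p - y) / len(X)              # dLoss/dlogit for sigmoid + BCE
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1 - h**2)     # backprop through tanh
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return lambda X: (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()

and_model = train_perceptron(X, y_and)
xor_perceptron = train_perceptron(X, y_xor)
xor_mlp = train_mlp(X, y_xor)
print(and_model(X), xor_perceptron(X), xor_mlp(X))
```

Printing `h`, `p` and the gradients inside the loop for the first few epochs is a good way to follow backpropagation step by step.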
Is there anything that currently exists that can get me something like 10k price charts, or book summaries of data, in a CSV in 5-10 seconds?
Hi, I want to ask: if you did a logistic regression and got the coefficients of the model, can you predict the dependent variable using those coefficients?
For example:
coef_age = 1.23
coef_tall = -0.61
coef_single = 0.45
and I want to find the dependent variable (1 or 0)
with data: age = 20, tall = 1, single = 1
Can I calculate the probability manually?
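Yes: a binary logistic regression predicts p = 1 / (1 + e^(-z)) with z = intercept + sum(coefficient × feature), and predicts class 1 when p > 0.5. You also need the intercept, which scikit-learn stores in `model.intercept_` next to `model.coef_`; it isn't given above, so the value below is purely hypothetical. The features must also be transformed exactly as during training (e.g. scaled) before you plug them in.

```python
import math

def logistic_probability(intercept, coefs, features):
    """P(y = 1) for a fitted binary logistic regression."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

coefs = {"age": 1.23, "tall": -0.61, "single": 0.45}   # from the question
features = {"age": 20, "tall": 1, "single": 1}          # from the question
intercept = -25.0   # HYPOTHETICAL: read the real value from model.intercept_[0]

p = logistic_probability(intercept, coefs, features)
prediction = 1 if p > 0.5 else 0
print(round(p, 3), prediction)
```

For a fitted binary scikit-learn model, `model.predict_proba(X)[:, 1]` returns the same probability.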
Hello, I have a baseline model that does segmentation + postprocessing to achieve object detection and classification (the postprocessing finds the bounding box from the masks). This model achieves 93% on the metric. I'm trying to keep the same backbone but replace the segmentation head with linear layers that directly detect whether each class is present and predict its 4 bounding-box coordinates (so 5 × number of classes neurons in the last layer). The model is good but not good enough (91% on the metric; I need to at least match the baseline to switch, since this one is much lighter and the postprocessing was a bottleneck).
What can I do to tune my model? Would knowledge distillation from the segmentation model work to teach the second model?
Thanks, I really appreciate the advice and I will try it
Nah, I don't know the formula; I just input the sample data of the dependent and independent variables and get the coefficients using the scikit-learn library
Hello, I have a database of approx. 300k in-game character names (no personal info) that were previously reviewed by moderators after being submitted for approval. Each row in the data is labeled "approved", or "rejected" when it broke a name rule (e.g. bad words).
What would be the best approach to train a model that provides an extra signal for names pending review, like "system leans reject" or "system leans approve"?
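One simple baseline (not the only option) is a classifier over character n-grams, since rule-breaking names often hide a bad word inside a longer string. A dependency-free sketch of multinomial naive Bayes on character trigrams; `NameNaiveBayes` is a hypothetical class, the training names are invented, and "xxx" stands in for a banned word. At 300k rows, scikit-learn's `TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))` followed by `LogisticRegression` is a common, more scalable alternative.

```python
import math
from collections import Counter

def char_ngrams(name, n=3):
    """Character n-grams with start/end markers, e.g. '^ab', 'abc', 'bc$'."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NameNaiveBayes:
    def fit(self, names, labels):
        self.priors = Counter(labels)
        self.counts = {label: Counter() for label in self.priors}
        for name, label in zip(names, labels):
            self.counts[label].update(char_ngrams(name))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def log_scores(self, name):
        """Log prior + log likelihood per class, with add-one (Laplace) smoothing."""
        n = sum(self.priors.values())
        scores = {}
        for label, counts in self.counts.items():
            score = math.log(self.priors[label] / n)
            for gram in char_ngrams(name):
                score += math.log((counts[gram] + 1) / (self.totals[label] + self.vocab_size))
            scores[label] = score
        return scores

    def lean(self, name):
        scores = self.log_scores(name)
        return "system leans " + ("reject" if scores["rejected"] > scores["approved"] else "approve")

# Invented toy data; "xxx" is a placeholder for a banned word
names = ["SkyWalker", "RiverStone", "MoonLight", "Aria",
         "xxxKiller", "Mrxxx", "xxxLord", "Bigxxx"]
labels = ["approved"] * 4 + ["rejected"] * 4
model = NameNaiveBayes().fit(names, labels)
print(model.lean("xxxMaster"), model.lean("StarRiver"))
```

The gap between the two log scores can also serve as a confidence value, so only clear-cut cases get a "leans" label.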
then you're better off learning that
where can I talk about machine learning?
here
ohh noice
Actually, I am just a beginner
But I took this ML course based on Python
and I am learning it rn
I know C and C++ for now
Hi people... Newbie here
I have 25 classes and a bounding box for each class, with a batch size of 32, so the prediction has shape (32, 25, 4). I calculate the IoU loss from the mean of the IoUs of each predicted/label box pair. I want a way to give more weight to the boxes predicted for some classes, since the model predicts poorly on those
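One way is to compute a per-box loss of 1 − IoU and take a weighted average with one weight per class, so the poorly predicted classes contribute more. A NumPy sketch assuming boxes in (x1, y1, x2, y2) format and shapes (batch, classes, 4) as described; the weight values are hypothetical, and every operation has a direct PyTorch/TensorFlow equivalent if the loss has to be differentiable in your framework.

```python
import numpy as np

def box_iou(a, b, eps=1e-9):
    """Element-wise IoU of boxes in (x1, y1, x2, y2) format; a, b: (..., 4)."""
    x1 = np.maximum(a[..., 0], b[..., 0])
    y1 = np.maximum(a[..., 1], b[..., 1])
    x2 = np.minimum(a[..., 2], b[..., 2])
    y2 = np.minimum(a[..., 3], b[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + eps)

def class_weighted_iou_loss(pred, target, class_weights):
    """pred, target: (B, C, 4); class_weights: (C,). Weighted mean of (1 - IoU)."""
    per_box = 1.0 - box_iou(pred, target)              # (B, C)
    weighted = per_box * class_weights[None, :]        # broadcast weights over the batch
    return weighted.sum() / (class_weights.sum() * pred.shape[0])

rng = np.random.default_rng(0)
target = np.concatenate([rng.random((32, 25, 2)), 1 + rng.random((32, 25, 2))], axis=-1)
pred = target + rng.normal(0, 0.05, target.shape)
weights = np.ones(25)
weights[[3, 7]] = 3.0   # hypothetical: classes 3 and 7 are predicted poorly
print(class_weighted_iou_loss(pred, target, weights))
```

Dividing by the summed weights keeps the loss on the same 0-1 scale as the unweighted version, so the learning rate does not need retuning just because the weights changed.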
anybody familiar with convolutional layers?
Alright, I'll try to learn it, thx for the advice 😀
Hey, I would like to create a pandas DataFrame in which one of the columns holds a list.
I have one list like this:
[1,2,3]
and another like this:
[[x,y,z],[x,y,z],[x,y,z]]
I want this:
col1:[1,2,3],col2:[[x,y,z],[x,y,z],[x,y,z]]
I get error: Per-column arrays must each be 1-dimensional
you should just about never put lists inside of a pandas dataframe
How about numpy arrays?
I basically have names and categorical data to those names. I would like to save it so I can decode the actual name later
do you have multiple names or multiple pieces of data per name?
john:[0 0 0 0 1] something like this
!e it just works for me:
import pandas as pd
df = pd.DataFrame({"a":[1,2,3],"b":[[1,2,3],[2,3,4],[4,5,6]]})
print(df.dtypes)
print(df)
@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | a int64
002 | b object
003 | dtype: object
004 | a b
005 | 0 1 [1, 2, 3]
006 | 1 2 [2, 3, 4]
007 | 2 3 [4, 5, 6]
it's probably a bad idea since, well, object dtype, but it works.
What is the proper way to save categorical data to decode later?
What do you mean by categorical data?
Because the neural network will output [0 1 0 0 0] and I won't know what name it is referring to
oh, you mean one-hot encoding
yeah
perhaps just map it to indices? like, [0 0 0 0 1] -> 4.
Yea and I want to save it into a dataframe
But as you say its not the proper way to do it right?
Instead of the 4 I would like to have a name
Do you have the array of names?
yep
then yeah, map these before putting them into the dataframe.
whatever ML library you're using might have a function for that; otherwise you could do, uhh
store each boolean in a separate column
what exactly are you trying to do? multiclass classification?
Really? On actual data I will have like 200 columns
yes.
# predictions is (N,k) - one of k categories for N samples
# names is (k,) - the name of each category
pred_inds = np.where(predictions)[1] # should be (N,) if there's a single nonzero element per row
pred_names = names[pred_inds]
[0]*
no, I think mine is right, the first array returned by where should just be an arange(N) if there's one nonzero per row, and hence boring.
that's a (k,) array, not an (N,k) one
oops
I think I'm just gonna save it as a dictionary, much simpler
pandas is a powerful tool. but it's also important to recognize when it's the wrong one.
though it's often the right one. pandas pandas pandas.
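A minimal sketch of the mapping discussed above, assuming `names` holds the category names in the same order as the network's output columns (the names here are hypothetical). `argmax` along the class axis works both for exact one-hot rows and for raw softmax probabilities, where `np.where` would return every column since none are exactly zero. The names list itself can be saved, e.g. as JSON, and reloaded to decode later predictions.

```python
import json

import numpy as np

names = np.array(["alice", "bob", "carol"])          # hypothetical category names

onehot_preds = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
prob_preds = np.array([[0.1, 0.7, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])

# Index of the largest entry per row -> category index -> name
decoded_onehot = names[onehot_preds.argmax(axis=1)]
decoded_probs = names[prob_preds.argmax(axis=1)]

# Save the index -> name mapping once; reload it to decode future predictions
saved = json.dumps(names.tolist())
restored = np.array(json.loads(saved))
print(decoded_onehot, decoded_probs, restored[2])
```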
I have two pandas DataFrames in Python. Each one gathers data from a different sensor. Let's say that one takes 1,000 samples per second and the other takes 1 per second. If I merge both DataFrames, I have two possibilities to deal with this difference:
- Using sparse structures: this way most of the fields would be None, saving lots of memory
- Filling all empty spaces with repeated values. In this case, is there a way to do it without blowing up memory? Like categorical data, where heavy repetition of values is not penalized
Hi, I have a question about statistics: what is the relationship between probability and standard deviation? My understanding is that if a feature has a high standard deviation, then it is easier to predict?
I would say it is the opposite. A high standard deviation means your data can take values far from its mean. In a situation like this, predictions are usually harder
If memory is starting to be a concern, I'd consider not merging them at all, and instead seeing how you can process them in a merged way without explicitly merging. E.g. in polars you could do a lazy operation involving a merge and stream it in such a way that the full merged dataframe is never constructed.
(duckdb can probably do it too)
any roadmap for data science and ml?
Columnar stores generally compress or encode repeated values (via various techniques), so you shouldn’t need to worry about sparse structures here; just select the appropriate store, like pyarrow or parquet or whatever fits your use case
That's interesting. Gonna read about it
Secondly, use an ‘as_of’ join to combine that type of data, where data is updated at different intervals.
Ref for duckdb as_of joins: https://duckdb.org/docs/guides/sql_features/asof_join.html
I understood your first comment, but I am not sure if I got this one 😅
Read the link for as_of’s, they’re powerful but that’s a better explanation (and as_of is supported by a variety of systems).
This is the pandas as_of: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html
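A small sketch of `pd.merge_asof` for the two-rate sensor case, with made-up timestamps in seconds: each fast-sensor row gets the most recent slow-sensor reading at or before its time (the default `direction="backward"`). The output has exactly one row per fast-sensor row, so it never grows beyond the fast frame, although in plain pandas the slow values are still physically repeated in memory. Both frames must be sorted by the key.

```python
import pandas as pd

# Hypothetical data: a fast sensor every 0.5 s and a slow sensor every 1 s
fast = pd.DataFrame({"t": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
                     "vibration": [0.1, 0.3, 0.2, 0.4, 0.5, 0.6]})
slow = pd.DataFrame({"t": [0.0, 1.0, 2.0],
                     "temperature": [20.0, 21.0, 22.5]})

# For each fast row, take the last slow reading with slow.t <= fast.t
merged = pd.merge_asof(fast, slow, on="t")
print(merged)
```

`tolerance` limits how old a matched reading may be, and `direction="nearest"` matches in either direction.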
I will do that! I have been working with pandas for such a long time. But never had this type of issue till now. So I have no idea about what possibilities I can find. Your feedback is being quite useful
This function seems really useful. I would have ended up wondering how to do this as well
Also, consider where parquet might fit in. It’s extremely powerful for data problems where you have large amounts of historical sensor data (for instance).
I read about it a few weeks ago but did not consider using it, as I am not only saving a DataFrame but a whole serialized dataclass, so I would need to look into how to integrate all of this with parquet. I will see how much space I can save and evaluate the situation! Thanks 🙂
Oh, and now that I am rereading this to understand it better, my main problem right now is minimizing RAM usage. PyArrow or Parquet can deal with data redundancy when the data is saved, but if I merge these two DataFrames and repeat all the values, the DataFrame goes from an expected 2 GB to 60 GB of RAM usage. So which alternatives would I have here?
Let's assume storing the data is not a problem
I saw that the function you told me about tends to repeat all this data when merging the two DataFrames, so I would also have a problem with the merge itself. I will check the flags that the function accepts
This is back to your first question: about how to handle repeating values. With columnar stores (like pyarrow, parquet, duckdb) that have various compression and encoding methods, the repeating values don't "cost" you anything (or much).
If you're using plain vanilla pandas, then yah, it'd probably be a concern if you're storing a large number of values.
(but there's often a memory / performance tradeoff here)
For example https://duckdb.org/2022/10/28/lightweight-compression.html#run-length-encoding-rle... and the parquet equivalent: https://parquet.apache.org/docs/file-format/data-pages/encodings/#a-namerlearun-length-encoding--bit-packing-hybrid-rle--3.. and the arrow equivalent: https://parquet.apache.org/docs/file-format/data-pages/encodings/#a-namerlearun-length-encoding--bit-packing-hybrid-rle--3
Great! Thank you so much 🙏
Hey, I have a NumPy array X with shape (1000,10).
How do I reshape it into (1000,640)?
I have one-hot encoded data with 64 values. The actual NumPy array should be (1000,10,64), but it isn't?
I hope the question makes sense...
you can't reshape a (1000,10) array into a (1000,640) because 1000 * 10 != 1000 * 640. the operation you're describing is something other than what "reshape" refers to.
you can reshape (1000,640) to (1000, 10, 64), however.
yeah true, I meant to say my array is in a different form than I would like.
so do you actually have a (1000, 10) shape array, or (1000, 640)?
(1000,10)
and you have not yet one hot encoded it?
print(X[0][0]) returns part of the hot encoding, instead of print(X[0]) returning all of it
I don't understand that sentence.
can you show the code that includes the part that performs one-hot encoding?
@grand quarry please ping me when you show the code, and if I'm able to look at it at that time, I will.
The one-hot encoding is a bit of a mess, but I can show that this fixes the problem (ugly but works):
all_data = []
for i in range(len(X)):
    X_single_piece_of_data = []
    for j in range(10):
        X_single_piece_of_data.extend(X[i][j])
    all_data.append(X_single_piece_of_data)
all_data = np.array(all_data)
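The loop above works; a vectorized equivalent, assuming (as the `print` output suggests) that `X` is a (1000, 10) object array whose cells each hold a 64-length one-hot vector, is to stack the cells into a real (N, 10, 64) array and then reshape to (N, 640). If you start from integer category codes instead, `np.eye(64)[codes]` builds the one-hot array directly. The data below is a small random stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(0, 64, size=(5, 10))        # stand-in integer categories, shape (N, 10)
onehot = np.eye(64, dtype=np.int8)[codes]        # one-hot directly: shape (N, 10, 64)

# Rebuild the problematic layout: an (N, 10) object array of 64-length vectors
X = np.empty(codes.shape, dtype=object)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X[i, j] = onehot[i, j]

X3 = np.stack([np.stack(row) for row in X])      # real numeric array, (N, 10, 64)
flat = X3.reshape(len(X3), -1)                   # (N, 640), same as the loop above
print(X3.shape, flat.shape)
```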
try using a proper encoder, like from sklearn.
yeah will do that, didn't know sklearn had that, thanks
anyone here used SciPy and LMFIT before?
always ask your actual question right from the get go. not if someone has used a library that pertains to the actual question.
okay will do
Since the code is a bit on the longer side, I posted it on SO and am posting the link to the question here. In short, I get different answers from LMFIT vs. SciPy
Depending on the nature of the problem, it's not unusual for very small differences in the convergence criteria or implementation of an algorithm to cause them to find very different local minima. You should probably start by trying both methods on something easier and convex to see how much the solutions differ in that case.
Well, I presume for simple cases they agree well (these are quite standard packages). But if they diverge for my instance, that is quite concerning in terms of my landscape (i.e. my landscape is so poorly defined that small deviations in implementation result in wildly different solutions).
Did you check their docs?
Maybe lmfit's default method is different from SciPy's
I'm not sure what point you are trying to get across. If both methods found a local minimum, there could always be another method out there that finds an even better one you haven't tried yet. You should at least perturb your initial conditions and get some vague sense of how sensitive your function is to ending up in very different local extrema.
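A toy illustration of perturbing the initial conditions: run the same local optimizer from several starting points and compare where each run ends. The function and the plain gradient descent below are stand-ins (not LMFIT or SciPy); with SciPy you would run the same loop over different `x0` values passed to `scipy.optimize.minimize`. More than one distinct end point means the answer depends on where you start.

```python
import numpy as np

def f(x):
    return x**4 - 3 * x**2 + x          # non-convex: two local minima

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

starts = [-2.0, -0.5, 0.5, 2.0]
ends = [gradient_descent(x0) for x0 in starts]
for x0, x in zip(starts, ends):
    print(f"start {x0:+.1f} -> x = {x:+.3f}, f(x) = {f(x):+.3f}")

best = min(ends, key=f)                 # keep the lowest minimum found
```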
I'm saying both methods should be identical and thus should provide identical results
I thought lmfit uses SciPy directly (i.e. it's just a wrapper around it). But I will double-check
a table in JupyterLab is showing médio instead of "médio"
How can I change it? Isn't Jupyter's default encoding UTF-8?
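Jupyter itself renders UTF-8 fine; this kind of garbling usually means the text was decoded with the wrong encoding when the file was read (for example UTF-8 bytes read as Latin-1). The fix is then to pass the right `encoding=` to the reader, e.g. `pd.read_csv(path, encoding="utf-8")` or `encoding="latin-1"`, depending on how the file was actually saved. A sketch of the UTF-8-read-as-Latin-1 case, assuming that is what happened here, and of how an already-garbled string can be repaired:

```python
good = "médio"

# UTF-8 bytes misread as Latin-1 produce the classic garbled form
garbled = good.encode("utf-8").decode("latin-1")

# Reversing the wrong step recovers the original text
repaired = garbled.encode("latin-1").decode("utf-8")
print(garbled, repaired)
```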
Hi, a question about a classification machine learning model

I have drilling perforations in the ground, some positive and some negative, and I want to use map pixels from rasters such as hydrology, geology, topography, etc.
My problem: okay, I can use machine learning models such as random forest, SVM, etc. to classify a map; however, the perforation data are not all at the same depth, so the data are not homogeneous
So I want to find out whether a perforation is negative (no water found) or positive (water found), but it's also important at what depth I will find water, because then I will have a kind of cluster classification map
u.u
The other problem I have is related to the amount of data, because if I start to cluster the data by depth I will lose data quantity
Then maybe I can develop a machine learning model for 50 to 100 meters, because I have more data for that range than for 200 to 300 meters or more, around 400 meters
Take into account that topography will also affect your axis
Hi, is it possible to learn to develop AI without deep math knowledge, and to freelance?
For extremely simple things, you can find literally hundreds of tutorials and examples online? Yes
developing actually new stuff? no
Is around $300 a month possible if I learn it for 6-9 months?
Uh you can use libraries like TensorFlow which take a lot of the math out of machine learning
But it still does involve math
how much?
However, in life, if you want to get good at something, you should spend a lot of time thoroughly learning it from the bottom to the top
A lot; I’d even argue that machine learning is 100% math
Machine learning is more mathematics than it is coding
It's not a problem, I want to learn, but I have only 9 months to start earning around $300 every month
You’re trying to learn in order to make money, correct??
I wouldn't take that bet
just about any programming related entry level job has hundreds of applicants nowadays
I need the money too, to pay for my university studies
but I'm very interested in AI
and I love programming very much
Why specifically do you need to learn machine learning? Does it relate to the job you're trying to get? Is this job self-employment, or are you working for a company/somebody?
But I only know scraping and creating bots, and now I need to earn money freelancing
No, it's interesting to me, and I think it has less competition than creating bots or scrapers
ML/AI is quite complex and takes time to learn. If you’re trying to make quick money but don’t care, not sure if you want to go down this path
What path would you advise?
However, it is your choice. I do agree that machine learning and artificial intelligence are very interesting topics, but I would recommend learning them if you genuinely want to spend time making something out of them instead of trying to make quick money
For freelancing, there’s a ton of options. It’s your choice. I guess you could do something like cybersecurity
You could become a game developer on a platform like roblox and commission
cybersec? even worse than ML imo x-x
I always program for myself, and I hate people who program for money, but right now I can't leave my small town for a university in another country
Ay bruh I ain’t know NOTHING about cybersecurity except the self explanatory part of it
Forgive me please
i love pentesting, but its dangerous in my country
My ultimate recommendation for anyone is to program because you like and enjoy it
Money is a byproduct of hard work and success
Yeah, I think so too, but I want to go to university and leave my small town (
That goes for anything in life as well: do something because you like it, and perhaps great money can be made from it, but never do something solely for the money while only somewhat liking it
idk overall I'd probably recommend just looking for a normal (non-programming) job and looking for anything you can use your programming skills in in it
I always think this way, but right now I need to earn $300 a month
Because what will happen is that you won’t work harder than your competition because you don’t have the passion or drive like your competitors do
this
No one will hire me for a real job
What is your first language?