#data-science-and-ml
hey, im currently learning and stuck on gradient descent local minima/maxima, i have the base code, a cost function and the domains although i cant figure out how to localise it, neither am i capable of finding the info i could implement on google, does someone mind helping?
you can use it on cuda 11
https://pytorch.org/get-started/previous-versions/#v1100 just pick the command for cuda 11.3
Could someone please take a look at this issue please 🙏
#help-croissant
I think I have a table manipulation problem down in #help-candy I don't know what to call this problem
Hi, does anyone know how to append another value to the same key, im getting this error?
append is a list function. You need to make the value of the dict a list if you want to use append.
thanks, ive now updated the code
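A minimal sketch of the "make the value a list" advice, assuming a plain dict that maps each key to several values (the names `scores` and `grouped` are made up for illustration):

```python
from collections import defaultdict

# Option 1: plain dict, create the list on first use with setdefault.
scores = {}
scores.setdefault("alice", []).append(1)
scores.setdefault("alice", []).append(2)

# Option 2: defaultdict builds the empty list automatically.
grouped = defaultdict(list)
grouped["bob"].append(3)
grouped["bob"].append(4)

print(scores)         # each key now maps to a list of values
print(dict(grouped))
```

Either way `append` is called on the list stored under the key, not on the dict itself.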
Hello, what's the practical use of pandas.dataframe.items, iterrows, iloc and loc?
hey, does anyone know how to encrypt a specific field of a JSON document using python?
Hi, I am training a model using catboost and it trains fine when there's CPU in the hyperparameters but I get the following error when I change it to GPU. Hence, it seems that there isn't any problem with the code. Please suggest how it can be resolved
@lapis sequoia Do you know what CUDA version are you running? Tried updating catboost?
95% sure I had this exact issue before but I can't for the life of me remember how I fixed it
How should I check the Cuda version?
I tried updating the catboost part but that didn’t work. I even tried enforcing the import of the latest version of catboost but that also didn’t seem to work.
how to merge two training data sets of images
this is how i am loading the data
clothes = keras.datasets.fashion_mnist
(train_images, train_labels), (extra_images, extra_labels) = clothes.load_data()
essentially i want to do something like: ```py
train_images = np.concatenate(train_images, extra_images)
train_labels = np.concatenate(train_labels, extra_labels)
```
you're pretty much there! you just need to specify along which axis to concatenate
what's the shape of your data sets?
its 1 dimentional
i added this
axis=0 it worked
great
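For reference, a small sketch of the working call, using zero-filled stand-in arrays instead of the real Fashion-MNIST data (the shapes here are assumptions). Note that `np.concatenate` takes the arrays as a single tuple, followed by the axis:

```python
import numpy as np

# Stand-ins for the two splits returned by load_data().
train_images = np.zeros((6, 28, 28))
extra_images = np.ones((2, 28, 28))
train_labels = np.arange(6)
extra_labels = np.arange(2)

# The arrays go in one tuple; axis=0 joins them along the sample dimension.
all_images = np.concatenate((train_images, extra_images), axis=0)
all_labels = np.concatenate((train_labels, extra_labels), axis=0)

print(all_images.shape, all_labels.shape)
```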
Guys, I'm working a on a script that fetches text from images and stores the data from the images into a dataframe.
I'm currently using Tesseract to detect txt from images. Any other alternatives? Tesseract doesnt seem to detect small text from images
cv2 should also have a text detector for images, you can give that one a shot.
is there a way to append rows to a pandas dataframe inplace? Like without generating a new one?
Nope.
damn
The best way is to append all the data to a python list, and then convert the whole list to a dataframe once you have all the data.
o that doesnt sound too bad, I'll try that, thanks buddy
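A rough sketch of that list-then-convert pattern, with made-up column names (`file`, `text`) standing in for whatever the OCR step produces:

```python
import pandas as pd

rows = []  # plain Python list; appending here is cheap
for i in range(3):
    # Hypothetical record for one processed image.
    rows.append({"file": f"img_{i}.png", "text": f"sample {i}"})

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(rows)
print(df)
```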
okay, will do 🙂
what about for specific symbols or signs (ex: hazard)
i'm not sure i've seen something like that trained to detect special chars. if you mean to detect it as an image, you can probably transfer train a network that does image classification
hey, my model wont stop guessing "Boot", please help me
this is the code:
from tensorflow import keras
from pathlib import Path
import tensorflow as tf
import numpy as np
import cv2
images_path = fr'{Path(__file__).parents[1]}\images'
image_path = fr'{images_path}\dress.png'
labels = [
'T-shirt',
'Pants',
'Long sleeve shirt',
'Dress',
'Coat',
'Sandal',
'Shoe',
'Bag',
'Boot'
]
label = labels[3]
""" Retrieving and loading data """
clothes = keras.datasets.fashion_mnist
(images, t_labels), (extra_images, extra_labels) = clothes.load_data()
images = np.concatenate((images , extra_images), axis=0)
t_labels= np.concatenate((t_labels, extra_labels), axis=0)
""" Making the image the correct format """
drawn_image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
drawn_image = drawn_image[0:600, 0:600]
resized_drawn_image = cv2.resize(drawn_image, (28, 28), interpolation=cv2.INTER_LINEAR)
resized_drawn_image = resized_drawn_image.reshape(-1, 28, 28)
""" Pre-processing images to be between the values of 0 - 1 and Making image White on Black """
images = images / 255
resized_drawn_image = abs(255 - resized_drawn_image) / 255
""" Creating the model """
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
""" Compiling the model """
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics=['accuracy'])
""" Fitting the model """
model.fit(tf.expand_dims(images , axis=-1), t_labels, epochs=1)
""" Testing the model """
test_results = list(model.predict(resized_drawn_image)[0])
""" Getting the results """
guess_index = test_results.index(max(test_results))
if guess_index == 6:
    guess_index = 2
""" Printing the results """
if labels[guess_index] == label:
    print(f'\n\n The model guessed {labels[guess_index]}, the model was correct!')
else:
    print(f'\n\n The model guessed {labels[guess_index]}, the correct answer was {label}.')
this is dress.png
perhaps the all knowing @wooden sail could take a look...?
edd has duties, being a god is hard work
when you do (255 - resized_drawn_image) you're inverting the colors/intensities in the image btw. this can ruin everything if you train on the images but predict on the inverted ones. i don't think you're concatenating along the correct axis either. if the images are 2D, concatenating along axis 0 makes just one tall image with many things in it, doesn't it?
try doing imshow on one of the concatenated images and see
try plotting what you've done to the images. looking at stuff can be helpful when you're wrangling data
@floral hollow if you want to "stack" a bunch of 2d arrays together into a single 3d array, use np.stack, not np.concatenate
concatenate extends an existing dimension. stack adds a new dimension.
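To make the difference concrete, here is a small sketch with dummy arrays (all shapes are assumptions). Checking `.shape` before and after is a quick way to see which case applies to your data:

```python
import numpy as np

a = np.zeros((28, 28))
b = np.ones((28, 28))

# concatenate extends an existing axis: two 2D images become one taller 2D image.
tall = np.concatenate((a, b), axis=0)

# stack adds a new leading axis: two 2D images become a batch of two.
stacked = np.stack((a, b), axis=0)

# If the inputs are already 3D batches, concatenating along axis 0 just adds samples.
batch1 = np.zeros((3, 28, 28))
batch2 = np.zeros((2, 28, 28))
merged = np.concatenate((batch1, batch2), axis=0)

print(tall.shape, stacked.shape, merged.shape)
```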
in general, if you download someone else's project, and follow their instructions, and the instructions don't work, it means that they messed up not you 🙂
the error message does include this link: https://www.gymlibrary.dev/content/environment_creation/
it might be a good place to start reading about how openai gym works
from what ive understood its not referring to your virtual environment, but rather the "environment" that the AI agents are acting in
the "action space" seems to define what actions the agent can take, but as per the readme there is no fixed set of actions, so their environment doesnt define an action space
https://github.com/dellalibera/gym-backgammon/#actions
The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.
that was my impression as well. the "environment" is something that's part of the openai gym framework, not the python environment
it seems like something that was supposed to be part of the example script.
see also https://www.gymlibrary.dev/content/basic_usage/
Every environment specifies the format of valid actions by providing an env.action_space attribute. Similarly, the format of valid observations is specified by env.observation_space.
and this is the module that defines the (two) environments that can be used https://github.com/dellalibera/gym-backgammon/blob/master/gym_backgammon/envs/backgammon_env.py
but otherwise i cant explain why their project doesnt include one, perhaps there was a change to the api that required it?
you can try either contacting the maintainer of gym-backgammon, or fixing the code so it has an action space (the module i linked above does seem to have a get_valid_actions method for determining those actions)
openai/gym#751 might be a relevant issue too
or this https://github.com/openai/gym/issues/1264
You could always allow all actions and rely on the agent to figure out that 2 of the actions do nothing in the later stage. You could also signal to the agent in the observations that it is in one setting or the other. It's likely that there are other ways to do this, but this may be the simplest one.
Implement a github project into a notebook?
Wdym?
U cant rly run multiple files in a notebook
u run a github project as one
for example, clone it and run the main .py file in terminal
gits are like interconnected dependencies
a git repository is an entire "project": a collection of files. a notebook is one of several files in a project.
Hello so, I was working on a simple JARVIS project and I got an idea to make the assistant identify the user's face and greets him by his name. For example, If I use the assistant it should greet me as my name and if my friend uses it, it should greet by his name. I am not much experienced in this field so, I need to know which library would help me do it. (and a tutorial reference would be helpful :>)
Hey, what can I do to divide two data frames by each other?
Yes
Is there a reliable way to change a data frames values from an object to an integer.. if the numbers contain commas specifically
yes
pd.read_csv(..., thousands=',')
it sounds more like you have a column of text data, and you need to parse that data into numbers. is that right?
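Two possible ways to get integers out of comma-separated numbers, sketched on invented data: let `read_csv` handle it at load time via its `thousands` parameter, or strip the commas from a text column that is already loaded.

```python
import io
import pandas as pd

# Option 1: parse while reading (the CSV text here is made up).
csv_text = 'item,price\nhouse,"1,250,000"\ncar,"35,500"\n'
df_parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Option 2: the column is already loaded as text (object dtype).
df_text = pd.DataFrame({"price": ["1,250,000", "35,500"]})
df_text["price"] = df_text["price"].str.replace(",", "", regex=False).astype(int)

print(df_parsed["price"].tolist())
print(df_text["price"].tolist())
```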
Hey, can anyone help me? I have a data set with data about houses. I'm trying to create multiple plots with sns.FacetGrid, but it does everything in the last plot (Check the image). I try to make a plot for every town. The floor_area_sqm goes on the x-axis and the town goes on the y-axis.
This is the code: ```py
grid = sns.FacetGrid(df, col="town", hue="resale_price",
                     col_wrap=4, height=1.5)
sns.scatterplot(x="floor_area_sqm", y="resale_price", hue="town", data=df)```
Thanks for the help!
Image:
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.
The birthday paradox is a veridical paradox: it appears wrong, but is...
c-c-c-crazy
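The 23-person figure is easy to check numerically. A short sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(shared_birthday_probability(22))
print(shared_birthday_probability(23))
```

With 22 people the probability is still under one half; 23 is the first group size that pushes it over.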
does anyone know what fake_audio[:, :, :audio.size(2)] does? These are pytorch tensor objects but you can probably think of them as np arrays :L
it's a slice filtering
- all values in the axis at index 0
- all values in the axis at index 1
- all values up to audio.size(2) in the axis at index 2
ah i see
!e I would expect for it to be doing the same as ```py
import numpy as np
a = np.arange(16).reshape(2,2,4)
b = np.arange(8).reshape(2,2,2)
print(a)
print('---')
print(a[:, :, :b.shape[2]])
```
this repo i'm running seems to be breaking when trying to subtract the fake_audio[] with the real audio
```
  File "/home/user/HiFi-GAN/utils/train.py", line 88, in train
    step)
  File "/home/user/HiFi-GAN/utils/validation.py", line 28, in validate
    sc_loss, mag_loss = stft_loss(fake_audio[:, :, :audio.size(2)].squeeze(1), audio.squeeze(1))
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 130, in forward
    sc_l, mag_l = f(x, y)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 91, in forward
    sc_loss = self.spectral_convergenge_loss(x_mag, y_mag)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 46, in forward
    return torch.norm(y_mag - x_mag, p="fro") / torch.norm(y_mag, p="fro")
RuntimeError: The size of tensor a (151) must match the size of tensor b (146) at non-singleton dimension 1
```
so would i just force fake_audio to be the same dims as audio?
@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | [[[ 0 1 2 3]
002 | [ 4 5 6 7]]
003 |
004 | [[ 8 9 10 11]
005 | [12 13 14 15]]]
006 | ---
007 | [[[ 0 1]
008 | [ 4 5]]
009 |
010 | [[ 8 9]
011 | [12 13]]]
cutting things off randomly is usually not a good idea
oh .-.
you must see what is the purpose and origin of each of them first
fake_mel seems to be from a model generating from a mel
audio seems to be the real audio
I'd check the model description to understand why they do not match and see if / what they recommend then
it might be just that they have different durations, in which case you might as well cut off the longest one to match the other, but only after making sure that this is the case and the way they recommend handling that issue
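If it does turn out to be only a duration mismatch, trimming both signals to the shorter length could look roughly like this. This is a NumPy sketch with invented shapes; the same slicing syntax applies to PyTorch tensors, but whether trimming is appropriate for this repo is still an open question:

```python
import numpy as np

# Stand-ins shaped (batch, channels, samples); the lengths are made up.
fake_audio = np.zeros((1, 1, 1460))
audio = np.zeros((1, 1, 1510))

# Slicing with a length longer than the array is a no-op, so trim BOTH
# arrays to the shorter length instead of only slicing one of them.
n = min(fake_audio.shape[-1], audio.shape[-1])
fake_trimmed = fake_audio[..., :n]
audio_trimmed = audio[..., :n]

print(fake_trimmed.shape, audio_trimmed.shape)
```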
i forked it do some edits: https://github.com/Chitasa/HiFi-GAN/
since i have enough audio data now i might just train from scratch
I am pretty new to description logic and I have no idea how you would solve these kind of exercises. Could anyone lend me some help?
like stel said this is pretty cool stuff. i had a classmate in grad school that created a GAN to create anime characters. his took at least 20k-30k images for decent quality
and even then you could kinda tell with some of them that they were artificial
yeah well the precision doesn't really concern me so long as the accuracy is there. I'd rather have a machine that creates 1 good image and a hundred bad ones than one that creates all ok images
but that's really good to know that 20-30k was the barrier to entry for your classmate
yeah his were full body shots though and not one specific anime series so maybe you will require less
best of luck bud

scatterplot doesn't automatically detect and use the presence of a FacetGrid. you need to call grid.map_dataframe or similar to do this, as per the examples in the FacetGrid docs http://seaborn.pydata.org/generated/seaborn.FacetGrid.html#seaborn.FacetGrid
however the scatterplot docs point to relplot as the preferred way to do this: http://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot
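A sketch of both routes on a tiny invented data frame (the column names follow the question, the values are made up, and the `Agg` backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "town": ["A", "A", "B", "B", "C", "C"],
    "floor_area_sqm": [60, 90, 70, 110, 45, 80],
    "resale_price": [300_000, 450_000, 320_000, 520_000, 250_000, 400_000],
})

# Route 1: relplot creates the FacetGrid and draws one scatter panel per town.
g = sns.relplot(data=df, x="floor_area_sqm", y="resale_price",
                col="town", col_wrap=4, height=1.5)

# Route 2: build the grid yourself, then map the plotting function onto it.
grid = sns.FacetGrid(df, col="town", col_wrap=4, height=1.5)
grid.map_dataframe(sns.scatterplot, x="floor_area_sqm", y="resale_price")
```

In both cases each town gets its own panel, instead of everything landing on the last axes.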
would this have worked if I filtered normally with a mask like in the last line. Instead of loc
Basically does doing df[mask] create broadcasting or a copy
Hi anyone around for a quick question?
it creates a copy (boolean indexing doesn't return a view). and it's equivalent to using .loc[mask], pandas just does some fancy type detection with plain []
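One way to check this yourself, sketched on a toy frame: compare the two spellings, and ask NumPy whether the filtered result shares memory with the original column. As far as I know, boolean-mask selection gives you new data in current pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
mask = df["a"] > 2

by_brackets = df[mask]   # plain [] with a boolean Series
by_loc = df.loc[mask]    # explicit .loc

same_rows = by_brackets.equals(by_loc)
shares = np.shares_memory(df["a"].to_numpy(), by_brackets["a"].to_numpy())

print(same_rows, shares)
```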
dontAskToAsk
Surely its in the environment youre coding in
but not downloaded as files actually, maybe its just put on ur ram
otherwise u wud lose hard drive after doing this stuff a few times
yeah pretty sure it doenst download
yaeh it issnt download
I ask cuz after 4 hours I don't need it anymore, I could ask it directly, as you see no one was around, someone would spend their time and solve it after hours and I would thank them, have to try to conceal they spend entire time trying to solve it for nothing
building 'cartopy.trace' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cartopy
I keep getting this error while trying to install Cartopy. I am temporarily hosting the bot in Diva where I cannot upgrade C++ Build Tools, etc. Can anyone suggest me a better way to fix this error? I tried to google it but people suggest using Conda...
and there's no way to pip install a pre-built wheel? if that's the case, then yeah, I guess you'll have to use anaconda. (even though I dislike it.)
I am analysing the position of links on a set of webpages, to explore if link position influences click-through.
I have the relative path in this format for an html file: file = './data/wpcd/wp/f/France.htm'
I am trying to use Selenium to analyse the content of the file, but it requires the full path and a specific encoding, such as: driver.get("file://C://Users//User//DataspellProjects//project//data//wpcd//wp//f//France.htm")
What would be the best way to convert my relative path into a Selenium usable path, while staying platform independent
I am currently doing a regex based solution with os.getcwd(), but it doesn't feel like the right way of doing things
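For the path conversion alone (leaving the browser side out), `pathlib` can probably replace the regex. A minimal sketch using the relative path from the question:

```python
from pathlib import Path

file = './data/wpcd/wp/f/France.htm'

# resolve() makes the path absolute relative to the current working directory;
# as_uri() then formats it as a file:// URI for the current platform.
uri = Path(file).resolve().as_uri()
print(uri)
```

`as_uri()` only works on absolute paths, which is why `resolve()` comes first.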
Selenium is out-of-scope for this channel. See #❓|how-to-get-help, but only if you've read the terms of service for all the websites you intend to scrape and verified that what you're doing is allowed.
Has anyone here used kmeans clustering for 3D image segmentation? I'm looking for python's equivalent of Matlab's imsegkmeans3
omg I got it working I'm so happy
tysm!
Does anyone know how to speed up computationally expensive functions in scipy or sklearn using nvidia GPU (so numba/cuda implementation)? Doesn't seem like they offer any GPU options atm, so is my only option to build those functions from scratch with numba syntax?
I think that is how you'd have to do it.
I am trying to do a groupby but I want to return the whole row of the dataframe not just the column I am grouping and aggregating on. df.groupby('foo')['bar'].nlargest(2) ... but I want to see the whole row. I think the solution has something to do with loc but I can't wrap my head around it. Any ideas?
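Two possible approaches, sketched on a toy frame (the `baz` column is invented to stand in for "the rest of the row"): sort first and take the head of each group, or reuse the index that `nlargest` returns.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["a", "a", "a", "b", "b"],
    "bar": [1, 5, 3, 2, 9],
    "baz": list("vwxyz"),
})

# Approach 1: sort by bar, then keep the first 2 rows of every foo group.
top_sorted = df.sort_values("bar", ascending=False).groupby("foo").head(2)

# Approach 2: nlargest returns a (foo, original index) MultiIndex;
# the last level points back at the full rows.
idx = df.groupby("foo")["bar"].nlargest(2).index.get_level_values(-1)
top_loc = df.loc[idx]

print(top_sorted)
print(top_loc)
```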
wait, ur using selenium on html files and not chrome?
cant u just use BS? arent html files static pages
selenium for navigating browser
Need the position of the link on the page, not in the html
Yea, we're looking at the x,y coordinate of the link on the article, supposing a standard desktop display (window size 1920x900)
I think you're using the wrong tool
How come
BS doesn't give the coordinates, it gives the position in the HTML files (line 20, column 200), while Selenium gives (x,y) position for a "physical" page, as it would be shown to the user.
Huh, TIL you can use selenium for finding x,y
I only used for scripting navigation
element wise
are you using a .get_location function on an element?
Hi guys,
I'm pretty new here and I'd like to tell you about our Python open-source project.
My team and I are interested in the multitude of AI APIs that have emerged on the market in recent years from large cloud providers (Google, Microsoft, Amazon, etc.) but also from AI specialists (OpenAI, DeepL, Assembly AI, etc.) and that allow us to handle specific tasks: image recognition, translation, audio transcription, document parsing, etc.
We develop an API to rule them all: we standardize competing APIs into a single one so that developers can change providers whenever they want, use several APIs at the same time if needed, combine engines from different providers, etc.
To be transparent about this standardization, we decided to launch an Open Source version where we display the connectors we created to allow any AI service provider to add its own connector or to allow anyone to use our standardization for free: https://github.com/edenai/edenai-apis/ For those who are interested in these topics, I would love to have your opinion on our project and how to nourish it (please note that at the moment, only members of my team are working on it). As I said at the beginning, it's new for us 🙂
Thanks in advance,
Taha
PS: If you can star the repo that would be great and would help us a lot!
Does anyone here have a suggestion about where might be good to ask some questions about the mathematical properties of 2D DFTs/FFTs and how that relates to some image editing techniques? I've got the basics down, but I'm having some trouble identifying signal peaks in the 2D space. I'm wondering if there are some properties of the complex vector magnitudes that may assist my post-processing search. (Kind of vague, I know!)
Unless you can be more specific I don't think there is anything particularly helpful that you are looking for beyond general peak finding algorithms
i like this idea (and i love that it's open source and self hostable!!), but who is the customer? changing api providers usually is a somewhat big decision, no? how often are companies using multiple AI APIs at the same time?
i wonder what the value of making this a service is, versus simply a python client library like geopy that abstracts over the various APIs on the client side?
i feel like if you're at the point where you're mixing and matching AI APIs, you'd already have a data scientist or two on hand and might be starting to in-house some of that stuff anyway.
and how are you going to get AI API companies on board with this? they want their APIs to be distinct and differentiated. aggregating them into a common "API soup" might go against their strategic plans and they might write you out of their ToS. this was a big issue at a past company i worked at, where the product essentially depended on the goodwill of upstream API providers and we needed to carve out our product niche very very carefully so as not to step on their toes and get slapped in response, because every one of them had the power to unilaterally destroy our business if they wanted to.
of course if this is just an open source project that isn't meant to have commercial backing, then i'm all for it. it might also be a great thing for other companies to be able to offer various AI/ML APIs as an integration with their own platform, like jetbrains might want to use this inside dataspell.
in fact, maybe that's the pitch to upstream API providers: you are making their API more accessible to more people, by making it interchangeable w/ other providers and thereby allowing them to compete more directly on price and model quality. might be appealing to smaller players and unappealing to bigger players, like how netflix was originally good for movie studios until they decided to go build their own streaming platforms and netflix suffered badly.
I have code ```
conditions = [Animals["refference"] == "ABCD"]
choices = [
    'ABC-{}'.format(Animals['Invoice'])
]
Animals["Type"] = np.select(conditions,choices,Animals["Type"])
```
.format isn't "vectorized" over pandas objects. choices is going to be a list of length 1, containing a big ugly string, i.e. the result of calling str(Animals['Invoice'])
i really don't quite know what you're trying to do here
you have a couple of other problems with this code too... why is conditions wrapped in an outer list?
since this is just a boolean lookup, you can assign directly with .loc and avoid all the complexity you've accumulated:
cond = Animals["reference"] == "ABCD"
Animals.loc[cond, "Type"] = "ABC-" + Animals.loc[cond, "Invoice"]
is that what you wanted?
if that condition is met then I want to replace in Animals["Type"] where Animals["refference"] == "ABCD" with the word ABC-(the value from that row in Animals["Invoice"])
okay, my code should do that
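A runnable version of that .loc assignment on made-up data, assuming Invoice holds strings (a numeric column would need `.astype(str)` first):

```python
import pandas as pd

animals = pd.DataFrame({
    "reference": ["ABCD", "WXYZ", "ABCD"],
    "Invoice": ["001", "002", "003"],
    "Type": ["cat", "dog", "bird"],
})

cond = animals["reference"] == "ABCD"  # boolean Series, one entry per row
# Only rows where cond is True are overwritten, each with its own Invoice value.
animals.loc[cond, "Type"] = "ABC-" + animals.loc[cond, "Invoice"]

print(animals)
```

Every matching row is updated, so it does not matter how many places the condition is true.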
also, in english reference is spelled with one f, not two
but there's more then one place where this is true
yes, look at my code
cond is a pandas Series of boolean values
.loc selects multiple values from a series or data frame
you might want to re-read the pandas user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
is there a way to do this using np.select
no
it's not idiomatic and it's less efficient computationally
cause isn't np.select faster
no
did i have this discussion with you recently? or with someone else?
they saw in a youtube video that np.select was "faster" (than what?) and were fixated on using it
np.select is faster than looping and using if/else inside the loop
but np.select also requires you to construct a complete array just to take a subset of it, so that's not faster
assigning with .loc is just as fast as using np.select
both are implemented efficiently in tight C loops internally
the general principle is that vectorized numpy and pandas operations are faster than the same operation with a plain python loop
ok thanks
anyone good with computing mathematical functions
don't ask to ask. state your question in detail and wait for someone who knows the answer to answer it. if you don't get an answer, proofread your question to make sure it's easy to answer, and then wait a while and ask again.
@umbral charm we have a guide on asking good answerable questions: https://pythondiscord.com/pages/resources/guides/asking-good-questions/
A guide for how to ask good questions in our community.
Yea ok totally read that
anyway sure my question is https://gyazo.com/3b051e39fb3bee1c6690a11cc9a6975b
can you repost that as text, not a screenshot?
this server got a latex bot?
!code you can format the code in a code block, read below:
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
$h$
i see, no we don't have a latex bot
Well its a mathmatical function i need to compute
i can try copy pasta but i doubt it would do much
i see. this should be pretty easy using numpy. did you at least make an attempt?
you can do it looping over lists too, which seems to be what the question is asking you to do
make an attempt and post your code
not really, but i think ive done good so far i just dont know what to do now
Yea ive did some code
def fourier(x, a_list, b_list):
    for i, (j,k) in enumerate(zip(a_list, b_list)):
        (j*np.cos(x) + k*np.sin(x))
idk how to get it in the fancy thingy
''' def fourier(x, a_list, b_list):
for i, (j,k) in enumerate(zip(a_list, b_list)):
(j*np.cos(x) + k*np.sin(x)) '''
read the info in the box 🙂
Guys...why does NLP models rely so much on softmax functions?
I've tried making one that outputs a single value, translating it with a dictionary of values + KNN, and...no success...
'''py
def fourier(x, a_list, b_list):
for i, (j,k) in enumerate(zip(a_list, b_list)):
(j*np.cos(x) + k*np.sin(x))
'''
the backtick character is on the same key as ~ on us-ansi keyboards
py def fourier(x, a_list, b_list): for i, (j,k) in enumerate(zip(a_list, b_list)): (j*np.cos(x) + k*np.sin(x))
it says in bold text that they are not quotes
...
Try `(3x)
the info box even has a chunk you can copy and paste...
we do -- .latex
py def fourier(x, a_list, b_list): for i, (j,k) in enumerate(zip(a_list, b_list)): (j*np.cos(x) + k*np.sin(x))
i had no idea
in this example, say I had multiple conditions, like also when Animals["objects"] == "ready access" then return Animals["Id"] for Animals["Type"]: how could I combine them?
@desert oar you're on fire today btw. I'm inspired 😄
the box says to use 3 of them, no spaces
i'm caffeinated and want to think about other people's work instead of my own while my brain cools off 😆
def fourier(x, a_list, b_list):
    for i, (j,k) in enumerate(zip(a_list, b_list)):
        (j*np.cos(x) + k*np.sin(x))
HOLY SMOKES finally
i dont know what to do from here
softmax is the only obvious way to convert an array of arbitrary numbers into an array of numbers between 0-1 that sum to 1, which is required to interpret the numbers as a probability distribution
that's why I was thinking of using np.select
--sigma
I was going to say arr == arr.max(), but I guess more than one value might be tied for max.
But why does it have to be a probability? Why not treat the "correct character/word" as a number that the model must correctly output? Or a range of numbers defined by KNN...
I don't like the fact that the softmax function make the NLP models work with inputs and outputs which has sizes so big...
right, that's literally why it's called "soft max"
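For comparison, a small NumPy sketch of softmax next to the "hard" max mask mentioned above (the scores are invented):

```python
import numpy as np

def softmax(scores):
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()         # positive values that sum to 1

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
hard = scores == scores.max()        # the "hard" max: a True/False mask

print(probs, hard)
```

The largest score still gets the largest probability, but the other entries keep some weight, which is what makes the output usable as a probability distribution and differentiable for training.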
it depends. np.select is one way to do that. can you give a more complete example?
Also, @serene scaffold you said you would be impressed if my model could do something with this combination of MSE + KNN...
And...well...seems like you were right 
0/10000 Current Loss: 1289.3867072020757 Current Learning Rate: 1
Gradients Average: -0.15803318163855073
1000/10000 Current Loss: 974.5605831206061 Current Learning Rate: 0.010000000000000002
Gradients Average: 0.8818579553932097
...
9000/10000 Current Loss: 1015.7047142487523 Current Learning Rate: 1.000000000000001e-18
Gradients Average: -0.559652991596319
10000/10000 Current Loss: 935.3332222202407 Current Learning Rate: 1.0000000000000011e-20
Gradients Average: 0.5533991380623683
The model doesn't seem to learn at all
.latex \frac{x - \bar x}{\sigma}
.latex ```latex
\frac{x - \bar{x}}{\sigma}
```
yeah idk, seems broken
Yea
conditions = [Animals["refference"] == "ABCD",
              Animals["objects"] == "ready access"
]
choices = [
    'ABC-{}'.format(Animals['Invoice']),
    Animals["Id"]
]
Animals["Type"] = np.select(conditions,choices,Animals["Type"])
So how would one compute this in the function
Write a function fourier(x, a_list, b_list). a_list should be a list of coefficients for cosine functions, and b_list should be a list of coefficients for sine functions. The result should be an array of values matching the input array x.
print(original_sentence)
print(output)
('私 の a i は 話 し て 歌 っ た し て ゲ ー ム を し ま す <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> <EOS>')
('ん ア 前 リ ポ 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確 確')
It's curious, though... the range of values in the dictionary goes from -45 to 45(Originally it was -1 to 1).
And that kanji that keeps repeating has value 21.74.
i would say numpy is the easiest way, but you could also do it in a for loop and using the math library to compute the sines and cosines
will you be given the coefficients a_n and b_n or do you need to compute them yourself in a separate function?
could u teach me
the numpy way, like ive did
def fourier(x, a_list, b_list):
    for i, (j,k) in enumerate(zip(a_list, b_list)):
        j*np.cos(x) + k*np.sin(x)
i just dont know what to do from here
okay, so... how do you encode a word as a number?
Yes
I create a dictionary where each word is assigned to a number, then I use that to translate the numbers generated by the model to a word
sure, that's one way to do it
And, to make sure the model's output can be translated by the dictionary, I use a KNN
but this doesn't work as it returns a big string
think about this: what is a summation?
what exactly is your question? you forgot to use the index i, but otherwise it seems ok
i told you why. i explained what 'ABC-{}'.format(Animals['Invoice']) does and why it's not what you want.
yeah ik but what should I replace with instead
okay, so how does the model tell you what word it chose? it chooses word k by assigning the highest score to the word in the kth position of the output, right?
adding things togeather
i showed you: "ABC-" + Animals["Invoice"]. the .format method is not vectorized over pandas objects, and explained why you were getting a big string. but pandas does have support for +
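If np.select is still wanted for the two-condition case, a sketch with the vectorized `+` in place of `.format` might look like this. The data is made up, the column spellings follow the question, and Invoice and Id are assumed to be string columns:

```python
import numpy as np
import pandas as pd

animals = pd.DataFrame({
    "refference": ["ABCD", "WXYZ", "QRST"],
    "objects": ["boxed", "ready access", "boxed"],
    "Invoice": ["001", "002", "003"],
    "Id": ["ID-7", "ID-8", "ID-9"],
    "Type": ["cat", "dog", "bird"],
})

conditions = [
    animals["refference"] == "ABCD",
    animals["objects"] == "ready access",
]
choices = [
    "ABC-" + animals["Invoice"],  # one string per row, not one big string
    animals["Id"],
]
# Rows matching neither condition keep their current Type.
animals["Type"] = np.select(conditions, choices, default=animals["Type"])

print(animals["Type"].tolist())
```

When a row matches more than one condition, np.select uses the first matching entry in the list.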
yes, so what is YOUR question?
your code was already very close, except for the index i that you have not multiplied
what else is troubling you about it?
it might help to use the same variable names as in the math expression
Suppose that, in the dictionary, I got the word a assigned to the value 3.465 and the word b assigned to the value 5.231.
If the model outputs the number 4, this number 4 will be passed to the KNN that was fitted into the dictionary and this KNN will tell me "the model output is 4, which is closer to 3.465 so the model's output is 3.465".
Then, I pass this "KNN translation" into the dictionary and I'll get that the model's output is word a
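The decoding step described here can be sketched without a KNN library: with a single number per word, a 1-nearest-neighbour lookup is just "pick the closest value". The dictionary below contains only the two words from the example, and `decode` is a hypothetical helper name:

```python
import numpy as np

word_values = {"a": 3.465, "b": 5.231}
words = list(word_values)
values = np.array([word_values[w] for w in words])

def decode(output):
    """Return the word whose assigned value is closest to the model output."""
    nearest = int(np.argmin(np.abs(values - output)))
    return words[nearest]

print(decode(4.0))  # 4.0 is closer to 3.465 than to 5.231
```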
the x in my defined function is unused, idk where im supposed to put it
bro IDKE man my teachers on crack half the time
the x is there, what do you mean?
so where do i put the i
yeah they meant i
well that's probably why mortta is confused
would it be
.latex should be \[
\sum_{n = 0}^N a_n \cos(nx)
\]
i pressed enter too early, oops
.latex $
\sum_{i = 0}^n a_i \cos(i x) + b_i \sin(i x)
$
.latex should be \[
\sum_{n = 0}^N a_n \text{cos}(nx) + b_n \text{sin}(nx)
\]
man what did i mess up
yeah this bot sux lol
the bot is kinda dead i think
i forgot the sine term but you get the idea. notice i used different indices in the sum
using i as an index here is cursed because in fourier i is the complex unit, usually. so let's use n and N
def fourier(x, a_list, b_list):
    total = 0  # accumulate the terms instead of returning inside the loop
    for i, (j, k) in enumerate(zip(a_list, b_list)):
        total += j*np.cos(i*x) + k*np.sin(i*x)
    return total
so would this be right
sure, that should work. but you also have to iterate over x
x is a list
I know that AI has had practical application in helping devs write docs
thats the thing IDK WHAT x is supposed to be
do they use some service for this or train their own model?
a list.
do you know what a fourier series is?
no LOL SHE
what happens when the devs are working on proprietary software and don't want their code being used
she just gives us really hard equations to write down a function for
ill give u another example
LIke wtf MAN IDK what that means
i did that one tho
well, there is a LOT of stuff in the task you were given that doesn't really make sense tbh
Yea i figured
she even gets mad
when i do math.pi
instead of numpy.pi
so what do i do next
well, i would say what's next is you email your teacher cuz her fourier series is wrong
this won't work unless x is simply range(L) or something, meaning you're fourier transforming f(x) = x
ok lets say if we just forget that she said fourier
could the function still be made
yes
So i say
you have to notice this is a vector equation
we just forget that fourier is a thing its just she defined the function that way
x has several entries
for each value of x, you have to apply this function with a sum of sines and cosines
ok
thats understandable
but how do we know how many times
we have to add up sines and cosines
what tells us that
you already took care of that
the length of a_list and b_list
that's the capital N in the sum
Yea
man it works on overleaf. this bot is tripping
.latex should be
\[
x_k = \sum_{n = 0}^N a_n \cos(nx_k) + b_n \sin(nx_k)
\]
to iterate over both lists at the same time, use for a, b in zip(a_list, b_list): ... and since you also need the index, wrap enumerate around it. iterating over a range might seem better in this case but it isn't. then you calculate each term using the math module and append the result to a list, which you then take the sum of and return
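a minimal sketch of the loop described above, assuming x is a plain list of numbers and a_list and b_list have the same length. the name fourier_sum and the sample numbers are made up for illustration, and the sum starts at n = 0 like the expression says:

```python
import math

def fourier_sum(x, a_list, b_list):
    """For each x_k, return the sum over n of a_n*cos(n*x_k) + b_n*sin(n*x_k), n from 0."""
    result = []
    for x_k in x:
        terms = []
        # enumerate gives the index n, zip pairs up a_n and b_n
        for n, (a, b) in enumerate(zip(a_list, b_list)):
            terms.append(a * math.cos(n * x_k) + b * math.sin(n * x_k))
        result.append(sum(terms))
    return result

# made-up coefficients, two x values
values = fourier_sum([0.0, math.pi], [1.0, 2.0], [0.5, 0.5])
```

the result has one entry per value in x, so it is a list with the same length as x, not a single number.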
How do I learn the math needed for machine learning? What is a good book or website for someone whose math is intermediate
anybody can help?
you already did what you needed to do, your code was correct. now just repeat it for each value in x
grab a piece of paper and a pencil and go through the logic of the code you shared
Companion webpage to the book “Mathematics for Machine Learning”. Copyright 2020 by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Published by Cambridge University Press.
thanks🤝
@white jacinth there are also the usual sources for learning calculus and linear algebra: MIT open courseware and the 3b1b "essence of" series
there are also plenty of good stats textbooks, eg. OpenIntro Statistics
thanks for helping 
i know that AI has had a lot of practical application in recent times in helping developers write documentation
Do devs use some service for this? or do they train their own model (especially when the software they work on is proprietary and they don't want to use public services)?
with the + operator it still returned as a string. do u think it's cause that column Animals["Invoice"] is returning a number? so I just wrap it with str to look like this: "ABC-" + str(Animals["Invoice"])
if Animals["Invoice"] is a string, then + will work. if it's not a string, you will get an error saying that you can't add a number to a string
if the elements are actually numbers, you can use Animals["Invoice"].astype(str), which applies str to each element individually
as above, str(Animals["Invoice"]) turns the entire series into a big string which is not what you want
in general you can also use .apply or .map to apply any python function to each element individually
Animals["Invoice"].astype(str) is equivalent to Animals["Invoice"].apply(str)
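a small sketch of the difference, assuming pandas is installed. the Animals table here is a made-up stand-in with three invoice numbers, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the Animals table from the discussion.
Animals = pd.DataFrame({"Invoice": [101, 102, 103]})

# str() on the whole Series gives ONE string: the printed form of the Series.
whole = "ABC-" + str(Animals["Invoice"])

# astype(str) converts each element, so + concatenates row by row.
labels = "ABC-" + Animals["Invoice"].astype(str)

# apply(str) does the same thing element by element.
same = "ABC-" + Animals["Invoice"].apply(str)
```

whole is a single python string, while labels and same are Series with one "ABC-..." value per row.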
AHH OK
SO than after i did it for each value OF x
do i add them all up?
okay, but how do you get the model to output 4?
no, the result is a list
Uh...through forward propagation?
It's the output of the final layer, which is a linear/fully connected/dense layer
but would i not have to sum the list?
or does it not want me too?
How does reinforcement learning work in alpha tensor ?
read the task you sent me yourself 😛 the result should be an array equal to x
Yea
so would the sum of the array of x be the actual answer to the math function
but it just wants me to put it in an array
no, why would it be?
idk
but also, you know how i use enumerate, it starts from 0, surely we want it to start from 1?
there is nothing for you to make up. now that we corrected the indices, all you have to do is read the math expression
you'll get the answer to this by just reading the math
okay, you're saying that the model output should be a single number? the problem is that 4 is not "bigger" than 3 here, and that 3.5 is not meaningful, because there's no natural sequencing of words when encoded as numbers this way
you need a different arrangement that avoids accidentally introducing concepts that don't exist in the underlying data
so the natural way to do this is to encode each word as a vector of all 0s, with 1 in the position of the word number
so if you have 10 words in your vocabulary, the 6th word will be all 0s, with 1 in the 6th position
moreover, it's very appealing probabilistically to model each word "slot" as a random variable with a probability distribution over all possible words
the 0-1 encoding is what happens when the probability of the kth word is 1, forcing all other probabilities to 0
it's a very convenient and natural way to work with this kind of data, and there's no better alternative
so how do you know the model predicts word 4? if the score in the 4th position of the output layer is the highest
if you just want to find the word with the highest score, you don't need softmax
but if you want to treat the model output as a probability distribution or voting % over words, then you need softmax to transform the output numbers
i dont understand what do u mean by read the math
and that transformation is important because our loss functions generally require a probability distribution or voting %. so we need that transformation for scoring our models even if we don't need it for making predictions.
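a rough sketch of the points above in plain python. the vocabulary and the scores are made up, and positions are 0-based here, so "word 4" in 1-based counting is index 3:

```python
import math

def one_hot(index, vocab_size):
    """All zeros, with a 1 in the position of the word."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "dog", "has", "a", "bone"]   # hypothetical vocabulary
scores = [0.2, 1.5, -0.3, 3.1, 0.0]          # made-up output-layer scores

# picking the predicted word only needs the highest score, no softmax
predicted = max(range(len(scores)), key=lambda k: scores[k])

# softmax is needed when the output is treated as a distribution (e.g. for the loss)
probs = softmax(scores)
target = one_hot(predicted, len(vocab))
```

softmax keeps the ordering of the scores, so the highest score is also the highest probability.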
idk if a function starts at n=0 or n=1
ah it starts at 1
so would i make my enumerate start at one
the bottom part of the summation notation tells you where it starts
it looks like it starts at 0 to me!
that's what edd meant by read the math. you need to actually read it 😉
if you did want to start enumerate at 1, there's an option for it
??
!d enumerate
```py
enumerate(iterable, start=0)
```
Return an enumerate object. *iterable* must be a sequence, an [iterator](https://docs.python.org/3/glossary.html#term-iterator), or some other object which supports iteration. The [`__next__()`](https://docs.python.org/3/library/stdtypes.html#iterator.__next__ "iterator.__next__") method of the iterator returned by [`enumerate()`](https://docs.python.org/3/library/functions.html#enumerate "enumerate") returns a tuple containing a count (from *start* which defaults to 0) and the values obtained from iterating over *iterable*.
```py
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
```
Equivalent to...
WHAT
but you don't need that for this problem!
Can i invite the python bot to my discord?
you're supposed to start at 0
yea it says below the sum of all signm
i think you'd have to host your own copy, it's open source. you can also use #bot-commands
so i can just copy pasta?
is it python 3.7 or 3.10
it literally says it starts at 0
the math expression literally tells you it starts at n=0
YEA I read the maths :O
IM SORRy i dont take maths
is there like a calculator i could use to test im right
Ok, but in classic NLP models this is done only by the embedding matrix, right? It's the embedding layer that creates a scale of which word has a closest meaning to another?
So, a softmax wouldn't be really necessary, as long as I use an embedding layer?
Isn't this the same as one-hot encoding?
I don't understand why this would be any different from, like, index-encoding...or simply using a dictionary of values like I'm doing
yes, and it's why one-hot encoding is used
index encoding is mathematically identical to one-hot encoding
@wooden sail does x have to be the same length as a_list and b_list
Like...this is like the model is predicting the correct class for the 4th word, using a one-hot encoding
But why can't I do this way?
you can, but mathematically (literally with matrices, vectors, and tensors) there's no way to do this other than with softmax or one-hot
"index encoding" and sparse one-hot encoding are both implemented efficiently in computers as lookups
but mathematically they are still one-hot encodings and vector/matrix multiplications thereof
that is, index encoding is a trick for efficiently implementing certain operations with one-hot encoded data
that's a different way to encode text. the 0-1 encoding is "sparse" and tends to have a huge number of columns in the matrix. a vector embedding tends to have many fewer dimensions and no sparsity.
what happens if you get a vector embedding as output that is not exactly one of the known words? how do you translate that vector back into a word?
KNN, which will convert that vector to the closest value possible
you can make up a new word that represents that particular linear combination of basis vectors for the embedding space... but that's not really useful
sure, you can use vector embeddings in that case
KNN makes this in a way that a range [X,Y] = word A
But...as far as I'm testing it, it doesn't seem to work...and I still don't get why.
but you need to be clear here: the output from a KNN model is not a vector embedding, it's a "word id", i.e. an index
sure, you can then go fetch the vector embedding associated w/ the matched word if you want
but in KNN the "fitted model" is just whatever index or tree structure you need for doing proximity lookups, and the output is just the id number of the matched entity
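a tiny sketch of that last point, using a hand-rolled 1-nearest-neighbour lookup instead of a real KNN library. the words and the 2-d vectors are invented for illustration:

```python
import math

# made-up embedding table: row i is the vector for words[i]
words = ["cat", "dog", "car"]
embeddings = [[0.9, 0.1], [0.1, 0.8], [-0.7, -0.6]]

def nearest_index(vector, table):
    """Return the row index of the closest vector (Euclidean distance)."""
    return min(range(len(table)), key=lambda i: math.dist(vector, table[i]))

model_output = [0.2, 0.7]                      # not exactly any known word
idx = nearest_index(model_output, embeddings)  # the "word id"
word = words[idx]                              # translate the id back to a word
matched_vector = embeddings[idx]               # fetch the embedding if you want it
```

the lookup itself only hands back an index. the word and its vector come from looking that index up afterwards.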
Yes, indeed. So it should work, right?
theoretically yeah. if your code doesn't work then maybe you're doing something wrong in the code.
where are you seeing that softmax is required for KNN?
My idea would be that, if the model could achieve "perfection", it would output the exact same value for that word in the dictionary. Since this isn't possible, I use the KNN so the model will get the correct output if it throws an output that can be close to that value
I'm not using a softmax. I want to avoid it so I won't be working with big outputs.
what do you mean by big outputs?
But my models aren't working this way. And every code and article I see about NLP won't use KNN, just softmax in the model itself...which gets big outputs
Output with sizes/shapes that are too big.
yeah, that's why people use dense embeddings in the first place
If my vocabulary dictionary has, like, 100 words. If I use the classic method(one-hot encoding)+softmax, my model will generate an output with shape [1, 100] when trying to predict each word, right?
What if I'm using a vocabulary with more than 10.000 words?
I want to avoid softmax because of that
you need softmax for pretty much any other model because that's how they're all designed! in a model that needs to predict a word by performing a sequence of matrix multiplications (like a neural network) there's no sensible way to structure the output layer otherwise
even transformer models, which mostly operate entirely on pre-encoded dense word vectors (not one-hot inputs), still need a softmax output layer
Then why isn't softmax necessary in price prediction and computer vision?
SRGAN relies purely on ReLUs
because softmax is necessary when you have to choose between "categories"
there's no "greater than" or "sum" in such a space of categories. the only way to encode that sensibly is one-hot.
That's why I'm trying to not use a "category", but simply assign a value to a word.
whereas prices are real numbers (or at least an approximation thereof) so you can just output a number
in computer vision, softmax is necessary for classification tasks: choosing among categories that lack additional mathematical structure
yes, that's literally what vector embeddings are. but the problem i stated above is: what happens if you produce an output that doesn't correspond exactly to a known word? either you have to post-process the results with KNN, or make up a word to represent the output.
the latter option is silly and not feasible or useful
the former option might be interesting but i suspect that people don't do it because it's completely okay to have an output layer with 10k or 100k things in it
it's a bit like predicting whole numbers by real numbers + rounding. you can do it, but the results might be funky
Hm... I see... Then I'll try see if I can use a vector embedding layer. I hope Pytorch has this one.
yes, pytorch and keras both have stuff like this built in
remember: a vector embedding output layer is literally just a densely connected layer
Oh...
there might be some research on post-processing "dense" outputs with KNN to recover "sparse" outputs, but i am not aware of it
think about it mathematically
So that explains the Attention layers...
Why they're mostly composed of dense layers...
Then, my embedding vector layer would receive a single value, multiply that value by the same number of weights as my vocab size, and then output a single value?
what do you mean by a single value?
if you want to create your own embeddings, then you consume one-hot encoded input (or tf-idf or hashed encoding or something else) and emit a dense vector for each one-hot encoded input
Random noise(Generator Text), word encoded value(as I made my dictionary)...
i'm not sure what you mean
you always need either one-hot encoded input or index-encoded input (which is equivalent to one-hot as i explained above)
you can use something else like tf-idf or hashed encoding too if you want, but the point is you need some way to encode the words as numbers
if you just map each word to a position on the number line you introduce fake and arbitrary structure among words, so you cannot do that
therefore you must use something else, like one-hot or hashing or tf-idf
My input would be a sentence that has been encoded like I did with my dictionary(values between -1 and 1). Then I could pass this into a linear layer that would have output the same size as my vocabulary size(my dictionary length), and then pass this into another linear layer to output a single value.
encoding each word as a position on a number line is creating a 1-dimensional vector embedding. unless you do that in a way that finds sensible structure (e.g. with a neural network using one-hot data as input), you will be introducing arbitrary artificial meaningless structure into your data and your results will be useless
there are other options besides NNs, e.g. matrix factorization
Can I use something like this?
{'私': -45.0, 'の': -43.98876404494382, '犬': -42.97752808988764, 'は': -41.96629213483146, '骨': -40.95505617977528}
sure, but how did you get those numbers?
Each character assigned to a single value
those numbers are already a vector embedding with 1 dimension
def _create_dictionary(self, words):
    idx2word = []
    word2idx = {}
    for word in words:
        if word not in word2idx:
            idx2word.append(word)
            word2idx[word] = len(idx2word) - 1
    word2idx['<EOS>'] = len(idx2word) # Adding an End of Sentence tag to improve model's accuracy
    return word2idx
maximum = max(dictionary.values())
for word, value in dictionary.items():
    scaled_value = (value-0)*2.0 / (maximum - 0)-1.0
    dictionary[word] = scaled_value * ((len(dictionary)+1)//2)
right, this is introducing arbitrary mathematical structure where none exists
you've just made up a vector embedding that has no real-world meaning
I just multiplied the [-1,1] for 45 to give the model a wider range to "miss"
you randomly scattered words on a number line. or maybe you scattered them alphabetically, or according to some other system that has no inherent structural meaning.
but your model doesn't know that your numbers are made up and meaningless. the linear algebra underlying the models will find meaning where none exists, and your results will be entirely arbitrary and dependent on the made-up number line scattering you did.
Like... suppose that the word a has value 0.51 and the word b has value 0.52, if the model outputs 0.516 KNN will translate this to b, but if it outputs 0.514, to a
if you swap the order of two words in the list, it would change your model outputs! that's totally messed up
yes, but that's totally arbitrary
why should the order of the alphabet matter? the alphabet is based on some shit the phoenicians made up roughly 3000 years ago, it's not meaningful in understanding text.
why should a and b be next to each other? what if you put all the vowels first?
Wouldn't the relation between those numbers be deciphered by the vector embedding layer?
moreover, what does it mean to add a and b together? you get some number, but does that number make sense? note that [-45, 45] isn't even a valid vector space because it's not closed under addition, so none of the math will work anyway
no, because the vector embedding layer will see a bunch of numbers on a number line and assume they should be treated exactly like numbers on a number line. the math is the math, you can't change it.
The codes I see people doing usually relies on numbers that go from [0, vocab_size-1], so...there's still the same problem you're saying
But I believe this is deciphered by the embedding matrix, isn't it?
because word k is mapped to [0, ..., 1, ..., 0] where the 1 goes in the kth position
you don't actually feed a sequence of numbers into the model
Hm... I see...
there's no little math gremlin deciding how to interpret your data on a case-by-case basis
But shouldn't the model be able to detect that "in this case, the number 0.51 is the right one. On this other case, the number 20.4 is the right one"?
the math is what it is. the axioms and theorems of linear algebra, real analysis, etc. are what they are. when combined with the binary operations that computers are capable of, you are stuck using whatever tools you are given. you cannot freely reinterpret them. they do what they do, and you must put them together exactly as they are to make useful outputs.
because linear algebra simply doesn't work that way
however tree-based models such as random forests and gradient boosting do work that way, because they work by splitting the number line in half over and over at some optimal point
however, that still depends on the model having a meaning for "bigger than" or "less than"
if a is not meaningfully "bigger than" c, then splitting at "halfway between a and c" is not meaningful
I see... so this relation between "bigger than" and "less than" can only be broken by using categories?
precisely
Then...how could I use vector embedding?
[1, 0] is not bigger or less than [0, 1], it's just different. and in fact, in a linear algebra sense, they are the same size because they both have a magnitude (or "norm") of 1
are you trying to create one, or use one as input to a KNN model, or something else?
Uh... Idk. I want my model to receive a vector for a word and output the vector for another word.
For now, I'm just testing those ideas with a translator ENG-JP model.
you're trying to make your own english-japanese translator?
Yes, just a small one, for studying.
My vocab size is like...6 sentences. I don't want a google translator
But then... I suppose I'll have to try other ways to create my vocab dicts...for both the input(ENG) and the output(JP)
there are many pre-trained vector embedding models for english, and probably a few for japanese. look in the Spacy library for ones you can use easily.
I don't want to use pre-trained content
you won't be able to do much with 6 sentences
I don't have to do much.
If I wanted, I would use more sentences
I'm actually trying to learn more for a Reinforcement Learning model. I've tried the same idea for a RL model I'm testing.
well you could start with the basics and use word2vec
But, after this, I'll have to remake its data, since each action for a RL model is a category, they're not "bigger than" or "smaller than", just like happens with words
What does word2vec do that is different from what I'm doing with my dictionary?
Nevermind. I suppose that it'll attribute closer values to related words (man, woman, girl, boy)
Again, this would be done by an embedding layer, right?
I think my only problem, then, is how to convert my string to a value...so this value can be processed by a vector embedding layer.
precisely, that's exactly what vector embeddings do. it answers the question of "what does it mean for two words to be near each other" by specifically trying to build a model in which words that appear near each other have numerical representations that are near each other
note that character n-grams will probably work better for international text than just words
look into fasttext, it's a fast and easy to use implementation of this
you have to start with one-hot or indexes (again, indexes being a tidy way to represent one-hot in a computer)

There's no way to escape from softmax
Even to run away from it, I'll have to rely on it.
there's no way to escape from the fact that if you have 10k words in your vocabulary, at some point somewhere you need to actually represent all 10k of those words
you can think of softmax as an implementation detail inside fasttext or gensim or whatever you want to use to create your word vectors
remember: stick to the math
I appreciate it!
better than sticking to meth, amirite?
Also...uh...
I would use index-encoding to train a vector embedding layer, which would receive, as input, the index, let's say...3.
This 3 would be passed into a linear layer/FCC, output something with size (1, vocab_size), this output would be passed into a softmax, and then I'd pass this output to a Categorical Cross Entropy Loss, comparing it with the input(the index 3)?
Or should I just pass this 3 to a FCC which would output something with size (1, vocab_size), pass this output to another FCC which would output a single vector?
the 3 would be implicitly converted to [0, 0, 0, 1, 0, ..., 0] and then multiplied by a weight matrix. you can implement this efficiently as looking up a weight vector in a dictionary, but mathematically that's what it's doing.
pass this 3 to a FCC
that's what i spent all this time explaining would not work
3 -> [0, 0, 0, 1, 0, ..., 0] -> multiply by weight matrix -> dense output
Ok, even if I use index-encoding?
Oh wait...isn't this more or less what the Self-Attention does, in a nutshell?
index encoding is a computational trick to perform the "3 -> one-hot -> multiply by weight matrix" part without actually having to construct the one-hot array
but mathematically it's the same
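a quick sketch showing the two routes give the same result, in plain python. the weight matrix is made up (vocab size 4, embedding size 2):

```python
# made-up weight matrix: one row per word in the vocabulary
W = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def vec_mat(v, M):
    """Row vector times matrix: out[c] = sum over r of v[r] * M[r][c]."""
    return [sum(v[r] * M[r][c] for r in range(len(M))) for c in range(len(M[0]))]

via_matmul = vec_mat(one_hot(3, len(W)), W)  # 3 -> one-hot -> multiply by weight matrix
via_lookup = W[3]                            # index encoding: just fetch row 3
```

the lookup skips building the one-hot array and all the multiplications by zero, which is why libraries implement embedding layers that way.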
no, this is how you produce the vectors that are used as inputs to a transformer, including the "self-attention" units
self-attention works on "stacks" of dense vector embeddings
mathematically it would work on one-hot data but the results wouldn't be good at all
like the matrix multiplications would technically work fine, but the results wouldn't be useful, and it would be computationally horrible
Interesting...
What would happen if I apply softmax to this 3 before passing it as input for the embedding layer?
Would it encode it in some way? Or would it simply mess everything up?
i suggest looking at the definition of the softmax function and coming to a conclusion yourself 🙂
i need to get back to work for the afternoon. but again: look at the math. ideas follow from math.
math is not a magic gremlin with opinions. it is a system of strictly defined rules.
if you focus on the math you'll never be wrong.
I know... but I was testing my wrong idea based on the math of neural networks
:)
Thanks for the class, teacher!
is this the right place to ask about plotly and pandas stuff?
yes
You can use screenshots to show plots, but please don't do it for dataframes. do df.head().to_dict('list') and give the text.
not just print(df)? it's got all rows and columns neatly arranged
it's not copy and pastable if you want to use it in code.
well, the list is a bit too big to copy and paste into here
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
example
In [1]: df = pd.DataFrame({'a': ['hello world', 'goodbye world'], 'b': [1, 2]})
In [2]: df
Out[2]:
               a  b
0    hello world  1
1  goodbye world  2
In [3]: pd.read_clipboard()
Out[3]:
               a  b
0    hello world  1
1  goodbye world  2
print(df) will also clip some of the columns, making it useless if those columns are critical for understanding the question.
they probably are not, but here goes: https://paste.pythondiscord.com/idixumihan
the error I'm getting is ```ValueError: Value of 'y' is not the name of a column in 'data_frame'. Expected one of [0, 1, 2, 4, 6, 7, 8, 9, 11, 13, 14, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 43] but received: Elo```
@serene scaffold I'm stumped what I should put in for y=, but the data is rather simple, so I don't know why I'm having trouble
@latent dirge can you also do print(df.head().index) and put that in the chat?
```
'Kristina Kucova'],
dtype='object')
```
list seems incomplete, though
here's the print(df) for what it's worth:
```
                             0            1  ...           41           43
Robin Montgomery          1500  1500.000000  ...  1465.762038  1465.762038
Ann Li                    1500  1531.631651  ...  1455.911286  1455.911286
Kaia Kanepi               1500  1500.000000  ...  1560.092503  1560.092503
Camila Giorgi             1500  1500.000000  ...  1507.349554  1507.349554
Kristina Kucova           1500  1484.000000  ...  1452.818621  1452.818621
...                        ...          ...  ...          ...          ...
Harmony Tan               1500  1500.000000  ...  1434.059565  1434.059565
Julia Grabher             1500  1500.000000  ...  1508.970744  1508.970744
Venus Williams            1500  1500.000000  ...  1440.504802  1440.504802
Magdalena Frech           1500  1500.000000  ...  1457.160548  1457.160548
Anastasia Pavlyuchenkova  1500  1500.000000  ...  1484.493448  1484.493448
```
yes, because of .head(). what is Elo supposed to be? the index for a row?
initially I thought x= and y= were simply titles for the corresponding axis, so this is a holdover, but I don't know what to put in instead
maybe I should use a list made out of the first column
Is this basically what you're trying to make (but one line for every player, not just the first five)?
yes, exactly
all I did was df.T.plot.line()
well, how do I get the same result with plotly express?
idk tbh. never used plotly
you might have to reshape the data so that each row is one point rather than a whole line
basically transpose the entire thing?
no, .T transposes it. in this case, you would melt it.
In [5]: df.melt()
Out[5]:
    variable        value
0          0  1500.000000
1          0  1500.000000
2          0  1500.000000
3          0  1500.000000
4          0  1500.000000
..       ...          ...
170       43  1481.045325
171       43  1545.959786
172       43  1483.237985
173       43  1468.006673
174       43  1570.508573
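a plain df.melt() drops the index, so the player names are gone in the output above. a possible way to keep them, sketched on a small made-up table in the same shape (one row per player, one column per match number). the names player, match and elo are just labels picked for this example:

```python
import pandas as pd

# made-up data in the same layout: index = player, columns = match number
df = pd.DataFrame(
    {0: [1500.0, 1500.0], 1: [1500.0, 1531.6], 2: [1480.2, 1540.9]},
    index=["Player A", "Player B"],
)

long_df = (
    df.rename_axis("player")   # give the index a name
      .reset_index()           # turn the index into a regular column
      .melt(id_vars="player", var_name="match", value_name="elo")
)
```

each row of long_df is now one point (player, match, elo), which is the "one row per point" shape mentioned above.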
so I'm no better off than before 😕
I guess I can download plotly
Hey
im guessing you guys are the scipy.optimize.curve_fit people?
I got a question
don't worry about it, I'll keep asking, maybe someone else knows what to do
can you show the import statement for px
don't wait for a commitment to ask.
import plotly.express as px @serene scaffold
Erm
my curve fit doesnt work and i dont know why
the graph is complicated, but i dont see why it means it shouldnt work
my covariances are fked up and i end up just getting a straight line
can u helP?
what are you doing so far? but yes, having a "complex" graph can make the problem difficult if it can't be linearized or otherwise parametrized in a "nice" way
@latent dirge I got this just doing px.line(df.T) , where abcde are what I'm using for the names of people in your data.
Ive just done everything, ive unpacked the data, plotted it, made a function, used the scipy.optimize.curve_fit function but it doesnt give nice values
yes, it works, that transpose thing was what I was missing, but now I have to figure out how to do the animation, but I'll ask another time
show your code
ill show my graph and code
Good luck and thanks for your cooperation!
import scipy.optimize
from matplotlib import pyplot as plt
import numpy as np
import math
from scipy.optimize import curve_fit
wavelength, intensity, uncertainty = np.loadtxt('line7.csv', unpack = True, skiprows = 1)
plt.errorbar(wavelength, intensity, yerr = uncertainty, fmt = 'ob')
plt.savefig('line7.JPG')
def line(lam, lam_l, w, a, c):
    return a*np.exp(-np.log2(2*((lam - lam_l) / w))**2) + c
popt, pcov = scipy.optimize.curve_fit(f = line, xdata = wavelength, ydata = intensity, sigma = uncertainty, p0 = [1,1,1,1])
print(popt)
x = np.linspace(375, 380, 100)
plt.plot(x, line(x, popt[0], popt[1], popt[2], popt[3]))
plt.show()
my code
my graph
why is my curve_fit line
just straight
i'm surprised that doesn't give an error, since you gave only 4 parameters as p0
thats how it works
but anyway, the way you're optimizing, you're telling the function to optimize all of lam, lam_l, w, a, c, which is not what you want
oh that's my bad, i misremembered how it works then. let me read the docs really quick and get back to you
all g
been ages since u done this?
def miniMax(BoardSpot, depth, isMaximizing,alpha,beta,playerPiece):
    winCheck = cpuCheckWin()
    if(playerPiece == 'x'):
        opponentPiece = 'o'
        if(winCheck == 'x'):
            return 100
        elif(winCheck == 'o'):
            return -100
        elif(winCheck == 'tie'):
            return 0
    if(playerPiece == 'o'):
        opponentPiece = 'x'
        if(winCheck == 'x'):
            return -100
        elif(winCheck == 'o'):
            return 100
        elif(winCheck == 'tie'):
            return 0
    if(isMaximizing):
        i = 1
        bestScore = float('-inf')
        while(i<=len(BoardSpot)):
            if(BoardSpot[i] == "-"):
                BoardSpot[i] = playerPiece
                score = miniMax(BoardSpot,depth-1,False,alpha,beta,playerPiece)
                BoardSpot[i] = "-"
                bestScore = max(bestScore, score)
                alpha = max( alpha, score)
                if(beta <= alpha):
                    break
            i+=1
        return bestScore
heres my code for alphabeta ai in tictac toe but my output is ending the game very early and I am stumped
heres the output
what are the values of wavelength?
u want the .txt?
seems you learn the params using wavelength as the xdata, but then you try to plot using x = np.linspace(...) instead. perhaps the exponential is already approximately zero for your values of x
Wavelength (nm) Intensity (arbitrary units) Uncertainty (arbitrary units)
375.00 0.1300 0.1000
375.10 0.0648 0.1000
375.20 -0.0143 0.1000
375.31 0.0651 0.1000
375.41 0.0791 0.1000
375.51 0.1587 0.1000
375.61 0.1839 0.1000
375.71 0.1931 0.1000
375.82 0.1286 0.1000
375.92 0.1885 0.1000
376.02 0.0246 0.1000
376.12 0.2253 0.1000
376.22 0.1513 0.1000
376.33 0.0702 0.1000
376.43 0.1489 0.1000
376.53 0.0924 0.1000
376.63 0.2132 0.1000
376.73 0.2520 0.1000
376.84 0.3187 0.1000
376.94 -0.0385 0.1000
377.04 -0.0382 0.1000
377.14 0.0761 0.1000
377.24 0.2059 0.1000
377.35 0.4269 0.1000
377.45 0.6331 0.1000
377.55 0.7251 0.1000
377.65 1.1436 0.1000
377.76 1.2805 0.1000
377.86 1.0059 0.1000
377.96 0.7350 0.1000
378.06 0.3562 0.1000
378.16 0.1891 0.1000
378.27 0.1523 0.1000
378.37 0.1492 0.1000
378.47 0.1214 0.1000
378.57 0.1121 0.1000
378.67 0.0330 0.1000
378.78 0.1378 0.1000
378.88 0.1122 0.1000
378.98 0.2129 0.1000
379.08 0.2199 0.1000
379.18 0.1185 0.1000
379.29 0.0625 0.1000
379.39 0.0361 0.1000
379.49 0.1423 0.1000
379.59 0.1077 0.1000
379.69 0.0656 0.1000
379.80 0.1044 0.1000
379.90 0.0380 0.1000
380.00 0.1698 0.1000
hmm that should work
Yea i know it works for all my other curves that dont look nice
I just dont know why it doesnt work for this
could be the initial guess is very bad, too. curve_fit uses levenberg-marquardt by default (when no bounds are given), a damped gauss-newton method, for optimization. if you start far away from the true solution and the problem is non-convex, you can end up at a local minimizer that may be bad
Yea i thought of this too
But i cant just guess 4 values
that have to be totally different im guessing
how does one do that
are you sure that log2 is needed in the argument of the exp btw? you sure you don't want this to be a gaussian?
aight, you already have a model then.
YEa
well, for example. a good guess for C is 0. for lam_l, something like 378
The question is just asking me for the optimal values from scipy.optimize
Yea but i keep getting inf for pcov
and for w, go with 1 i guess
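A minimal sketch of passing those guesses to curve_fit, run on synthetic data rather than the spectrum posted above. The Gaussian form with ln 2 (so that w is the half width at half maximum) is an assumption, and so are the names `gaussian_line` and `true_params` and the noise level. The guesses for lam_l, w, and c are the ones suggested in this thread; a = 1 is an extra pick.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_line(lam, lam_l, w, a, c):
    # Assumed model: Gaussian peak of height a on a constant background c.
    return a * np.exp(-np.log(2) * ((lam - lam_l) / w) ** 2) + c


# Synthetic stand-in for the measured spectrum (hypothetical values).
rng = np.random.default_rng(0)
lam = np.linspace(375.0, 380.0, 50)
true_params = (377.7, 0.3, 1.2, 0.1)
intensity = gaussian_line(lam, *true_params) + rng.normal(0.0, 0.05, lam.size)
sigma = np.full(lam.size, 0.1)  # same uncertainty on every point

# Initial guesses in the order (lam_l, w, a, c).
p0 = [378.0, 1.0, 1.0, 0.0]
popt, pcov = curve_fit(gaussian_line, lam, intensity, p0=p0,
                       sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainty per parameter

print("best-fit parameters:", popt)
print("uncertainties:", perr)
```

If pcov still comes back as inf with your real model, that usually points at the model function itself (for example NaNs in its output) rather than at the guesses.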
Ok
378ish
Nope
all it does is give me my optimal values
as my guesses, and my uncertainties as infinite
honestly i might as well just pick 5 points on the equation and solve simultaneously
good luck with that 😛 the equation is highly nonlinear, and because the uncertainty is nonzero, there is no guarantee it has an exact solution anyway
well
im out of options, hopefully there is a calculator online
but i dont know what else to do
u know anyone else who's smart in this discord? surely there is
is the square here acting on the argument of the log or on the result of the log?
squaring everything inside log base 2
you could say it's 2 log base 2 if you wanted to bring the square down
mhm
how can I make a plotly animation out of this:
whatever I put in for animation_frame= only gives me an error
what did you change
the guesses a bit
but like its a bit to the right
why is it only doing half the graph
the log goes negative
then how does the equation even work
i'm trying to figure that out
cuz one can do a change of log base there to take a natural log instead
and then it essentially turns into a quadratic divided by log2(e)
which doesn't seem to describe the data well
yeah
ill try ln
@wooden sail
can you share your code again of your model function
this is super fishy
def line(lam, lam_l, w, a, c):
    return a*np.exp(-np.log(2*((lam - lam_l) / w))**2) + c
i changed it to log(e)
but it was log(2) b4
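A small check of the "log goes negative" point, assuming made-up values for lam_l and w (they are placeholders, not fitted values). With the log wrapped around the whole argument, every wavelength below lam_l feeds a negative number into np.log, which returns NaN, and that would explain only half the curve being drawn.

```python
import numpy as np

# Hypothetical parameter values, chosen only to illustrate the problem.
lam_l, w = 377.7, 0.3
lam = np.array([377.0, 377.5, 378.0, 378.5])

# Same argument as inside np.log(...) in the model function above.
arg = 2 * (lam - lam_l) / w

# Silence the RuntimeWarning so the NaNs can be inspected directly.
with np.errstate(invalid="ignore", divide="ignore"):
    logs = np.log(arg)

nan_mask = np.isnan(logs)
print(arg)
print(nan_mask)  # True wherever lam < lam_l
```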
How is it meant to be root3 for the inner triangle height
I feel embarrassed 16 year olds can do this and I can’t
try the homework or maths discord
Hello,
I'm trying to create a tornado diagram to visualize my features' correlation, I'm not super proficient in matplotlib yet. Does anyone have some suggestions or some generic code where they've done something similar?
try making a triangle whose vertices are at the centers of the circles. then note that the height of this triangle + 2r is equal to 2
the triangle is isosceles with side lengths 2r
I know that
off the top of my head, at least. do double check it
that's enough to solve for r though
R is based off of that triangle
no
No people answers got in meters
that triangle is based off of r
I swear it, 20 commenters replying in meters using trig
Make me feel dumb
It’s for kids
it is
you just said yourself you have an equation for the total height in terms of r. just solve for r now
what's the problem?
R = h? Wtf
2r
yes
Base = r now
Need 90 deg for pythag
that's if you take half of the isosceles. that gives you a special triangle
a 30 60 90 with hypotenuse 2r and base r
sure. an equilateral is also isosceles, but yes
Wait so 2r is also the length of the non arc part of band
If it’s perpendicular??
Make a rectangle down
From corner to corner
6 so far meters
So now it’s purely the three arcs added to that?
Why are people doing pythag and getting root3 r
Where tf is root3 from
Wudnt it be root1
... because of the 30 60 90 triangle i drew there and told you about before
sqrt( (2r)^2 - (r)^2 ) = sqrt(3r^2) = sqrt(3) r
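The arithmetic above can be checked numerically. This sketch takes the relation stated earlier in the thread (triangle height + 2r = 2) as given; that relation was offered "off the top of my head", so treat it as an assumption about the original problem.

```python
import math

# Half of the triangle is a 30-60-90 triangle with hypotenuse 2r and base r,
# so its height is sqrt((2r)^2 - r^2) = sqrt(3) * r.
# Assumed relation from the chat: height + 2r = 2  ->  r = 2 / (sqrt(3) + 2)
r = 2 / (math.sqrt(3) + 2)

triangle_height = math.sqrt((2 * r) ** 2 - r ** 2)
total = triangle_height + 2 * r

print(f"r = {r:.4f}")
print(f"triangle height = {triangle_height:.4f}")
print(f"height + 2r = {total:.4f}")
```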
Why is 2 - 1 three
because you're forgetting order of operations
Ah okay
Hey thanks for the help, me and my friends are gonna attempt this tmr
Yes that is why I’m doing algebra for children bro….
Thanks for the help
Back to precalc in a week anyway if I stay on track
sorry i couldn't help with the scipy opt problem though, my brain is fried for the day. best of luck with that
I’m very excited to get to approximating functions next year, that’s the main thing I’m looking forward to in my book
For some reason in the uk that is further and not core maths so never saw it before but it’s only few hundred pages in
Don’t be ridiculous you tried your best and u did kinda help me
still better than i can draw
above is generated from a gan
its supposed to be a person
chimera
i was trying to use tensorflow to make an rnn to predict stock prices to learn more about rnn's. but it's weird. i made this csv file of dummy stock price data (data.csv), and price_predictor.py trains it on it. the model is 4 lstm layers (64 units and relu activation each) followed by 0.2 dropout (each), and a 1 unit dense layer with sigmoid. data.csv is split into a numpy array of shape (60, 1), which represents 60 days (rows) of the stock price. the y value for that is the day after. the test data is the 365 days (rows) of data. it trains okay, loss drops to around 0.100 (for 1 epoch training) but accuracy is also constantly going down. the main problem is then when i run model.predict() it returns an empty numpy array? why is that?
https://paste.pythondiscord.com/eyicahisub (price_predictor.py)
here's some code to visualize the dummy data as well
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
dataset = pd.read_csv('./data.csv').to_numpy()
plt.title('Test stock price data')
plt.xlabel('Days')
plt.ylabel('Price')
plt.plot(dataset, color='blue')
plt.show()
adj_matrix: numpy.typing.NDArray[np.float64]
adj_matrix = self_cos_sim(sentences_encs)
adj_matrix = np.triu(adj_matrix)
np.fill_diagonal(adj_matrix, 0)
adj_matrix[adj_matrix < threshold] = 0
This create-then-modify-3-times of adj_matrix feels like an anti-pattern.
Is there a way I can optimize this, fuse some or all operations?
Please ping me if you reply
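One small fusion that is available in plain NumPy: `np.triu` takes a `k` argument, and `k=1` drops the diagonal along with the lower triangle, so the separate `np.fill_diagonal` call is not needed. A sketch on a toy similarity matrix (`threshold_upper` is a hypothetical helper name, and the matrix values are made up):

```python
import numpy as np


def threshold_upper(sim, threshold):
    """Keep only strictly-upper-triangle entries that reach the threshold."""
    upper = np.triu(sim, k=1)        # copy with diagonal and lower part zeroed
    upper[upper < threshold] = 0     # safe in place: np.triu returned a copy
    return upper


sim = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.6],
                [0.2, 0.6, 1.0]])
adj = threshold_upper(sim, threshold=0.5)
print(adj)
```

This still makes two passes over the matrix; it only removes one of the three modifications.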
i'd like to know too
Learn how to use TensorFlow 2.0 in this full tutorial course for beginners. This course is designed for Python programmers looking to enhance their knowledge and skills in machine learning and artificial intelligence.
Throughout the 8 modules in this course you will learn about fundamental concepts and methods in ML & AI like core learning alg...
@undone ocean @silent fable
!d scipy.spatial.pdist
No documentation found for the requested symbol.
That's pairwise distance...I don't see how it'd make my snippet more direct?
it looks like you are computing pairwise distance using a function you wrote?
all the rest follows from not working with distance matrices efficiently
follow the links to read about squareform
use scipy to efficiently compute the cosine distances (?) will also produce the right data structure
then you only have two steps: 1) compute the distance matrix, 2) zero out the elements below threshold
hey guys i could use some advise here
import spacy
import xml.etree.ElementTree as ET
tree = ET.parse('topics-rnd5_covid-complete.xml')
root = tree.getroot()
for topic in root.findall('topic'):
    nlp = spacy.load("en_core_web_sm")
for topic in root.findall('topic'):
    query = topic.find('query').text
text = query
doc = nlp(text)
print(doc.text)
Hello I have a question: I have an XML document where I want to change everything inside the attribute query and save it later (changing in the sense of running it through the tokenizer of spacy)
right now i was able to pull up the xml and access the attribute "query"
but when I run the tokenizer it only gives me the output of the last row, which can be solved by a for loop?
but even then, how can i replace the values inside the attribute?
https://stackoverflow.com/a/40244340 does this help?
- I need similarities, so I guess I'd have to do adj_matrix = 1 - pdist(sentences_encs, 'cosine')
- adj_matrix is used to create a networkx.DiGraph, such as nx.DiGraph(adj_matrix). Not sure networkx's API can figure out the same graph from a (m,)-shaped sequence of pairwise distances
you can call squareform on it to get a matrix back out of it. but another option would be to construct a scipy sparse matrix, if this is a bigger dataset
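A sketch of the pdist + squareform route, assuming a toy `encs` array in place of `sentences_encs` and a made-up threshold. `squareform` expands the condensed vector into a symmetric matrix with a zero diagonal, so only the lower triangle and the sub-threshold entries still need zeroing.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy stand-in for sentences_encs: three 2-D "sentence encodings".
encs = np.array([[1.0, 0.0],
                 [1.0, 1.0],
                 [0.0, 1.0]])
threshold = 0.5

# pdist returns cosine *distances* in condensed form, shape (n*(n-1)/2,).
sim_condensed = 1.0 - pdist(encs, metric="cosine")

# Expand to an (n, n) symmetric matrix; the diagonal comes back as zeros.
sim = squareform(sim_condensed)

adj = np.triu(sim)              # keep the upper triangle only
adj[adj < threshold] = 0        # drop weak similarities
print(adj)
```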
in general i think the answer to your first question is "no", unless you do a double loop manually (maybe using numba if you need it to be fast)
import xml.etree.ElementTree as et # import the elementtree module
root = et.fromstring(command_request) # fromString parses xml from string to an element, command request can be xml request string
root.find("cat").text = "dog" #find the element tag as cat and replace it with the string to be replaced.
et.tostring(root) # converts the element to a string
So my cat would be „query“ here but for my „dog“ how can I replace the string with my tokenized items if I have 50 query attributes that I want to change
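A minimal sketch of doing that for every query, assuming the file has `<topic>` elements that each contain one `<query>` element (the inline XML below is a made-up stand-in for the real file). `tokenize` is a hypothetical placeholder so the example runs without spaCy; with spaCy you would build the token list from `nlp(text)` instead.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed shape of the topics file.
xml_text = (
    "<topics>"
    "<topic number='1'><query>coronavirus origin</query></topic>"
    "<topic number='2'><query>coronavirus response to weather changes</query></topic>"
    "</topics>"
)


def tokenize(text):
    # Placeholder tokenizer; with spaCy: [token.text for token in nlp(text)]
    return text.split()


root = ET.fromstring(xml_text)
for topic in root.findall("topic"):
    query_el = topic.find("query")
    tokens = tokenize(query_el.text)
    # Assigning to .text replaces the content of this particular element.
    query_el.text = " | ".join(tokens)

updated = ET.tostring(root, encoding="unicode")
print(updated)
```

When the tree comes from `ET.parse(...)`, calling `tree.write("some_output.xml")` afterwards saves the modified document; the output filename here is just an example.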
I'll look this up tomorrow, thank you
Anyone?
predictions = []
prediction = model.predict(x_test)
...
print(predictions)
do you see it?
How is AMD doing in the AI/DL space these days? Have they made any gains on Nvidia or is it still pretty much a waste from a consumer perspective?
It would be a bit more work for a developer to use AMD accelerators, as most stuff uses Nvidia. I know only one local company that uses AMD GPUs for deep learning; they use TensorFlow. After each tf release someone needs to port it to the AMD platform. Don't remember who does it
Does anyone have recommendations on which situations (research, enterprise development, etc.) call for what machine learning/artificial intelligence framework? And possibly which ones would be best for more specific fields like medical research or robotics vision
research for example. More specifically, the generic battle between Tensorflow and PyTorch, or this one new one I've seen called Jax, but I may be unaware of any other good ones Idk. I'm just seeking information and am curious to learn more about the space. I've read some articles but would like some opinions here.
I wanna know
Like this,I imported the iris data set from the sklearn
How can I further use pandas to load the iris data set?
I personally prefer use pandas to load the data set
What format is it now?
type(data)
You want
load_iris( as_frame=True)
I just read it here https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
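A short sketch of what that looks like in practice, assuming a scikit-learn version that supports `as_frame` (per the linked docs). The returned object then carries a pandas DataFrame in `.frame`, with the four feature columns plus a `target` column.

```python
from sklearn.datasets import load_iris

# as_frame=True makes the loader return pandas objects instead of arrays.
iris = load_iris(as_frame=True)

df = iris.frame          # features and target together in one DataFrame
X = iris.data            # features only (DataFrame)
y = iris.target          # target only (Series)

print(type(df))
print(df.head())
```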
trying to plot this rn using python
if anyone in here could help with that, hmu i have the code ive written so far i just need to make adjustments
It's getting better but still not as good as cuda
pytorch now has official rocm builds as of 1.8
and a lot of other libraries have simple ports that allow them to use rocm
for the most part though most places use nvidia still
why this error showing?
as the image says, your array is only 1D
print(array.shape)
hi all! Probably a silly question but… I am desperately looking for the actual source code of this scipy function https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kv.html
I was looking through scipy‘s github repo but was unable to clearly identify where scipy.special.kv is implemented. And the existence of various kinds of Bessel functions doesn’t make it less ambiguous 😅
the docs say it's a wrapper for a library function called zbesk, so maybe search for that (and you might need to find the source of the wrapped library)
Can someone help me with Bagging ?
basically im trying to use bagging without the sk learn functions
does it need to be accurate
just plot functions that look like it
so some log graphs?
not sure what u mean
there arent specific values?
im not just plotting random values, its reading exact values from the csv and plotting those
then simply input data much easier
yeah the problem is the countries
all the different countries are in the same column
I want them to be their own separate lines
i have no idea how to go about doing that
one column per country?
just a quick stack over flow search, does this work: df = df.pivot(columns='movie', index='date', values='value')
youd need country in the columns parameter
comes up key error date
ah wait sorry
i removed date and values and it shows up this error
KeyError: "None of ['Country'] are in the columns"
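That KeyError usually means the name passed to pivot does not exactly match a column in the frame (case and stray spaces both count), so printing `df.columns` first is a cheap check. A sketch with made-up data; the column names `date`, `country`, and `value` are assumptions, not the real CSV headers.

```python
import pandas as pd

# Made-up long-format data: one row per (date, country) pair.
df = pd.DataFrame({
    "date": ["2020", "2021", "2020", "2021"],
    "country": ["A", "A", "B", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Check the exact column names before pivoting.
print(df.columns.tolist())

# One column per country, one row per date.
wide = df.pivot(index="date", columns="country", values="value")
print(wide)
```

Calling `wide.plot()` on the result then draws one line per country.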
Hi,
I need help in detecting wires coming from poles. In outdoor conditions, i know I can use edge detector like canny edge filter and do some processing to detect horizontal edges but my concern is the wire are very thin. Is there any better way to detect the wires? I have attached a sample image.
hello i need help: if i want to execute a function "EXAMPLE" with a hotkey, like if i press F10 it will execute "EXAMPLE", how do i do that?
hey guys I'm trying to try some emotion recognition from text, anyone got a model I can use as reference? or a prebuilt model? (I couldnt find any on google)
are there any alternatives of %store magic command in python which can be used to float variables from one notebook to another?
what is your use case?
looks like store is based on pickle,
you can ofc. pickle your data manually, but not sure if this would offer any benefit to you
I want to specify an s3 path(including timestamp) in one notebook and use that s3 path in another notebook to load data. I don't want to use %store to store the path link because multiple people are working on same project and thus, the variable name could be disturbed by anyone. Can you please tell me about the manual process that you mentioned?
I would like to append dataframe to CSV file present in Aws S3, using AWS lambda, can anyone help me regarding this. #python
This is an example of storing a dictionary to a file using pickle,
import pickle
db_to_store = {"a string": "/tmp",
               "a float": 1.2}
with open('my_db.pickle', 'wb') as f:
    pickle.dump(db_to_store, f, pickle.HIGHEST_PROTOCOL)
and this is how to load it
with open(r"my_db.pickle", "rb") as f:
    db = pickle.load(f)
print(db["a string"])
This I got it.
My CSV file is present in AWS S3 BUCKET
I need to append in that file
Thanks. Please let me know if there are any other ways/commands to achieve the same as well
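Another option, since the thing being shared is just a string: write it to a small JSON file that the second notebook reads. This is only a sketch; the directory, the per-user filename `alice_s3_path.json`, and the bucket path are all made up, and a temporary directory stands in for whatever location both notebooks can reach.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a directory both notebooks can access (hypothetical).
shared_dir = Path(tempfile.mkdtemp())

# Notebook 1: save the path under a per-user filename to avoid clashes.
config_file = shared_dir / "alice_s3_path.json"
config_file.write_text(
    json.dumps({"s3_path": "s3://example-bucket/data/2021-11-01T12-00-00/"})
)

# Notebook 2: read it back.
loaded = json.loads(config_file.read_text())
s3_path = loaded["s3_path"]
print(s3_path)
```

Unlike a pickle, the file is human-readable, so teammates can see what is stored in it.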
is transformers better than rnn and LSTM, no exceptions?
Hi, beginner here.
I'm doing some price analysis (pandas + numpy) and I've noticed that one of my function is quite slow. The function checks if the current price is above or below different levels that are calculated per dataframe. It isn't the cleanest function I presume, but it would be cool to find out if it is possible to improve the performance of it with either cleaner code or using numpy in some way? A, B, C, D are placeholders. The function "rates" price strength.
https://paste.pythondiscord.com/ijajefehem
are neural networks hand written?
pretend i'm a researcher. I have a problem I want to solve. Do i usually take upon an existing model and iteratively improve on them?
Or do i start frmo scratch and build upon it from the bottom?
not sure what you mean by "hand written". libraries like pytorch do most of the math.
hand written in the sense of like
not the implementation code
but the architecture of the network
how many layers you want and which building blocks you use
depends on the project
im guessing researchers are more likely to want to design new architectures right?
where as ML devs are more likely to fine tune something to a business need
It really depends. "Attention is all you need" was a groundbreaking paper, but it was written mostly by Google employees.
@desert oar
Though I suppose those Google employees were hired specifically to be university-style researchers
I'd be surprised if that wasn't the case
how can I create an algorithm with reinforcement learning?
that question sounds a bit vague no?
is there a specific problem you want to approach?
Anyone know how to work with jinja2
I know a bit about Jinja2, but rookie. What are you trying to do?
Why do you need to do that?
Why not just store in some pickle file? Or even simple file?
yeah, in research, this is usually what the effort goes into. you come up with an architecture that seems sensible due to the statistical properties it assumes and/or enforces, and iterate on it
the individual building blocks are usually well understood and recycled as needed. if you do bleeding edge research though, you make entirely new things and you also need to implement them yourself
Hi all , what would be the best method to debug / figure out why my keras models loss / MAE is so high ?
Trying to use a model to predict Emissions from certain columns in a dataset and it can get quite close atm but i cant get the MAE lower than 11
Fuckkkkk
my flowchart is something like this:
- data processing mistakes. units are wrong, forgot to scale something, applied scaling twice, forgot to apply some transformation to outputs. also blatant bugs, like swapping train and test sets.
- model fitting didn't converge. it's bouncing around and not going down, or it's going up, or there were numerical warnings reported in the training process. this means you have a serious issue, either with the model not fitting the data or with something poorly designed in the training process (e.g. bad hyperparameters, badly-behaved data).
- the model actually doesn't fit the data. this is where you have to start doing actual data analysis to figure out what you did wrong: looking at pairwise association measures, doing statistical analyses, looking for erroneous data points (hesitant to say "outliers"), fitting the model to simulated data to make sure your model design even works, etc.
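One quick check that fits the first step: whether an MAE is "high" depends on the units and spread of the target, so it helps to compare it against a trivial baseline that always predicts the training median. A sketch with made-up numbers (`y_train` and `y_test` are placeholders for the real target arrays):

```python
import numpy as np

# Placeholder targets; substitute the real train/test target values.
y_train = np.array([100.0, 120.0, 90.0, 110.0])
y_test = np.array([105.0, 95.0, 115.0])

# Baseline: predict the training median for every test row.
baseline_pred = np.full_like(y_test, np.median(y_train))
baseline_mae = np.mean(np.abs(y_test - baseline_pred))

print(f"baseline MAE: {baseline_mae:.3f}")
```

If the model's MAE is not clearly below this number, the problem is more likely in the data pipeline or training setup than in fine-tuning the architecture.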
My process so far was to gather the data and split into predictors and target and then make sure the data is clean so no nulls etc , i have attempted to normalise / scale my predictors but idk if its the best method for it as seen in the screenshot. When i fit the model the MAE does change by going up and down so that could be the issue maybe ?
This shows the value going up and down
In terms of my model i don't think theres any issues with it unless its to do with the density values , this is whats causing me a bit of a headache
Hie. I am working with ngrok flask in python for web development.
I am facing an error please help!
So i installed ngrok and ran for a text. It ran nicely. But when i try to render it for a template page. It is unable to access the webpage. Internal error. And when its stuck in the ngrok directory where there are folders like bin, boot, dev etc.
Could someone please help me out here.
Hi guys when calculating the standard error should you do ddof=1 or leave it as the default on np.std()?
i heard that you should do ddof =1, however i thought that the standard deviation in the calculation uses the population variance, so i wouldve thought theres no ddof=1. Since a population variance doesnt have n-1, where as sample does. Thanks
i thought that the standard deviation in the calculation uses the population variance
In the what calculation? If you're using a calculation that needs the population variance, then use the default ddof=0. If you need sample variance, use ddof=1.
If you mean what's used when people calculate standard errors... I'd hope it's the sample variance, because that's the statistically correct thing to do - dividing by n-1 instead of n (sample variance) gets you an unbiased estimator of the real variance of the distribution.
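A small worked example of the difference, on made-up data. The standard error of the mean here uses the sample standard deviation (ddof=1) divided by sqrt(n).

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

pop_sd = np.std(x)             # ddof=0: divides by n
sample_sd = np.std(x, ddof=1)  # ddof=1: divides by n - 1

# Standard error of the mean, based on the sample standard deviation.
sem = sample_sd / np.sqrt(n)

print(f"population sd: {pop_sd:.4f}")
print(f"sample sd:     {sample_sd:.4f}")
print(f"standard error: {sem:.4f}")
```

`scipy.stats.sem` uses ddof=1 by default as well, which matches this convention.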
@desert oar I'm testing what you told me yesterday, but...I'm still having the same issue with my output(the model outputs a single character every time). Maybe I messed up something? Or maybe I should get rid of LSTMs and try to use exclusively FCC layers?
My dictionary of words/vectors is something like this:
{'私': 6.819466590881348, 'の': 6.763469219207764, '犬': 7.155118465423584, 'は': 7.7465996742248535, '骨': 6.333333492279053, 'が': 6.707737922668457, '好': 7.280846118927002}
Each value is a vector generated by my word2vec model, which receives as input a word that has been one-hot encoded from a word list and outputs a vector. The model is actually a single FCC layer, since my data is so small.
I didn't use a softmax at all because Pytorch's Cross Entropy has a LogSoftmax included in its function.
With this dictionary, I get a vector for an english sentence and pass it as input for the translator model(which has LSTMs layers), and then try to output a vector for a japanese sentence.
Oh, ok...Using FCC layers instead of LSTMs, and using more layers with more features generates a better output...though I feel like I'm trying to kill a bird with a rocket launcher...
Hey guys, I was wondering how to read from a text file where each the index and number of columns in each row is uncertain?
my goal is to find the highest score of chemistry, it is a bit tricky...
i got a df with ~200 datapoints, currently i search for >= and <= values then use first_index[0] and last_index[len(last_index)-1] to get the indexes and set the range for the filtered df.
After that i create a new df with np.NaN values and np.linspace() to finally concat the two dfs and do an interpolation of the missing NaN values.
So my question would be if u guys know a way to make it smoother
thanks, are there any pandas' functions available to use?
delete the for loop, and then make a boolean series for the >=/<= logic you had in mind.
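A sketch of what a boolean series looks like for that kind of range filter. The column name `X` and the bounds are placeholders. Note that this keeps every row inside the range, which matches a first-index/last-index slice only when the column is sorted.

```python
import pandas as pd

# Made-up data; "X" stands in for the column being filtered on.
df = pd.DataFrame({
    "X": [-200.0, -140.0, 0.0, 170.0, 250.0],
    "Y": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df["X"] >= -140) & (df["X"] <= 170)
filtered = df[mask]

# Equivalent shorthand (inclusive on both ends by default).
filtered_between = df[df["X"].between(-140, 170)]

print(filtered)
```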
i defined a function its not in a for loop
so you're using apply or what? can you just show the code?
give me a few mins
I am not sure how to do it since the columns are not corresponding
anyone here use plotly or has used it?
remember to always ask your actual question; remember to never "ask to ask".
I had never used plotly prior to yesterday, but I was still able to partially answer your question. Pre-filtering for answerers without giving the actual question just wastes everyone's time.
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# df is the original DataFrame with the columns "X" and 'G"'
array_1 = df.index[df["X"] >= -140]
first_index = array_1[0]
array_2 = df.index[df["X"] <= 170]
last_index = array_2[-1]
# copy so the .at assignments below modify new_df only
new_df = df.loc[first_index:last_index].copy()
new_df.at[first_index, "X"] = -140.00
new_df.at[last_index, "X"] = 170.00
new_df = new_df[['G"', "X"]]
x = new_df.iloc[:, 1]
y = new_df.iloc[:, 0]
# cubic interpolation onto an evenly spaced grid
f = interp1d(x, y, kind='cubic')
x_int = np.linspace(start=x.min(), stop=x.max(), num=621)
y_int = f(x_int)
int_df = pd.DataFrame({"X": x_int, 'Y': y_int})
# rows with missing Y values in front of the interpolated range
x_range = np.linspace(start=-160, stop=-140.5, num=40)
range_df = pd.DataFrame({"X": x_range, 'Y': np.nan})  # np.NaN was removed in NumPy 2.0
concat_df = pd.concat([range_df, int_df])
concat_df = concat_df.reset_index(drop=True)
# seed the first row with the last Y value, then fill the gap linearly
concat_df.at[concat_df.index[0], 'Y'] = concat_df['Y'].iloc[-1]
concat_df = concat_df.interpolate(method='linear', axis=0)
array_170 isn't defined.
sorry im on mobile
From https://plotly.com/python/v3/gapminder-example/ there's this line of code
a_column = Column(list(dataset_by_year_and_cont[col_name]), column_name)
The problem I have is that the Column type doesn't seem to exist in plotly, and I don't quite know what to do next
there might be clever ways to make it more succinct, but tbh I wouldn't bother.
Depends strongly on your use case. With image or language processing you are much more likely to use an existing network as the basis.
That said, if you come to me with some data that is not in one of those categories, I will try several other machine learning approaches (starting with linear regression) before I even consider making a neural network.
You would be surprised at how much machine learning is just fancy talk for linear regression.
even the concat part?
it feels so wrong 😄
@serene scaffold
you use python default libraries for this not pandas
but yes actually there is a way you read pandas from txt sep=',' probably
MAYBE
moooooods
Hello, please don't post unapproved advertising per rule 6. If you think this was removed by accident, DM @sonic vapor
cheers @steady basalt
I think one time I read in w pandas
Have u tried looking into pandas from text
I just use read_csv()
and got this but am stuck here
So u want one col per subject
yeah
but anyway, my final goal is to find who has the highest chemistry grade
I was wondering how to place the chemistry to one single column?
there must be better ways to do it, but the fastest way for me is: turn the first column's elements into a list row by row, i mean, split the string on spaces
and then you will get 3 separate list elements in that column for each row. using a lambda func etc, you can combine the first 2 elements, which are the name and surname, or just combine the list elements from 0 to -1
because your last list element will be the name of the lesson
you can do the same thing to the other columns; that way you can separate the grades from the lesson names
U can also expand out elements but I’m not sure if that’s useful here idk if it creates a tonne of columns
expand gonna create same rows, its hard to handle. btw i did it, hold i will send the codes
I strongly encourage you to learn how to read files line by line
It is a much more general way than relying on something like pandas, and with unstructured data like this you have to write code to manually parse each line anyways
with open('grades.txt') as f:
    for line in f:
        if 'chemistry:' in line:
            chemistry_grade = line.split('chemistry:')[1][:2]
            name = line.split(' ')[:2]
            print(name[0], name[1], chemistry_grade)
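Building on the same idea, here is a sketch that tracks the highest chemistry grade while reading line by line. The line format (name, surname, then `subject:grade` pairs in any order) is an assumption, and `sample_lines` is made-up data standing in for the real file. A regular expression is used so the grade can have any number of digits.

```python
import re

# Made-up lines in the assumed format; with a real file, iterate over
# `open('grades.txt')` instead of this list.
sample_lines = [
    "Ada Lovelace math:90 chemistry:85",
    "Alan Turing chemistry:92 physics:88",
    "Grace Hopper physics:75",
]

pattern = re.compile(r"chemistry:\s*(\d+)")

best_name, best_grade = None, -1
for line in sample_lines:
    match = pattern.search(line)
    if match is None:
        continue  # this student has no chemistry grade
    grade = int(match.group(1))
    if grade > best_grade:
        best_name = " ".join(line.split()[:2])
        best_grade = grade

print(best_name, best_grade)
```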
It seems his txt isn’t line by line when he pasted it
But yeah this takes a bit of fuckery
thanks guys, I am trying all you guys suggested
can I use open to read a file from a url?