#data-science-and-ml
with open((('ep'+(str(int((list(set(list(f['episode_id'])))).index(id)) + 1)))+'.txt'), 'w') as episode:
this could be
with open(f'ep{id_}.txt', 'w') as f:
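A sketch of one way to number the episode files without list(set(...)).index(id). Set order is arbitrary, so those numbers can change between runs, and .index rescans the list for every id. It assumes f is a pandas DataFrame with an episode_id column. The line column, the sample rows and the temporary output directory are hypothetical stand-ins.

```python
import os
import tempfile

import pandas as pd

# hypothetical stand-in for the script data
f = pd.DataFrame({
    "episode_id": [101, 101, 205, 330, 205],
    "line": ["a", "b", "c", "d", "e"],
})

out_dir = tempfile.mkdtemp()  # hypothetical output location

# sort=False keeps episodes in order of first appearance, so the numbering is stable
for n, (episode_id, rows) in enumerate(f.groupby("episode_id", sort=False), start=1):
    path = os.path.join(out_dir, f"ep{n}.txt")
    # the file handle gets its own name so it does not shadow the DataFrame f
    with open(path, "w") as episode:
        episode.write("\n".join(rows["line"]) + "\n")
```

Note that reusing f as the file handle, as in the one-liner above, would shadow the DataFrame inside the with block.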
my system is too weak
can anybody run the code on their system
and send me a tar or zip of all the txt files generated
if the file is too large for discord then upload it to github or google drive and send me a link
I'd be forever grateful if anybody did this for me
@lapis sequoia
What does that do
Also, can someone help me with my problem
I'm trying to create an invoice generator based on a spreadsheet with the columns "name", "product", "quantity" and "date".
Each product has a price and each row represents a customer's purchase
I want to set the price locally such that the invoices generated reflect this
I was thinking about listing each product once in a csv file with the corresponding prices
Is this the best way of tackling this problem?
I would have a separate column but it's just more time
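A minimal sketch of the price-list idea with pandas. Keep one row per product in a separate CSV and merge it onto the purchases. The name/product/quantity/date columns come from the question. The price column, the sample rows, and reading from in-memory strings instead of files are assumptions for illustration.

```python
import io

import pandas as pd

# purchases: one row per customer purchase (sample data)
purchases = pd.read_csv(io.StringIO(
    "name,product,quantity,date\n"
    "Alice,widget,3,2021-03-01\n"
    "Bob,gadget,1,2021-03-02\n"
    "Alice,gadget,2,2021-03-05\n"
))

# price list: each product listed once (hypothetical prices)
prices = pd.read_csv(io.StringIO(
    "product,price\n"
    "widget,2.50\n"
    "gadget,10.00\n"
))

# many_to_one makes merge raise if a product appears twice in the price list
lines = purchases.merge(prices, on="product", how="left", validate="many_to_one")

missing = lines.loc[lines["price"].isna(), "product"].unique()
if len(missing):
    raise ValueError(f"no price listed for: {list(missing)}")

lines["line_total"] = lines["quantity"] * lines["price"]
invoices = lines.groupby("name", as_index=False)["line_total"].sum()
```

Changing a price then only means editing one row of the price list, and every regenerated invoice picks it up.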
how can I return a graph inline on a webpage from my local Python function?
I am getting a pop-out window instead
I tried using %matplotlib inline
anyone
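%matplotlib inline is an IPython/Jupyter magic, so it only affects notebooks. For a web page, one common approach is to render the figure off-screen with the Agg backend, encode the PNG as base64, and embed it in an img tag. This is a sketch not tied to any particular web framework, and plot_as_img_tag is a hypothetical helper name.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, so no pop-up window is opened
import matplotlib.pyplot as plt


def plot_as_img_tag(xs, ys):
    """Return an HTML <img> tag with the plot embedded as a base64 PNG."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure; long-running servers otherwise accumulate them
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'
```

The returned string can be inserted into whatever HTML template the web app renders.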
Hello. I am trying to find projects (dissertations that involve autonomous systems) so I can pick a specific topic for research. Here are some quick details about projects involving autonomous systems. For persistently autonomous systems, planning to achieve goals and executing plans is not enough. A persistently autonomous AI also requires the means to formulate, select, or reject goals over time. In addition, goals may be suggested by human operators or other collaborating agents. To perform predictive decision-making online, autonomous agents must also be able to parse the data they can sense into a coherent model of the world: for example, by parsing a 2-dimensional map as a topological semantic map, or historical navigation data into estimates of navigation durations. Projects in this area investigate the tools required to make persistently autonomous systems.
Example Project: Automated Modelling for Autonomous Systems
The goal of this project is to develop tools to automatically build models from data, to be used with automated systems that use search-based AI to operate in the real world. The project could employ a variety of techniques such as clustering, recommender systems, and neural networks. Application areas include (but are not restricted to) robotic inspection, disaster recovery, and operating in ocean and space domains.
is there a way to avoid this loop over all parameters? In reality there are far too many parameters, so won't gradient checking be slow?
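One way to avoid looping over every parameter is a directional gradient check. This is a swapped-in technique, not one proposed in the thread. Pick a few random unit directions v and compare the central difference (f(theta + eps*v) - f(theta - eps*v)) / (2*eps) with the analytic directional derivative grad(theta) . v. Each direction costs two loss evaluations no matter how many parameters there are. The toy loss below is a hypothetical stand-in. Because only a few projections are tested, an error orthogonal to every sampled direction could slip through.

```python
import numpy as np


def loss(theta):
    # hypothetical toy loss standing in for the real network's cost
    return np.sum(theta**2) + np.sum(theta**3)


def grad(theta):
    # analytic gradient of the toy loss (what backprop would return)
    return 2 * theta + 3 * theta**2


def directional_grad_check(f, g, theta, eps=1e-5, n_dirs=5, seed=0):
    """Max relative error between numeric and analytic directional derivatives."""
    rng = np.random.default_rng(seed)
    analytic_grad = g(theta)
    worst = 0.0
    for _ in range(n_dirs):
        v = rng.standard_normal(theta.shape)
        v /= np.linalg.norm(v)  # random unit direction over all parameters at once
        numeric = (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)
        analytic = np.vdot(analytic_grad, v)
        rel = abs(numeric - analytic) / max(abs(numeric) + abs(analytic), 1e-12)
        worst = max(worst, rel)
    return worst
```

Compared with the per-parameter loop (2 loss evaluations per parameter), this costs 2 * n_dirs evaluations in total.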
but shouldn't that be enough? all you have to do is keep driving, and you'll eventually reach the destination
but knowing the gradients doesn't really help. Say we were right at the foot of a deep slope like so: / ---/ <-- we're here. The gradient would tell us to go really far, even though the minimum is only 1 or 2 units away
well, the issue with my analogy is this: you don't actually "drive", you "teleport".
each jump has to be a teleport, because you don't know "how far" to drive. Driving is a bunch of spontaneous, instantaneous movements, yeah,
where we could always look out the window
now suppose the windows got tinted and darkened, and you had to completely stop the car for them to un-tint
so you had to set a timer for yourself: "I'll look out every 10 km", or something like that
I don't even know. Anyways, yes: jumps, not drives. You're not rolling down a predefined landscape, because if you knew the landscape you'd already know the destination
but still, all you have to know is the general direction (the sign of the gradient) and move that way right?
"move that way" doesn't exist.
oh yeah
how do we move if we only have to teleport
yep. which are discrete shifts. not a smooth movement
you could "mimic" a smooth movement by making the learning rate super small
and then it just takes a few years
but you'll get there
but that still doesn't explain why we multiply it by the gradient as well
well the gradient has the sign
Does anybody know an alternative to tensorflow "tf.io.gfile.GFile"?
and the steepness gives the magnitude, like in the hot-and-cold game.
true
you're steep when you're very far.
I can't use the API to open the checkpoints file
(or so we hope; it's not actually always true. Local minima exist.)
but empirically it's still generally more useful to take the gradient into account than to disregard it
ok cool
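A small worked example of the update x <- x - lr * f'(x) on the hypothetical function f(x) = (x - 3)**2. Its gradient 2*(x - 3) is proportional to the distance from the minimum. Multiplying by the gradient supplies the sign (which way to jump) and also a step length that shrinks as the slope flattens, which is the "steep when far" intuition. For non-quadratic losses that proportionality is not guaranteed.

```python
def gradient_descent_1d(grad, x0, lr=0.1, n_steps=50):
    """Plain gradient descent in one dimension; returns every visited point."""
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        # discrete jump: direction from the gradient's sign, length from its magnitude
        x = x - lr * grad(x)
        xs.append(x)
    return xs


def grad_f(x):
    # derivative of f(x) = (x - 3)**2
    return 2 * (x - 3)


xs = gradient_descent_1d(grad_f, x0=13.0)
steps = [abs(b - a) for a, b in zip(xs, xs[1:])]
```

With lr = 0.1, each step here is 0.8 times the previous one, so the first jump is 2 units and later jumps shrink geometrically. A tiny lr mimics smooth driving but needs many more jumps.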
is it possible to train a GAN on Google Colab?
Hi! I need opinions on a thing. Do y'all have a preference between PyTorch and Tensorflow? My use case is specifically for processing image data.
Hey guys, I am trying to build an LSTM RNN for speech recognition, but the labels are not evenly distributed: there are a lot of "Silent" samples in the train/test set
Train:
I would suggest going with PyTorch
Validation
thanks! any particular reasoning?
I was thinking about splitting the network into two parts, one for the silent label and the other for others
like, in PyTorch you don't have to write the computation graph yourself; PyTorch builds it on the fly. In TensorFlow 1.x you have to define it yourself (TF 2 runs eagerly by default). I found TensorFlow a little bit confusing.
thanks, I appreciate it. It'll help me justify my decision in a meeting tomorrow
np
Was curious about anyone's thoughts on the predicted values plateauing at some limit, as shown in the rightmost plot. Does anyone have ideas on what may cause this? Perhaps just bad data, or could it be a certain activation function or some other part of the network's structure?
Dashed lines are the true values; the colormapped scatter points are the predicted values
those are pretty graphics
I'm having issues trying to run abs(complex128) inside of a numba.jit decorator.
any suggestions?
how does the numba.jit or numba.cuda.jit decorator work? Does it precompile something? Is the issue that C doesn't have a complex128 datatype?
@dapper halo I'm just learning about all this now. If this is meant to be a plot of accuracy after training, isn't the point where accuracy trends off like that usually indicative of some sort of overtraining?
this channel isn't quite as active as #python-discussion
@glad raft I'm sure you understand no one is being paid to "help" in here, right? This isn't a paid tech support service...
People help when / if they know.
This is not a plot of the loss function; it is just an x,y plot of the predicted vs. true values, so accuracy, yes, but not what you're thinking of. So it is concerning that the predicted range doesn't span the full range of true values. Also, it's only trained for 5 epochs, so I don't think there's a shot in hell it's overfitting haha.
It's probably just due to the degeneracies in my data though.
can always plug that Venmo account
I have no idea if this violates any terms... so don't take it as anything more than a joke haha
@exotic maple did I suggest that something included a fee? I didn't mean to if I did
It seemed that you were upset about the lack of response. This is a free channel of mostly enthusiasts and a few knowledgeable people. If people can/want they help, that's all. Perhaps I took it too literally
Oh no I wasn't upset just babbling to myself because I was tired. Sorry for the confusion
ha, no prob. Maybe I'm upset myself because of this stupid plotly chart -grumbles-
I haven't even started with plotly yet, I'm still trying to figure out speedup and gpu parallelization. I think I'll always be jealous of the people that make pretty graphics
Aren't both the 2nd and 3rd statements correct?
The answer given is just statement 2
Why is statement 3 wrong?
I did make a 1000 level Mandelbrot pyplot contourf plot though
Can someone plz clear it.....thanks
what is the use case?
A lot of models and techniques are supported out-of-the-box in TF, so it might be easier to use that. but for research-level models - pytorch all the way
Isn't one iteration more than one epoch? Does one iteration run through all epochs?
What's wrong with statement one?
An epoch has to do with dropout, right?
Dropout randomly disables a fraction of the nodes in a given layer during training.
An epoch is one full pass of the network through your whole dataset; the number of epochs is how many times it loops through the data to get a better fit.
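Under the common convention that one iteration is one mini-batch weight update, the epoch/iteration bookkeeping is simple arithmetic. A sketch: the function name and the drop_last flag (discarding a final partial batch) are illustrative, not a specific library's API.

```python
import math


def iterations_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of mini-batch updates needed for one full pass over the data."""
    if drop_last:
        # the final, smaller batch is skipped
        return n_samples // batch_size
    # the final, smaller batch still counts as one iteration
    return math.ceil(n_samples / batch_size)
```

For example, 1000 samples with batch size 32 give 32 iterations per epoch, so 5 epochs are 160 iterations. One epoch usually contains many iterations, not the other way round; with full-batch training the two coincide.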
yo, does anyone know an entry-level TensorFlow course which doesn't use a premade dataset?
specifically: image recognition with Keras
I'm not sure you can teach machine learning without a premade dataset
hey, I am really new to machine learning and neural networks with Python. Could anyone recommend an explanatory course? (free ofc)
https://www.springboard.com/resources/learning-paths/machine-learning-python/
a quick bing search led me to this
thx
np
a lot
Hey guys, I really want to do something with HAR (Human Activity Recognition). I saw something about a camera being able to identify the different yoga poses someone does. Does anyone know what I have to learn to be able to do this, or where to start?
Hello everyone, I am planning to do some project in DL( specifically image classification) and I am looking into depth perception(in an image), zero-shot recognition, optimal transport theory, something related to model compression etc. I am new to this field and a prof I discussed with gave me some of the above topics to look into. If anyone is familiar with the above topics or knows something related, can you tell me any related topics that are interesting to work on? (Also the project is research-based and I am an undergrad)
my ML model generates "fake" Simpsons scripts
can you please run my code on your machine and send me the scripts or a link to them
@lapis sequoia Have you tried Google Colab?
yes
then something is probably wrong with your code. Check for memory leaks and incorrect data chunking/loading
no, google colab works properly. That's not what I meant.
then, what's the prob?
Will cost optimisation be slower when using a high value of beta (0.9) rather than a small one (0.5) in gradient descent with momentum?
The site at https://discuss.pytorch.org/t/how-to-convert-audio-e-g-wav-to-tensor-and-back/18345 has experienced a network protocol violation that cannot be repaired.
The page you are trying to view cannot be shown because an error in the data transmission was detected.
Please contact the website owners to inform them of this problem.
And then people ask me why I don't use torch. Figures this would happen to me, especially when I'm in a hurry
Hey guys, I really want to start with machine learning on my own. Do you have good documentation for PyTorch or TensorFlow?
(I learned the theoretical part of machine learning and deep learning, but I can't find anything good about the two libraries)
I would appreciate if you could help me : )
I have this primer for some basic DS/ML libraries: https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/coursera/week1_intro/primer/recap_ml.ipynb
And these ones for TF and Pytorch:
https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/coursera/week1_intro/primer/recap_tensorflow.ipynb
https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/coursera/week1_intro/primer/recap_pytorch.ipynb
All of these come from the Practical RL course on coursera - it assumes the participants are already somewhat familiar with this stuff, so provides primers in case they aren't.
Though if you want "documentation" - well, both of these libraries have extensive docs, with guides and examples.
Thank you. I had searched for this, but I just wanted to learn how to use the libraries, and I think my question is answered now, thx
I can't open any PyTorch docs on my side (protocol violation error)
Can anyone tell me how to crop/pad a tensor?
basically, I have multiple tensors which I want to stack - but they have different shapes
uhh, can't you just use indexing?
torch.Size([2, 1321967]), [2, 1323119] ....etc
so it's only on the horizontal axis. Any idea how I can do that?
you mean how exactly?
like,
min_len = min(t.shape[1] for t in tensor_list)
shape = (slice(None), slice(None,min_len)) # this is a tuple of slices equivalent to [:,:min_len]
cropped_tensors = [t[shape] for t in tensor_list]
ooh, so each index in the shape var corresponds to the axis eh?
Yup
It's just normal slicing, equivalent to t[:,:min_len]
actually, uhh
why didn't I just do that lol
you only need to use slice directly when, say, there's a variable number of axes too
here, you can just do t[:,:min_len]
I did that - thanx a ton for the help!
Would you also happen to know if torch can natively pad tensors like in TF?
Hey guys, I'm going through some past paper questions and am struggling to make sense of it
Hey yall was just wondering if anyone is good with the Altair visualization library?
There's https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.pad, apparently
you can also just allocate a tensor of the target size and copy the original tensor into it using slice assignment:
res = torch.zeros(target_shape,dtype=orig.dtype)
copy_tup = tuple(slice(None,l) for l in orig.shape) # this copies to the left-top corner, change the positions if necessary
res[copy_tup] = orig
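For the stacking problem, here is a NumPy sketch that pads each array at the end of axis 1 to the longest length and then stacks them. NumPy is used only so the example runs without PyTorch, and the shapes are made-up stand-ins. In PyTorch, torch.nn.functional.pad takes pad widths starting from the last dimension, so F.pad(t, (0, max_len - t.shape[1])) should pad the end of axis 1 of a 2-D tensor.

```python
import numpy as np

# stand-ins for tensors shaped like [2, 1321967], [2, 1323119], ... (lengths shrunk)
arrays = [np.ones((2, 5)), np.ones((2, 3)), np.ones((2, 4))]

max_len = max(a.shape[1] for a in arrays)
# np.pad takes a (before, after) width per axis; only the end of axis 1 gets zeros
padded = [np.pad(a, ((0, 0), (0, max_len - a.shape[1]))) for a in arrays]
stacked = np.stack(padded)  # shape (3, 2, 5)

# cropping to the shortest length instead, as in the slicing answer above
min_len = min(a.shape[1] for a in arrays)
cropped = np.stack([a[:, :min_len] for a in arrays])  # shape (3, 2, 3)
```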
any career advice here in this field ?
Hey, I just joined this server! Seems like a really great place for learning and exchanging. I have a question; I don't know if this is the right place, but I was wondering what the most fundamental book on Artificial Intelligence would be. Something with good fundamental information to start the whole process of learning about AI, coding, and computer science... thank you in advance.
what's your background experience?
I got no experience
Hi yall!
Does anyone have any tips on how to optimize the below operation?
def calc_wm(W,X):
    N = len(W)
    sum_W = np.sum(W)
    sum_WX = np.sum(W*X)
    if N>0 and sum_W>0:
        return sum_WX / sum_W
    return np.nan

def calc_wmse(W,X):
    N = len(W)
    sum_W = np.sum(W)
    m_W = np.mean(W)
    wm_X = calc_wm(W,X)
    if N>1 and wm_X!=np.nan:
        wmse = np.sum((W*X - m_W*wm_X)**2)
        wmse -= 2*wm_X * np.sum((W-m_W)*(W*X - m_W*wm_X))
        wmse += wm_X**2 * np.sum((W-m_W)**2)
        wmse *= N / ((N-1)*sum_W**2)
        return wmse
    return np.nan

wmse = (
    df.groupby(['grp'])
    .apply(lambda x: calc_wmse(x['weight'], x['val']))
    .reset_index()
)
It calculates the standard error of the weighted mean. It takes 60+ seconds to run on a test df with 10M rows, and the full set has 1.4B rows.
The formula is taken from this stats Stack Exchange thread (https://bit.ly/3t2hBch).
Things I've tried:
- haven't gotten a vectorized solution working 😦
- numba.jit made the operation faster on a single group, but passed into groupby.apply it was even slower
Things I haven't tried:
- using pd.eval to handle the calculations directly in df memory
- any attempts to use Cython to optimize the function
any advice would slap hard! thanks so much
okay okay, ill give you my background and see if it helps clarify sources of questions
I have a BS in Math and CS and work in a proto-data-engineering role.
I don't do too much data science directly, and instead work with the data scientists to get their pipelines working. I get to have a more dev-ops type of role instead of an analytics one!
Hello, I'm a beginner at ML and I can't seem to grasp a few things about feature scaling. I have googled a lot but I can't seem to be satisfied with the explanation. I understand WHY feature scaling is important. But why do we have to scale the test set using the scaling parameters of the training set? It just doesn't make sense to me. Thanks
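A minimal sketch of why the test set reuses the training parameters. The model's weights were fit to inputs expressed as "standard deviations from the training mean". New data must therefore be converted with that same mean and standard deviation, exactly as an unseen sample would be in production. Refitting on the test set would silently change what the inputs mean (and with a single test row it would be degenerate). The helper names and data are hypothetical. scikit-learn's StandardScaler follows the same fit-on-train, transform-everything pattern.

```python
import numpy as np


def fit_standardizer(X_train):
    """Learn per-feature mean and std from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return mean, std


def standardize(X, mean, std):
    """Apply previously learned parameters; never refit here."""
    return (X - mean) / std


X_train = np.array([[1.0], [3.0]])
X_test = np.array([[5.0]])

mean, std = fit_standardizer(X_train)
train_scaled = standardize(X_train, mean, std)  # [[-1.], [1.]]
test_scaled = standardize(X_test, mean, std)    # [[3.]]: clearly outside the training range
```

Scaled with its own statistics, the single test row would become 0.0, which hides the fact that it lies well above anything seen in training.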
try numba.jit with nopython=True on calc_wm and calc_wmse, and make sure to pass x['weight'].to_numpy() so that numba only has to deal with "plain" numpy arrays
more importantly @polar dock , how big is each group, and how many groups do you have?
if you have lots of small groups, you can leave each within-group computation alone and you will want to parallelize across many groups. if you have a few very large groups, you will want to make each within-group computation as efficient as possible using something like numba
Thank you for your background
Oh, I didn't think about the numba engine, good point! I did make sure to turn the inputs into numpy arrays; the documentation is explicit that it's required.
For this test segment there are 14,200 groups of 750 rows in each group.
This is already running inside a multiprocessing Pool. I have some 180,000 segments to be processed. Hence, I don't think I can parallelize the operations for any single segment.
oh... you mean this is already a task being done inside a parallel "worker"?
yeah lol
If I had more time I'd just try to learn Dask
but leadership dumped this on me pretty last minute
@polar dock possible to drop the ifs in some way?
@polar dock i don't know if returning np.nan is a good idea, or if you should use != to compare anything to nan
i think nan != nan always
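A quick check of the NaN point. Under IEEE 754 floating point, NaN compares unequal to everything, itself included. So wm_X != np.nan is always True and never catches the NaN case; np.isnan (or math.isnan) is the reliable test.

```python
import math

import numpy as np

wm_X = np.nan  # e.g. what calc_wm returns for an empty or zero-weight group

print(wm_X != np.nan)    # True, even though wm_X is NaN
print(wm_X == np.nan)    # False
print(np.isnan(wm_X))    # True
print(math.isnan(wm_X))  # True
```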
i agree that doing something different with these 'ifs' can help
maybe inlining these two functions to avoid having to use nan to 'signal'
not sure if numba supports python None in nopython mode
import numba
import numpy as np

@numba.jit(nopython=True)
def calc_wmse(W, X):
    N = W.size
    sum_W = np.sum(W)
    if N <= 1 or sum_W <= 0.0:
        # too few rows (N - 1 would be 0) or non-positive total weight
        result = np.nan
    else:
        m_W = np.mean(W)
        wm_X = np.sum(W * X) / sum_W  # weighted mean, computed inline
        calc1 = W * X - m_W * wm_X
        calc2 = W - m_W
        wmse = np.sum(calc1**2)
        wmse -= 2 * wm_X * np.sum(calc2 * calc1)
        wmse += wm_X**2 * np.sum(calc2**2)
        wmse *= N / ((N - 1) * sum_W**2)
        result = wmse
    return result

def calc_wmse_grp(grp):
    # pass plain numpy arrays so numba doesn't have to handle pandas objects
    return calc_wmse(
        grp['weight'].to_numpy(),
        grp['val'].to_numpy()
    )

wmse = (
    df.groupby(['grp'])
    .apply(calc_wmse_grp)
    .reset_index()
)
this is a bit cleaner imo, might be faster but you'd have to benchmark
what kind of structure would you use to store the locations, velocities, rotational velocities, and physical properties of thousands to millions of particles so that you could make use of numpy's ufuncs?
it saves a few passes over the data and caches a few calculations
just an array?
a pandas or dask dataframe for working with the data, and for saving to disk a sqlite database, parquet file, or hdf5 file, depending on your read/write needs
you can use "raw" numpy arrays too but pandas has some nice features that can make certain operations easier (e.g. groupby as shown above)
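A sketch of the "one row per particle, one column per property" layout suggested above. It uses a pandas DataFrame so numpy ufuncs run over every particle at once. The column names, particle count, and random initial values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000

particles = pd.DataFrame({
    "x": rng.uniform(0.0, 1.0, n),
    "y": rng.uniform(0.0, 1.0, n),
    "z": rng.uniform(0.0, 1.0, n),
    "vx": rng.normal(0.0, 0.1, n),
    "vy": rng.normal(0.0, 0.1, n),
    "vz": rng.normal(0.0, 0.1, n),
    "omega": np.zeros(n),  # rotational velocity (a scalar here for simplicity)
    "mass": np.ones(n),
    "radius": np.full(n, 0.01),
})

# vectorized: each expression runs over all n particles without a Python loop
speed = np.sqrt(particles["vx"]**2 + particles["vy"]**2 + particles["vz"]**2)
particles["kinetic"] = 0.5 * particles["mass"] * speed**2

# a raw (n, 3) numpy array is one call away when a computation needs it
pos = particles[["x", "y", "z"]].to_numpy()
```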
do any of those come complete with kd tree nearest neighbor search?
nope, none of them @glad raft
wishful thinking i guess lol
would it be a good idea to define a particle-state class whose primary object is an (n_particles x n_properties) dataframe or "raw" array, or to define a particle class and build a dataframe of particle objects?
if you have a dataframe, you can use the column names and datatypes as a "schema" that describes a particle state
it depends on your code though
if you're doing lots of complicated single-particle operations, then yes it would be a nice idea to have a class that represents a particle, and you can write results to the sqlite database
if you are doing a big vectorized operation across many particles at once, the single particle class won't help much
thank you so much. That's what I thought. I was thinking of using a particle class and a state class: the state class holds the indices and is responsible for the nearest neighbor searches and time stepping, and the particle class holds information about the state of each particle. But I didn't know whether there would be any overhead expense, or quite how things would be inherited
for a nearest neighbor search, use a proper kd or ball tree implementation
like in scikit-learn
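A minimal sketch with scikit-learn's KDTree. Querying with k=2 gives each particle's nearest other particle, because the first hit is the particle itself. query_radius lists pairs closer than two radii, i.e. candidate collisions. The positions and the shared radius are made-up values, and the k=2 trick assumes no two particles sit at exactly the same position.

```python
import numpy as np
from sklearn.neighbors import KDTree

positions = np.array([
    [0.0, 0.0, 0.0],
    [0.015, 0.0, 0.0],
    [0.5, 0.5, 0.5],
])
radius = 0.01  # hypothetical: every particle has the same radius

tree = KDTree(positions)

# k=2: column 0 is the particle itself (distance 0), column 1 its nearest neighbour
dist, ind = tree.query(positions, k=2)
nearest = ind[:, 1]

# all neighbours within 2 * radius (self included), turned into unique touching pairs
neighbours = tree.query_radius(positions, r=2 * radius)
pairs = sorted({(min(i, int(j)), max(i, int(j)))
                for i, nbrs in enumerate(neighbours) for j in nbrs if i != j})
```

Here pairs is [(0, 1)]: only the first two particles are within 0.02 of each other.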
yes there will be significant overhead in using a big list of class instances compared to a numpy array
you can reduce that overhead somewhat by defining a class with __slots__ - but it might be more helpful if you gave more detail about the kind of computations you are doing
A friend of mine was going over my CV and suggested replacing the word "scrape" (in the context of scraping data sources on the web using Beautiful Soup etc.) with "ingest" or "ETL". I just wanted a sanity check that this terminology is commonplace and would be a reasonable hit on a keyword scan. "Ingest" in particular is new to me
it's a generally unpleasant-sounding word
ingesting data is more abstract
there's nothing wrong with "webscraping" or "scraping PDFs" etc.
Okay, thanks for the feedback
@desert oar I'm trying to build a particle dynamics simulator. At first I'm going to use a single particle that bounces around in a cube. Then I will try to build on that with a non-interactive discrete element method where the number of particles increases to hundreds, thousands, and probably no further on my laptop, but the premise stays the same: the particles bounce around in the cube. If I can get that working, I will start the nearest neighbor searches to identify which particles are colliding, if any, then update their total force values and continue. The goal is to develop a fundamental understanding of class structures in Python and of molecular and discrete element methods at the same time. Tangentially, I've been brushing up on mpi4py, CUDA, numba, and Cython, but I've only got a few days, so I won't take this experiment too far right away.
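For the first stage (particles bouncing around in a cube), a vectorized time step can handle all particles at once. It reflects any position that crosses a wall and flips that velocity component. This is a sketch under simplifying assumptions: point particles (radius ignored), perfectly elastic walls, no inter-particle forces, and a time step small enough that nothing crosses the whole box in one step.

```python
import numpy as np


def step(pos, vel, dt, box=1.0):
    """Advance (n, 3) positions one time step inside [0, box]^3 with elastic wall reflections."""
    pos = pos + vel * dt
    vel = vel.copy()

    below = pos < 0.0
    pos[below] = -pos[below]  # mirror back inside across the wall at 0
    vel[below] = -vel[below]  # reverse only the offending component

    above = pos > box
    pos[above] = 2.0 * box - pos[above]  # mirror back inside across the wall at box
    vel[above] = -vel[above]
    return pos, vel


rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 1.0, size=(100, 3))
vel = rng.uniform(-1.0, 1.0, size=(100, 3))
for _ in range(1_000):
    pos, vel = step(pos, vel, dt=0.01)
```

The same function works unchanged for 1 particle or 1000s, since every operation is elementwise over the (n, 3) arrays.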
Hi y'all, I'm building an application to learn pandas better. I have a solution, and the code works, but I think it is a more Pythonic solution rather than a pandas one. I'm trying to utilize the pandas logic instead of the python logic. Any thoughts are helpful! Here is the code I have:
results = {}
grouped = electdf.groupby(["year", "state"])
for key, group in grouped:
    year, state = key
    group['vote_remaining'] = group['electoral_votes'] - group['vote_int'].sum()
    remaining = group['vote_remaining'].iloc[0]
    top_fracs = group['vote_frac'].nlargest(remaining)
    group['total'] = (group['vote_frac'].isin(top_fracs)).astype(int) + group['vote_int']
    if year not in results:
        results[year] = {}
    for candidate, evotes in zip(group['candidate'], group['total']):
        if candidate not in results[year] and evotes:
            results[year][candidate] = 0
        if evotes:
            results[year][candidate] += evotes
can you explain what this code is meant to do in English, just to help me orient myself?
sure. It is electoral college data, and it allocates the leftover electoral votes for each year/state. So for instance 1976 Alabama has 9 electoral votes. Jimmy Carter wins 5, Gerald Ford wins 3, so there is one left over. This allocates that remaining vote to Gerald Ford, who has the highest fractional remainder.
so, proportional allocation of electoral votes based on percent share of the popular vote?
exactly
what is the shape of the data? like what columns are there and what data types are in them?
the best way to answer that question is to provide a few lines of the CSV
year  state    candidate         percvotes  electoral_votes  perc_evotes  vote_frac  vote_int
1976  ALABAMA  CARTER, JIMMY     55.727269  9                5.015454     0.015454   5
1976  ALABAMA  FORD, GERALD      42.614871  9                3.835338     0.835338   3
1976  ALABAMA  MADDOX, LESTER    0.777613   9                0.069985     0.069985   0
1976  ALABAMA  BUBAR, BENJAMIN   0.563808   9                0.050743     0.050743   0
1976  ALABAMA  HALL, GUS         0.165194   9                0.014867     0.014867   0
it's slightly difficult to read due to the formatting in Discord, sorry
That's alright.
I'm just trying to think of a formula for proportional integer division
there are some other columns that calculate the proportions to get to this data shown here. It starts with the total votes and votes per candidate
I'll take a look at this again in a bit
ok cool, thanks! I'll put the code on github so I can shoot the full code.
GitHub: CthulhuJr/Proportional-Electoral-College-Vote
yeah I redid the whole thing using some smarter calls. I'll post it in a second
Yessir!
Okay okay, so I got a vectorized solution working!
def vectorized_weighted_mean(df):
    df['WX'] = df.eval("W * X")
    res = df.groupby('sid', as_index=False).agg({
        'W': ['count', 'sum', 'mean'],
        'X': 'sum',
        'WX': 'sum'
    })
    res.columns = ['sid', 'count', 'sum_W', 'mean_W','sum_X', 'sum_WX']
    res["X_wm"] = res.query("count > 0 & sum_W > 0").eval("sum_WX / sum_W")
    return pd.merge(
        df[['sid', 'W', 'WX']],
        res[['sid', 'count', 'mean_W', 'sum_W', 'X_wm']],
        on='sid'
    )

def vectorized_weighted_mean_standard_error(df):
    df = vectorized_weighted_mean(df)
    df['A'] = df.eval('(WX - mean_W*X_wm)**2')
    df['B'] = df.eval("(W - mean_W) * (WX - mean_W*X_wm)")
    df['C'] = df.eval("(W - mean_W)**2")
    res = df.groupby('sid', as_index=False).agg({
        'A': 'sum',
        'B': 'sum',
        'C': 'sum',
    })
    df = pd.merge(df[['sid', 'count', 'sum_W', 'X_wm']], res, on='sid')
    df['coefficient'] = df.eval("count / ((count - 1)*sum_W**2)")
    df['X_wmse'] = df.eval("(A - 2*X_wm*B + (X_wm**2)*C)*coefficient")
    return df[['sid', 'X_wm', 'X_wmse']]
posting it here in case anyone was curious
@olive orbit look into the apply method of GroupBy objects.
in this case, .map would be appropriate, I believe
you would know
what about cases where there's a tie?
in that case it sounds like you would want a Particle class and a handful of instances of that class, and you would write the state history of each particle to your sqlite database
it doesn't make much of a difference in small scale operations tbh but I feel it promotes conceptual clarity
vs agg and filter
this is looking much better! you might also want to look into numexpr, df.eval might be using it under the hood but im not sure
i totally forgot about numexpr before
there's also numpy einsum which can be ridiculously fast for certain operations
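To illustrate the einsum remark: the subscript string names each input's axes, and any axis missing from the output is summed over, so "i,i->" is a dot product. Whether it is actually faster than np.sum(W * X) depends on array sizes and the NumPy build, so benchmark before relying on it. The arrays here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(0.1, 1.0, 750)
X = rng.normal(size=750)

sum_WX = np.einsum("i,i->", W, X)    # same value as np.sum(W * X)
dW = W - W.mean()
sum_sq = np.einsum("i,i->", dW, dW)  # same value as np.sum(dW**2)

# batched version: one weighted sum per row of a (groups, rows) matrix
Wg = W.reshape(15, 50)
Xg = X.reshape(15, 50)
group_sums = np.einsum("gi,gi->g", Wg, Xg)
```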
@desert oar that sounds like a good plan saving just a portion of the particle properties instead of all of them will save me a lot of space too
vectorising stuff is quite fun
@glad raft i recommend using attrs for the particles
here's a very basic sketch of one of many many ways to write code like this
import math
import attr  # pip install attrs

# using slots=True will save memory and reduce runtime overhead
@attr.s(slots=True)
class Particle:
    """ A simulated particle """
    xpos = attr.ib()
    ypos = attr.ib()
    velocity_angle = attr.ib()
    velocity_magnitude = attr.ib()
    mass = attr.ib()

    def step_forward(self, seconds=1):
        """ Update x and y based on velocity """
        ...

# hypothetical function to generate a list of particles
particles = generate_particles(n=1000)

# hypothetical code to "evolve" each particle state
for _ in range(10000):
    for particle in particles:
        particle.step_forward()
obviously I'm leaving out a lot of things, and you could probably do this more efficiently by pre-filling a giant numpy array with 0s and then running through the simulation row-wise, but that would be way uglier
The left over votes go to the highest fractional votes, then next highest, etc. Because it is so many decimal places I think it's unlikely to have an actual tie
wait what
so they are assigned like
uh.
how do leftover votes even come about
@desert oar I'm going to look into attrs and slotting. thank you
The whole numbers are apportioned based on the proportion, but because there are lots of candidates, several of them end up with fractions that don't make a full point. So it becomes a ranked-choice allocation using the fractions, to hand out the full total
If that makes sense
ah okay
got it
whoaaa. You can vectorize custom operations using df.eval() ?? Holy crap that would have been extremely useful to know for me a few days ago lol
so like
basically you just want to increment the n top entries (sorted by percentage vote) by 1
I'm about to drive home but I'll be there in a few
Precisely. For each group of year and state
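A vectorized sketch of that rule. Within each year/state, rank the fractional parts and give one extra vote to each of the top "remaining" candidates, using groupby().transform and groupby().rank instead of a Python loop. It uses the 1976 Alabama rows posted above as sample data. rank(method="first") is an assumption for breaking exact ties by row order, which the original nlargest/isin approach does not address.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1976] * 5,
    "state": ["ALABAMA"] * 5,
    "candidate": ["CARTER, JIMMY", "FORD, GERALD", "MADDOX, LESTER",
                  "BUBAR, BENJAMIN", "HALL, GUS"],
    "electoral_votes": [9] * 5,
    "vote_frac": [0.015454, 0.835338, 0.069985, 0.050743, 0.014867],
    "vote_int": [5, 3, 0, 0, 0],
})

keys = ["year", "state"]
# leftover votes per (year, state), broadcast back to every row of the group
remaining = df["electoral_votes"] - df.groupby(keys)["vote_int"].transform("sum")
# rank fractional parts within each group: largest fraction gets rank 1
frac_rank = df.groupby(keys)["vote_frac"].rank(method="first", ascending=False)
df["total"] = df["vote_int"] + (frac_rank <= remaining).astype(int)

# per-year totals, keeping only candidates who won at least one vote
results = df.groupby(["year", "candidate"])["total"].sum()
results = results[results > 0]
```

For 1976 Alabama, one vote is left over and it goes to Ford (fraction 0.835), giving Carter 5 and Ford 4.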
nice, numexpr is very cool
i wish it had an object-oriented DSL instead of magic strings though
but one could easily create the former to emit the latter
never have i ever seen so much bioinformatics before
time to try to do this analysis in R i guess
🥴
but I'm doing the model building in Python
Julia doesn't have enough genomics packages
🥴
what kind of syntax
would you like to see
c = df.Num('count')
w = df.Num('sum_W')
df['coefficient'] = df.eval(c / ((c - 1) * w**2))
i like syntax checking, syntax highlighting, and generally being able to use language-level tooling
same reason i like "first class" regex objects, makes it easier to build tooling around it
and because eval only takes strings there's no highlighting
yeah this is hypothetical of course
internally it'd be something like
c = df.Num('count')
w = df.Num('sum_W')
expr = c / ((c - 1) * w**2)
df['coefficient'] = df.eval(expr.compile())
there is plenty of prior art for this kind of API, see e.g. pyspark
as well as my own experimental library that i never put work into https://github.com/gwerbin/pandas-anaphora
https://numexpr.readthedocs.io/en/latest/intro.html it looks like internally numexpr already compiles the magic string to python code, so maybe you can hook into it and feed it a raw python AST using the ast module
it might already be possible w/ some lower-level apis https://numexpr.readthedocs.io/en/latest/api.html
will have to put it on the todo list
another example of this kind of api https://pypi.org/project/glom/ (specifically glom.T)
As someone who's self-taught and still struggles with some OOP stuff in Python, I gotta respect people who build libraries lol
it's the kind of thing that gets easier the more you learn
it's like music or a language
reminds me of some of the other libraries that make pandas adopt some ideas from R's tidyverse. If I recall correctly, Spark SQL's DataFrame API draws from the tidyverse
I like Spark SQL's DataFrame API when implementing business logic, but prefer all the shortcuts that the pandas API has for the quick iterations that statistical/data work requires. I am hoping Wes McKinney turns Apache Arrow into a lazy dataframe with a Spark SQL-like API (and the Dask guys build a distributed version of it)...
if it's cool to post your project here: I wrote an implementation of the stepwise feature selection algorithm that can scale with Dask. Honestly, mlxtend's is better, but it was removed. Scikit-learn will be getting a stepwise feature selection algorithm soon, but they only parallelized the cross-validation, not the search itself. https://github.com/pr38/dask_backward_feature_selection
Hi. I have a question about computer vision. I need to get the roi(region of interest) of image in coordinates for computing intersection over union. I wrote some code that computes iou, and it works fine on my examples. But how should I get the coordinates? I have the RoI for every image in my dataset. How to use it for something like roi-finding training?
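A minimal IoU sketch for axis-aligned boxes. It assumes each RoI is stored as corner coordinates (x_min, y_min, x_max, y_max) in pixels. If your annotations use (x, y, width, height), convert first with x_max = x + width and y_max = y + height. The example boxes are made up.

```python
def xywh_to_corners(x, y, w, h):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    return (x, y, x + w, y + h)


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # overlap along each axis; clamped at 0 when the boxes don't meet
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, iou((0, 0, 2, 2), (1, 1, 3, 3)) has an intersection of 1 and a union of 4 + 4 - 1 = 7, so it is 1/7.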
Guys, I wonder: if I have to take an image as input and return a string as output, would I just use a Conv2D in Keras for the input and a list of (Dense) numbers representing the ASCII codes of the output?
I can't assign ID to each image, they all are different texts
Anyone that could help me with a data science problem for uni? I have amazon review data and have to predict the score that the review gave using a number of variables, one of which is the text of the review. In training I'm able to get a very high accuracy with CV, but when I apply it on hackerrank I score very poorly.
Is there an equivalent of ML-Agents in Python? By this I mean a library which can perform reinforcement learning if you set up the environment and the reward system?
I found this one but I'm not sure if it's the best or not https://pygame-learning-environment.readthedocs.io/en/latest/index.html
I just want to do a very simple game with pygame where the agent can move on a board and try some learning stuff on it
So, as in ML-Agents, I'd like to be able to choose the information sent to the agent, the number of outputs, etc.
If you know a good library which can do that let me know
Is the sum() function only usable during an iteration? Or can it be used in other ways outside an iteration (a for loop)?
X_train[..., np.newaxis]
what does the ... mean?
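The ... (Ellipsis) in a NumPy index stands for "as many : slices as needed to cover the remaining axes". So X_train[..., np.newaxis] keeps every existing axis and appends a new length-1 axis at the end. This is a common way to add a channel dimension to grayscale images for a channels-last Keras Conv2D. The array shapes below are made up.

```python
import numpy as np

X_train = np.zeros((100, 28, 28))  # e.g. 100 grayscale 28x28 images

with_channel = X_train[..., np.newaxis]
print(with_channel.shape)                  # (100, 28, 28, 1)

# the same thing written out with explicit slices
print(X_train[:, :, :, np.newaxis].shape)  # (100, 28, 28, 1)

# ... can stand for the leading axes in other positions too
print(X_train[..., 0].shape)               # (100, 28)
```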
sum is a function. It accepts any iterable and sums it. You can use it in quite a few ways.
But, it doesn't accept ANY iterable. It must be converted into a float or integer.
But, thanks for the answer. :)
If you're summing something that can't be summed with an int (the default starting value of sum is 0), you need to provide a starting value, yeah:
```python
from dataclasses import dataclass

@dataclass
class A:
    val: int

    def __add__(self, other):
        return A(self.val + other.val)

sum((A(i) for i in range(10)), A(0))  # A(val=45)
```
Ah, ok. Gotcha!
Could someone help me figure out what's going wrong with the WolframAlpha.py module?
I'm making the query "how are you", and here's the data I get:
{'success': False, 'error': False, 'numpods': 0, 'datatypes': '', 'timedout': '', 'timedoutpods': '', 'timing': 0.932, 'parsetiming': 0.301, 'parsetimedout': False, 'recalculate': '', 'id': '', 'parseidserver': '41', 'host': 'https://www4b.wolframalpha.com', 'server': '41', 'related': '', 'version': '2.6', 'inputstring': 'how+are+you', 'tips': [{'text': 'Avoid concatenation in math expressions'}, {'text': 'Use r*x rather than rx, and q*x^2 rather than qx2'}]}
However, when using the W|A API explorer, I get different results
I'm getting a proper result there
Hi! Does anyone know where to start learning to make an AI? I know the basics of Python
First of all I would go for the math
Derivatives, Sigmoid, whatever
Then learn about Neural Networks
And after that learn a library
Like tensorflow
Do you know a website or video where I could start?
I don't know any websites/videos for the math, just search for them. But for neural networks, search for Luis Serrano's neural network tutorials on YouTube
I suggest you watch the neural network videos first, they don't contain much math
After that, go learn about libraries , and when you encounter a math thing, either ignore it and just let it be in your code, or go learn it
ok, thanks!
Hello
I'm searching for a cloud platform where I can work with an SQLite dataset, as my computer can't handle the operation.
Linode offers VPSs so you can access SQLite
Alternatively, if you're willing to use a different language rather than python, you can use Wolfram Cloud @hoary wigeon
They're both not too expensive
Do you already have high school math knowledge?
I want a cloud platform where I can run queries on an SQLite dataset, with at least 8 GB of RAM
Oh, sorry, didn't get it
Wait a moment
Hi
on mine it takes more than 10 min
can someone help me in a numpy issue?
@hoary wigeon You can use google cloud
does it support SQLite?
Just have to convert your SQLite database to a MySQL one
There are scripts that do that
is there a chat where I could ask data mining qs?
This one
ok thanks, I'm wondering what the data mining tasks are: do they include operations like count and sum, or just prediction and classification?
i.e. ML algorithms
Question about pandas, is one way faster than the other?
df = df.assign(col3=lambda x: (x['col1'] + x['col2']))
df['col3'] = df['col1'] + df['col2']
The second one
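Both produce the same column; `assign` returns a new DataFrame (copying the existing data), which is the likely extra cost, while item assignment modifies the DataFrame in place. A rough way to check on your own data, with made-up columns; actual timings vary by pandas version and data size:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": np.arange(100_000), "col2": np.arange(100_000)})

# Both forms give the same result
via_assign = df.assign(col3=lambda x: x["col1"] + x["col2"])
direct = df.copy()
direct["col3"] = direct["col1"] + direct["col2"]
assert via_assign.equals(direct)

def with_assign():
    return df.assign(col3=lambda x: x["col1"] + x["col2"])  # builds a new DataFrame

def with_setitem():
    direct["col3"] = direct["col1"] + direct["col2"]  # modifies `direct` in place

print("assign :", timeit.timeit(with_assign, number=50))
print("setitem:", timeit.timeit(with_setitem, number=50))
```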
Can I discuss the Gini index and information gain with someone? And why is Gini slower in my implementation? I think it should be faster
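For reference, Gini only needs squared class proportions while entropy needs a logarithm, so Gini is usually expected to be slightly cheaper; a slower Gini often points at something else in the implementation (for example Python-level loops). A minimal NumPy sketch of both measures and information gain (function names are just illustrative):

```python
import numpy as np

def class_proportions(labels):
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def gini(labels):
    p = class_proportions(labels)
    return 1.0 - np.sum(p ** 2)            # no logarithm needed

def entropy(labels):
    p = class_proportions(labels)
    return -np.sum(p * np.log2(p))         # p > 0: unique() only returns present classes

def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = [0, 0, 1, 1]
print(gini(parent), entropy(parent))                            # 0.5 1.0
print(information_gain(parent, [0, 0], [1, 1]))                 # 1.0 (perfect split)
print(information_gain(parent, [0, 0], [1, 1], impurity=gini))  # 0.5
```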
I think data mining is finding insights in data in general. And that could include a lot of things.
is there any difference between normal dropout and alpha dropout? (ping to reply, thanks)
@paper lake unironically have to do a bioinformatics data analysis this weekend
when you want to use python but forced to use R
i want to compare two PDFs for similarity
i did it using TF-IDF and cosine similarity
how do i do it with Jaccard? Do i have to use TF-IDF with it?
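Jaccard similarity compares sets, |A ∩ B| / |A ∪ B|, so it doesn't need TF-IDF weights: you compare the sets of tokens directly (word frequencies and weights are ignored). A minimal sketch, assuming the text has already been extracted from both PDFs and using a simple word regex as the tokenizer:

```python
import re

def token_set(text):
    """Lower-cased word tokens as a set (simple regex tokenizer, an assumption)."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(text_a, text_b):
    a, b = token_set(text_a), token_set(text_b)
    if not a and not b:
        return 1.0  # two empty documents: treat as identical
    return len(a & b) / len(a | b)

print(jaccard("The cat sat.", "the cat ran"))  # {the, cat} / {the, cat, sat, ran} = 0.5
```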
Any reference to python code for nlp based model for reading totals from restaurant invoice in pdf form?
Hey, does anyone know how to easily make a positively/negatively skewed distribution? (I'm using it as an example)
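One simple way, as a sketch: a gamma distribution has a long right tail (positive skew), and negating the samples mirrors it into a negative skew; `scipy.stats.skewnorm` with a positive or negative shape parameter `a` is another option. The shape value 2.0 and the hand-rolled skewness helper are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
positive = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # long right tail
negative = -positive                                     # mirrored: long left tail

def sample_skewness(x):
    """Moment-based sample skewness: E[(x - mean)^3] / std^3."""
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

print(sample_skewness(positive))  # positive, near the theoretical 2 / sqrt(2) ~ 1.41
print(sample_skewness(negative))  # the same value with the sign flipped
```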
So basically extracting text from an image inside the pdf?
Just a bit confused about what sklearn means by "flat" and "non-flat" geometry
It says that it's the "metric-used" in the column header
but how can this be flat or not flat?
```python
firstdate = pd.to_datetime(['01.01.2004'], format='%d.%m.%Y')
seconddate = pd.to_datetime(['01.01.2005'], format='%d.%m.%Y')
print(type(firstdate))
print(f"the sum of orders for 2005, for sellers from Poland\n"
      f" {df2[(df2.Kraj == 'Polska') & (not firstdate > df2['Data zamowienia'] < seconddate)][['Utarg']].sum()}")
```
Anyone know how to format this date to make this print work? In the Excel sheet column the dates are written separated by dots. PYTHON [PANDAS]
```
raise ValueError("Lengths must match")
ValueError: Lengths must match
```
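A likely fix, sketched under assumptions: the "Lengths must match" error probably comes from comparing a one-element `DatetimeIndex` (created by passing a list to `pd.to_datetime`) against the whole column, and `not` / chained comparisons don't work element-wise on a Series. Parsing the column itself with `format="%d.%m.%Y"`, comparing against scalar timestamps, and combining conditions with `&` avoids both. The DataFrame below is made-up stand-in data with the same column names, and the bounds assume the intent is calendar year 2005, as the printed message says:

```python
import pandas as pd

# Made-up stand-in for the spreadsheet, same column names as the question
df2 = pd.DataFrame({
    "Kraj": ["Polska", "Polska", "Niemcy", "Polska"],
    "Data zamowienia": ["15.03.2005", "01.01.2004", "20.06.2005", "31.12.2005"],
    "Utarg": [100.0, 50.0, 70.0, 30.0],
})

# Parse the dot-separated day.month.year strings into real datetimes
df2["Data zamowienia"] = pd.to_datetime(df2["Data zamowienia"], format="%d.%m.%Y")

# Scalar Timestamps (not one-element DatetimeIndex objects) as the bounds
start = pd.Timestamp(2005, 1, 1)
end = pd.Timestamp(2006, 1, 1)

mask = (
    (df2["Kraj"] == "Polska")
    & (df2["Data zamowienia"] >= start)
    & (df2["Data zamowienia"] < end)
)
total_2005 = df2.loc[mask, "Utarg"].sum()
print(f"the sum of orders for 2005, for sellers from Poland: {total_2005}")
```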
I would recommend posting this in one of the many help channels
can anyone pls help:
https://stackoverflow.com/questions/67353295/how-to-calculate-the-loss-in-gan-for-my-model
Does the Flatten layer of a CNN turn the entire tensor into a mx1 vector?
hey guys i am trying to create a dataframe in parallel from random indices of an existing dataframe. i know how to use a dataframe in parallel but creating one that way is different. any thoughts?
What do you mean "parallel"? You have to eventually re-combine the pieces in the main process
Normally i use pd.concat to combine dataframes
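A sketch of that pattern, assuming the "random indices" are just random row samples of an existing DataFrame: build the pieces concurrently (threads here; a `ProcessPoolExecutor` would need a picklable top-level function and is only worth it for heavy per-chunk work), then `pd.concat` them in the main process. Column names, chunk sizes, and seeds are made up:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

source = pd.DataFrame({"a": np.arange(1000), "b": np.arange(1000) * 2})

def sample_chunk(seed, n_rows=100):
    """Pick n_rows random rows from `source`; each task gets its own seed."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(source.index.to_numpy(), size=n_rows, replace=False)
    return source.loc[idx]

with ThreadPoolExecutor(max_workers=4) as pool:
    pieces = list(pool.map(sample_chunk, range(8)))  # 8 chunks built concurrently

result = pd.concat(pieces, ignore_index=True)  # recombined in the main thread
print(result.shape)  # (800, 2)
```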
Yep it was directly inspired by pyspark moreso than dplyr
A "lazy" pandas would be interesting. The other day I discussed building this actually, where you build up operations that "compile" to a numexpr operation
I think the non laziness of pandas is better by default
Laziness is only valuable for really big datasets and/or really expensive operations, in which case maybe dask is better anyway
It would also be nice to have more stuff in the form of portable libraries so you can use the same arrow-backed data frames in python r julia lua rust c++ whatever
Imagine, monadic dataframe operations in haskell backed by arrow!
Why do you need GAN for that lol?
Hi
I am working in an offline environment.
I currently want to work with geopandas (using conda and cp36)
After installing the dependencies (fiona, shapely etc), I tried to import the module but came up with the following error:
" from fiona.ogrext import Iterator, ItemsIterator, KeysIterator
ImportError: DLL load failed: The specified module could not be found."
I am currently using python 3.6.5
With packages:
fiona: 1.8.6
gdal: 3.0.1
geopandas: 0.3.0 (I have tried 0.4.0 and 0.8.0 as well)
I have looked here as well (https://github.com/Toblerity/Fiona/issues/402), but couldn't solve the problem.
Does anyone here know what the problem is and how it can be solved?
Thanks in advance
Hello
Yes
i have updated my code: pls check
https://stackoverflow.com/questions/67353295/how-to-calculate-loss-and-backpropogate-in-gan
Any clue why discriminator is learning.
what libraries should I learn for data science?
@true plover Check out Sklearn, it's pretty beginner friendly since a lot of things work out of the box
then Pandas for data manipulation and numpy for general usefulness
I've been reading this paper (DOI: 10.1002/widm.30) which talks about density-based clustering algorithms in terms of probability density functions like this:
Bit confused though, as when I've been looking at DBSCAN it doesn't really seem to be doing anything like this?
I think i remember talking to @grave frost about this algorithm ages ago?
In this case, the technique that you have to use is OCR. Take a look at this: https://github.com/courao/ocr.pytorch
This is an image from ocr.pytorch's readme.md
Does anyone know about pathwise coordinate descent for optimization? I am struggling trying to understand it
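Pathwise coordinate descent, as used for the lasso (for example in glmnet), boils down to two ideas: minimize the objective one coefficient at a time, where each lasso update has a closed form via soft-thresholding, and solve for a whole decreasing sequence of λ values, warm-starting each solve from the previous solution. A minimal NumPy sketch for (1/2n)‖y - Xβ‖² + λ‖β‖₁ without an intercept; the sweep count and λ grid are arbitrary choices:

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, beta, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new_bj - beta[j])  # keep the residual in sync
            beta[j] = new_bj
    return beta

def lasso_path(X, y, n_lambdas=20, eps=1e-6):
    """Solve for a decreasing grid of lambdas, warm-starting each solve."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest lambda with an all-zero solution
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambdas)
    beta = np.zeros(p)
    path = []
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta)   # start from the previous solution
        path.append(beta)
    return lambdas, np.array(path)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + 0.1 * rng.normal(size=100)
lambdas, path = lasso_path(X, y)
print(path[0])   # all zeros at lam_max
print(path[-1])  # close to ordinary least squares at a tiny lambda
```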
HAHHAHAHAHAHA sorry late response but that's foony
reminds me of a meme we posted in julia about R and Python data scientists
never been active here or in this server really
I'm usually busy in the Julia server
it's ok. this is representative of how it's going
an empty data frame
p.s. - it wasn't supposed to be empty
i can just dm you next time. probs easier
Hey!
Can someone tell me where a good place to start is for a beginner in AI?
hey, is there a way to see what's being passed from 1 layer to another as a model is being evaluated?
i just want to see what's actually happening in there
thanks!
I'm a wannabe data visualization and Tableau coder
cool
Yes, in PyTorch you can just say to print the current output after any layer
I'm going to give the textbook The Elements of Statistical Learning a try. Does anyone have any experience with it?
It seems like Stanford hosts a free copy so it's worth a go maybe
@glad raft Don't have much experience with statistics, but if you want to study it, Mathematica is a good software for it
maybe tf.print, and run eagerly by passing run_eagerly=True to model.compile?
I'm going to try to read the first 100 pages today. I'll let you all know what I think
Great, thank you
Is it possible for 2 features to have high correlation coefficient but low VIF or vice-versa?
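Yes, in one direction. VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing feature j on all the other features. That R_j² is at least the squared correlation with any single other feature, so high pairwise correlation always means high VIF, but high VIF can come from a combination of features even when every pairwise correlation is moderate. A NumPy sketch with a third feature built as the sum of two independent ones (synthetic data, hypothetical `vif` helper):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
x3 = x1 + x2 + 0.1 * rng.normal(size=1000)  # nearly a linear combination of x1, x2
X = np.column_stack([x1, x2, x3])

print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise |r| at most about 0.7
print(np.round(vif(X), 1))                        # but every VIF is large
```

With only two features, VIF reduces to exactly 1 / (1 - r²), so there the two measures always agree.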
Does anybody have tips/resources on anomaly detection in high-dimensional time series data?
Does anyone have an idea why no gradients are calculated for the generator?
```python
import tensorflow as tf

@tf.function
def train_step(real_samples):
    # Discriminator step: generated samples are labeled 1, real samples 0
    z = tf.random.normal(shape=(batch_size, z_dim))
    gen_samples = generator(z)
    combined_samples = tf.concat([gen_samples, real_samples], axis=0)
    labels = tf.concat([tf.ones((batch_size, 1)), tf.zeros((real_samples.shape[0], 1))], axis=0)
    with tf.GradientTape() as tape:
        predictions = discriminator(combined_samples)
        d_loss = loss_fn(labels, predictions)
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # Generator step: try to make the discriminator output "real" (0)
    z = tf.random.normal(shape=(batch_size, z_dim))
    label_zeros = tf.zeros((batch_size, 1))
    with tf.GradientTape() as tape:
        predictions = discriminator(generator(z))
        g_loss = loss_fn(label_zeros, predictions)
    # tape.gradient returns None for all trainable variables
    grads = tape.gradient(g_loss, generator.trainable_weights)
    print(predictions)
    print(g_loss)
    print(grads)
    # on executing this line the error occurs
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))
    return d_loss, g_loss, gen_samples
```
The lines where the gradients should be calculated, as well as the line where the actual error occurs, are commented. The error is: " ValueError: No gradients provided for any variable: ['gan_generator/lstm/lstm_cell/kernel:0', 'gan_generator/lstm/lstm_cell/recurrent_kernel:0', 'gan_generator/lstm/lstm_cell/bias:0', 'gan_generator/dense/kernel:0', 'gan_generator/dense/bias:0']."
Outputs for the print statements are:
```
Tensor("gan_discriminator/dense_1/Sigmoid_1:0", shape=(128, 1), dtype=float32)
Tensor("binary_crossentropy_1/weighted_loss/value:0", shape=(), dtype=float32)
[None, None, None, None, None]
```
Dtypes and shapes are okay, everything used is a Tensor-object. No np.arrays used.
I just realized that Elon Musk posts random jargon shite:-
A major part of real-world AI has to be solved to make unsupervised, generalized full self-driving work, as the entire road system is designed for biological neural nets with optical imagers
Anyways, QQ: does a TF model not work with a pre-batched tf.data.Dataset? It's the first time I've ever had this problem
I'm actually doing this:
```python
train_data = tf.convert_to_tensor(train_data)
dataset = tf.data.Dataset.from_tensor_slices(train_data)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)
```
Apparently, I can either buffer or pre-batch
What are hyperparameters? Is it the number of layers and the number of neurons per layer?
Same question about hyperparameter optimization?
What surprises me is that it's pretty much an adapted version of the TensorFlow documentation's code snippets
But it's not working
@daring kiln to the best of my knowledge, hyperparameter optimization happens either as a guess-and-check approach or, classically, by looking at the posterior with Bayesian inference. But I'm not sure what hyperparameters are, so optimizing them is beyond me. I think Elisse Jennings from LLNL gave a presentation on it that you can see on YouTube
@glad raft thanks a bunch for the reply. I'm gonna go check it out.
I can give you examples, maybe it helps. In the context of neural networks there are the trainable parameters, also called weights. And then there are the kinda defining parameters, which tell the model how to behave, which are mostly untrainable. Some of them are the no. of epochs, batch_size, learning_rate and so on
The trainable parameters get adjusted by the optimizer through backpropagation. The hyperparameters are the ones you have to optimize.
Hyperparameters are, for example: number of neurons/layers, learning rate, batch_size, epochs etc.
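To make the distinction concrete, a minimal scikit-learn sketch: `C` (the regularization strength) is a hyperparameter chosen by an outer search, while `coef_` holds the trainable weights that `fit` learns. Grid search is just one simple strategy (random search and Bayesian optimization are others); the dataset and grid values are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hyperparameter: C (inverse regularization strength), chosen by the outer search
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

best = search.best_estimator_
print("best hyperparameters:", search.best_params_)
# Trainable parameters: the weights fit() learned for that setting
print("learned weights shape:", best.coef_.shape)  # (1, 10)
```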
That's super helpful thank you
Both you guys helped me understand a concept been struggling to understand for a while. Thanks!
So, I tried to import gym for reinforcement learning and it doesn't work at all, whether I'm in a virtual environment or not
could i have an example?
sorry, i was talking about tensorflow lol
Ah, sorry
So far I've come across multiple terms that possibly mean the same thing. First, is there any recognizable difference between statistical learning, machine learning, and deep learning? And what's the primary difference between machine learning and AI?
Oh, and neural networks
AI: doing tasks that are normally done by humans
ML: programs that "learn" from the data, rather than being rigid algorithms
they have a ton of overlap nowadays, but there are AI algorithms that aren't at all ML, like the first ever chatbot programs that were hardcoded, or, for that matter, any kind of manually coded decision tree.
i hope posting links is okay, here is a venn diagram to show
Are there quality ais that aren't ml based? Would they be self modifying code based?
AI can also imply an algorithm that does not learn though, like a chess "AI".
Is a chess AI a predictive model that builds a state tree and then minimizes move risk?
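Classic chess engines are roughly that: they search a tree of future positions and pick the move whose worst-case outcome is best (minimax), with a hand-written evaluation at the leaves and no learning involved. A toy sketch on a hard-coded tree of leaf scores (real engines add alpha-beta pruning, depth limits, and much more):

```python
def minimax(node, maximizing=True):
    """Nested lists are positions to choose from; numbers are leaf evaluations."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation of the final position
    scores = [minimax(child, not maximizing) for child in node]
    # We pick our best option; the opponent picks the outcome worst for us
    return max(scores) if maximizing else min(scores)

tree = [[3, 5], [2, 9]]  # our 2 moves, each answered by 2 opponent replies
print(minimax(tree))     # 3: the first move guarantees at least 3
```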
Sorry if these questions are really rudimentary but I think developing a language is important to learning a topic
Deep learning is a buzzword that had no meaning, but then everyone sort of agreed on a meaning later. The term was first used in psychology, but its first use in algorithms is something like the early 80s, and it was used to refer to the depth of a search algorithm.
In the early days, AI pretty much meant everything described by @tidal bough's first point. For example, simple rule-based systems
Today when someone speaks of AI, it's almost always at minimum an ML thing they're talking about, if not Deep Learning. Most modern AI stuff is pretty much Deep Learning/Reinforcement Learning based, that's why.
Look at the Venn diagram above, it makes it much clearer
(Automatic theorem prover)
Does reinforcement imply initially supervised followed by ongoing unsupervised?
Its a bit like conditioning a dog
reinforcement learning means that you make the program learn by giving it a reward signal and letting it explore, rather than by providing examples of what to do. The algorithms used in RL are very different from what you see in the other fields
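A toy illustration of that idea, tabular Q-learning: the agent never sees an example of the right move, only a reward of 1 when it reaches the end of a hypothetical 5-cell corridor, and it learns from that signal to walk right. The environment, learning rate, discount, and exploration rate are all arbitrary choices for the sketch:

```python
import numpy as np

N_STATES = 5        # cells 0..4 of a corridor; cell 4 is the goal
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right

def step(state, action):
    """Environment: move, clip to the corridor, reward 1 only on reaching the goal."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(MOVES)))  # estimated return for each (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))  # explore
            else:
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))          # exploit, random tie-break
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * Q[nxt].max()
            Q[state, action] += alpha * (target - Q[state, action])
            state = nxt
            if done:
                break
    return Q

Q = q_learning()
policy = Q[:-1].argmax(axis=1)  # greedy action in each non-goal cell
print(policy)                   # [1 1 1 1]: always step right
```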
I probably won't be getting to unsupervised or rl for a few weeks at least
Reinforcement learning is arguably the most AI part of ML: it tackles head-on the issue of creating an automaton (making it very hard / unsolved, and also not as immediately applicable as unsupervised and supervised learning).
RL doesn't need to involve any example data (from humans) to train on - in fact, if we take AlphaGo and AlphaGo Zero's (https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules) example as the trend, providing examples of human play significantly speeds up the learning, but reduces the level the model can reach in the long run (because it gets used to "normal", "human" strategies)
(It's also part of other fields outside of ML)
I have been collecting experience with ML for quite a while now and I think RL is another thing in itself. It's quite different
@tidal bough I'll read that as soon as I can
ML does not imply AI, nor does AI imply ML. ML is really just anything that learns (probably approximately).
@iron basalt that's the aesthetic of uncertainty quantification right? To shed light on the probably part
Technically any program that stores anything and uses it later is "ML".
But if someone says ML they most likely mean that its focus is on making the most of that data.
(pretty much always induction)
Are most ML algorithms neural networks, or do neural networks make up a distinctly small subset of ML algorithms?
They are a small subset of ML.
And fall under the smaller subset of "biologically inspired"
Neural Networks are only a subset of ML, but just like with ML being only a subset of AI, it's so disproportionately researched that it's almost implied that if you're doing ML in this day and age, it'd be quite a surprise if it's not a NN
Yes right now neural networks are the most popular by far (but does not imply the best, nor the future of ML).
Couldnt say it better
It is my understanding that in neural networks the number of hidden layers affects the possible "geometry" of the solution space. If you had a simple binary classification problem, would you be able to get a circular decision boundary with any number of hidden layers using a squared error loss function?
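A quick empirical check, not a proof: with zero hidden layers (a linear model) the boundary can only be a straight line, while a single hidden layer with enough units can already carve out a circle-like boundary, even when trained on squared error. The sketch uses scikit-learn's `make_circles` toy data and `MLPRegressor` (which minimizes squared error) on 0/1 targets thresholded at 0.5; the 16 tanh units are an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Two concentric rings, one class per ring
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

# Zero hidden layers, squared error: the 0.5 level set is a straight line
linear = LinearRegression().fit(X, y)
# One hidden layer of 16 tanh units, still trained on squared error
mlp = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh", solver="lbfgs",
                   max_iter=5000, random_state=0).fit(X, y)

def accuracy(model):
    """Training accuracy after thresholding the regression output at 0.5."""
    return np.mean((model.predict(X) > 0.5) == y)

print("no hidden layer:", accuracy(linear))  # around chance level
print("one hidden layer:", accuracy(mlp))    # close to 1
```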
An ML researcher probably does more than just NNs, but rather all kinds of strange ideas (both new and old ideas / often some of the best things are a revival of an old idea but with modern compute power).
@iron basalt I may be one of those in the foreseeable future, but my first area of expertise is applied mathematics
(e.g. Tsetlin automata, which have been shown to outperform NNs and also be much more computationally efficient, but the idea is from 1960s Soviet Union technology).
I'm thinking of all the great forgotten 90s algorithms which died because of a lack of computing power. They get revived nowadays and end up in state-of-the-art algorithms
A lot of it is from far earlier than you might expect, AI has been a thing for thousands of years.
yeah, sure
VAEs are just 1990s-era tech but with modern compute (see https://en.wikipedia.org/wiki/Helmholtz_machine).
(people have been doing ANNs for a long time)
That's crazy. I wonder what these guys would say if they knew what impact their inventions have today
A lot of the best stuff is from the 60s probably due to space race and all that cold war stuff that drove people to make some crazy stuff.
Do you think the current exascale and quantum computing races affect the current drive for robust ML algorithms at all?
Quantum computers are not needed for AGI IMO.
maybe they might 🤷
Much more useful and impressive would be large scale reservoir computers (not too hard to make and would allow for some ridiculously massive NNs, etc): https://en.wikipedia.org/wiki/Reservoir_computing
SNNs aren't very compute-efficient, and advances in HTM seem... slow (but breakthroughs nonetheless)
I don't think a cat has a quantum computer in their head.
we already know that the brain is a reservoir computer.
agreed, but biological mechanisms are waay more efficient - both in terms of performance and energy
I lean towards a biological/HTM hybrid rather than running it on computers/QC, IMO
Reservoir computers are as efficient as it gets. They happen naturally too.
Paddles on water is a reservoir computer.
If you could use aurora next year, how might you adapt your ml codes to best utilize computational resources? Would you want to use it at all?
One neat way to make a reservoir computer is to simply stack a bunch of glass panes (optical reservoir computer), it's so efficient because it has no power source. The input (the light) gives it all the energy it needs.
I don't get them
Anyhow this is starting to get tangential to the topic so I will stop here.
Absolutely no one has a suggestion for my question?
#data-science-and-ml message
I think in a stochastic gradient descent method they just guess a point and go with it
I read somewhere that sometimes people will stop describing them as descent methods because they don't necessarily pick something that results in a decrease in the ??? loss function ??? But with large datasets you can see a speedup by doing it
.... just a guess..... @red hound
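A small NumPy sketch of that behaviour, assuming plain least-squares linear regression on synthetic data: each update uses the gradient of a single random sample's loss, so an individual step can increase the full-dataset loss, yet the weights still end up near the true values. Learning rate, epoch count, and data are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

def full_loss(w):
    """Half mean squared error over the whole dataset."""
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(2)
lr = 0.05
increases = 0
for epoch in range(20):
    for i in rng.permutation(n):
        loss_before = full_loss(w)
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of ONE sample's squared error
        w -= lr * grad
        if full_loss(w) > loss_before:
            increases += 1               # this step made the full loss worse

print(w)          # close to [2, -3]
print(increases)  # many individual steps went "uphill" on the full loss
```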
Hey, does anyone know how to plot on 2 subplots with seaborn?
```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.displot(ax=ax1, data=r, bins=20)
sns.displot(ax=ax2, data=i, bins=20)
```
this is probably the better place to ask this question.
So I have been working on a typing test app that allows me to record the key up and down events for my Machine Learning class.
The goal is to determine users by how they type.
The original prompt I made was WAY too long/difficult
See it here.
#python-discussion message
The new prompt is much easier, 3 posts down.
My question is, how do I get a lot of people to take the test?
Does the Flatten layer of a CNN turn the entire tensor into a mx1 vector?
easy explanation with example: https://keras.io/api/layers/reshaping_layers/flatten/
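Concretely, Keras `Flatten` keeps the batch dimension and flattens everything else, so each sample becomes a 1-D vector of length height × width × channels (shape `(batch, h*w*c)`, not an m×1 column). A NumPy sketch of the equivalent reshape for channels-last data, with made-up feature-map sizes:

```python
import numpy as np

# A batch of 32 feature maps, 7x7 spatial, 64 channels (made-up sizes)
batch = np.arange(32 * 7 * 7 * 64).reshape(32, 7, 7, 64)

# Keep axis 0 (the batch), merge all remaining axes into one
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (32, 3136)
```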
Anyone know of a good course on how to choose best types of visualizations for particular data. For example if my dataset involves Products (toys), what would be the best way to visualize all the types of manufacturers in that data? Are pie charts the way to go?
What, like multiple toy items with many manufacturers producing the same item?
Pie charts are almost a no-go lol.
In general, there's no straightforward "best way" to visualize something. It all depends on your use case, the story you want to tell and your audience
It's better if you ask people in your industry / field for tips on visualization
so this is what happens when i try to send a file rn
the thing is, the "pictures", "videos", and other folders aren't there, as i use OneDrive for my files & stuff
so they aren't the default directories
anyone know how to add this stuff to the file picker?
Hi, would this be the right place to ask about how to get started with machine learning in Python? My goal is quite specific: to find the combination of numbers that gives the best result
so I am trying to apply machine learning to my indicator in TradingView
best result is Sharpe ratio > 1.2
so I'm trying to optimise my parameters for different markets and get data fed into it
apologies, I am very new to ML and trying to learn the ropes
thank you again
how can I help myself find a research project as my final project for uni? There are no tutor-suggested topics and I'm 90% sure I'm incapable of finding one
Hey, I was getting an assertion error in one of my Jupyter blocks even though I had already asserted that float in my earlier blocks without any error
Could someone help?
that assert is in the function you are using, not in the lines below
It's also in the type of b. It didn't fit in the screenshot
copy and paste the error then
```
AssertionError                            Traceback (most recent call last)
<ipython-input-29-4f77d2aa60f4> in <module>
      1 dim = 2
----> 2 w, b = initialize_with_zeros(dim)
      3
      4 assert type(b) == float
      5 print ("w = " + str(w))

<ipython-input-28-ddd6819e9a1a> in initialize_with_zeros(dim)
     20     b = 0
     21     # YOUR CODE ENDS HERE
---> 22     assert(isinstance(b,float))
     23     assert(w.shape == (dim,1))
     24     return w, b

AssertionError:
```
nope, it's in the function.
<ipython-input-28-ddd6819e9a1a> in initialize_with_zeros(dim)
what should i do then?
---> 22 assert(isinstance(b,float))
This is the assert you are failing. try to figure out what and why that argument doesn't serve your needs
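The failing check is `isinstance(b, float)` inside the function, and `b = 0` creates an `int`, so it fails even though `0 == 0.0`. Writing a float literal fixes it. A sketch of the function as it appears in the traceback; the `w = np.zeros((dim, 1))` line is an assumption based on the shape assert:

```python
import numpy as np

def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0.0  # float literal; `b = 0` would be an int and fail the check below
    assert isinstance(b, float)
    assert w.shape == (dim, 1)
    return w, b

w, b = initialize_with_zeros(2)
print(type(b))  # <class 'float'>
```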
any idea how to use spaCy
to do what
I want to create a chatbot that runs in cmd
what do you want to be able to talk about with the chat bot
I need it to be able to have a pretty basic convo
and the main thing
to train it so it can help me with, like, stock trading thoughts
or something like that
but for starters just basic convo
so you want a chat bot that can have basic conversations, and help you trade stocks?
I would make the stock thing a separate project
ye ye sounds good
but I dont even know how to use spacy to do the convo
I tried to use chatterbot
but the corpus wasn't working for me
you probably wouldn't use spacy for this, no
I've used spacy to find sentence boundaries and get features like part-of-speech.
so i can use it for a more advanced chatbot?
you can also use it to get embedding representations of words and sentences, which gives you an entry point to do deep learning with language.
I'm not sure that making a chat bot is a great first project, unless you have a really specific set of topics in mind.
I need to make a chatbot, like, it's a project
you know what would be fun? an ngram sentence generator.
A basic chatbot, and then I can train it to somehow help with the trading aspect
ngram sentence?
markov chain sentence generation is fun and pretty quick to code
ngrams are contiguous sequences of n tokens
[(ngrams, are), (are, contiguous), (contiguous, sequences), (sequences, of), (of, n), (n, tokens)]
these are bigrams.
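In Python, the bigrams (or any n-grams) of a token list can be built by zipping the list against shifted copies of itself; `ngrams` is just an illustrative helper name:

```python
def ngrams(tokens, n=2):
    """All contiguous n-token windows, built by zipping shifted copies of the list."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "ngrams are contiguous sequences of n tokens".split()
bigrams = ngrams(tokens)      # [('ngrams', 'are'), ('are', 'contiguous'), ...]
trigrams = ngrams(tokens, 3)
print(len(bigrams), len(trigrams))  # 6 5
```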
oh ye i googled it
i should have thought , quite clear . he was heading back to gryffindor tower , and harry , somehow struck anew by how tall krum was , elaborated . were friends . shes not my girlfriend and she never has been . its just that dobbys plans arent always that safe . dont you remember , moody told us to be careful what we put in writing . we just cant guarantee owls arent being intercepted anymore . all right , all right . flint nearly kills the gryffindor seeker , which could happen to anyone , im sure
here's what I generated the last time I tried my hand at them; trained on the first HP book
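A minimal sketch of the Markov-chain idea behind that kind of output: record which words follow each word in the training text, then walk the chain by picking a random follower at each step (a bigram chain; the sample corpus and function names are made up):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it (duplicates act as weights)."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=20, seed=None):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(corpus)
sentence = generate(chain, "the", length=12, seed=1)
print(sentence)
```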
So any idea guys on how I can start doing the chatbot
like any good library
or like a way to fix this error
```
File "c:\python38\lib\site-packages\chatterbot\trainers.py", line 135, in train
  for corpus, categories, file_path in load_corpus(*data_file_paths):
File "c:\python38\lib\site-packages\chatterbot\corpus.py", line 84, in load_corpus
  corpus_data = read_corpus(file_path)
File "c:\python38\lib\site-packages\chatterbot\corpus.py", line 58, in read_corpus
  with io.open(file_name, encoding='utf-8') as data_file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Kyriakos\\chatterbot_corpus\\data\\english'
```
What does "non-flat geometry" mean in the table at https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering?
Geometry seems to be a "metric", but how is it non-flat?
Suppose all your points lie along a sheet of paper (a 2d manifold in 3d space). Now suppose that instead of being flat, that sheet is rolled into a tube, the ends almost touching. In 3d space, the distance between the ends is almost zero, but if you only count the distance along the sheet (i.e. the path needs to lie inside the paper) then these points are very far from each other, on the opposite ends of the sheet.
To capture the manifold properties in cases like this, you need to use a metric that uses the points, rather than just normal 3d distance.
yup, that's exactly what I'm talking about
ok awesome, thank you. So when an algorithm uses "non-flat geometry" it's talking about this cylinder?
When it's flat geometry then it's like a normal plane
Basically the non-flat metrics it's talking about are those that don't use euclidean distance
Like measuring the distance only along shortest links between known points.
These metrics will try to capture the red distance instead of the blue one, which is more correct - after all, if the dataset is actually distributed along the plane, the points at the two ends are going to be not similar at all, despite being close in 3d space (blue distance)
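One common way to compute that kind of "along the sheet" distance, the one Isomap in scikit-learn builds on: connect each point to its nearest neighbours and take shortest paths through that graph. A sketch on points spread along 350° of a circle (a 1-D curve whose ends almost touch); the sample count and `n_neighbors=4` are arbitrary choices, small enough that the graph doesn't jump the gap:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

# 200 points along 350 degrees of a unit circle: the two ends almost touch
theta = np.linspace(0.0, np.deg2rad(350.0), 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])

euclid = np.linalg.norm(X[0] - X[-1])  # straight-line ("blue") distance

# Connect each point to its 4 nearest neighbours, weighted by distance,
# then measure distance only along those links ("red" distance)
graph = kneighbors_graph(X, n_neighbors=4, mode="distance")
geodesic = shortest_path(graph, directed=False)[0, -1]

print(round(euclid, 3))    # about 0.174
print(round(geodesic, 3))  # about 6.1, roughly the arc length
```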
Thank you this explanation is really lovely
How could you tell if a situation required this non-flat metric then?
Would you have to do some plotting first to figure that out?
That's basically what manifold learning is - it's a branch of unsupervised learning that tries to find a not necessarily linear hypersurface the data fits on. I'm not sure how to detect whether the results you get from it are good - presumably there are some scores you can get. Plotting to see in general isn't possible if you have more than 3 features (it's not exactly easy to plot 4d space).
Have each point in the fourth dimension represent a frame in an animation maybe?
but anyways, thank you for the explanation!
how can i plot my pd df in a tabular form? and also save it as an image. i use plt.table but i have this:
@untold oriole what do you mean by "plot", what output would you want?
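If the goal is an image of the table itself, one common approach is an axes with its frame turned off plus `Axes.table`, then `savefig`. A sketch with a made-up DataFrame and file name (`table.png`); converting the values to strings first keeps the cell text predictable:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [1.5, 2.25]})

fig, ax = plt.subplots(figsize=(4, 1.5))
ax.axis("off")  # hide the axes frame and ticks; only the table is drawn
ax.table(cellText=df.astype(str).values, colLabels=list(df.columns), loc="center")

out_path = Path("table.png")
fig.savefig(out_path, bbox_inches="tight", dpi=150)
plt.close(fig)
```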
this is pretty much always the problem w/ unsupervised learning - at some point, you do still need to "supervise" it, even if informally or ad-hoc
i have a dataset that contains job ads that are fraudulent and non-fraudulent. What kind of algorithm could i use to make a model that predicts whether an ad is fraudulent?
i tried using KNN but the lack of binary categories made it near impossible (at least for me)
@tepid rapids logistic regression
I'm unable to use logistic regression, as it's for a group assignment and my other group member is using it x)
that doesn't make sense, what's the project?
there are lots of different models that are similar to logistic regression in that you have input features, predict a yes/no (binary) target, and minimize a loss function to find optimal parameters
we have an idea and we have to use 3 different algorithms to create it
doesn't necessarily have to be linear
do you know what logistic regression actually is? how it works?
maybe you can use traditional logistic regression, xgboost, and a basic neural network with one hidden layer
theyre all different "algorithms" but the models still have the same basic shape
but of course this is the content of your project - evaluate different models, try them, see what works and explain why
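A minimal scikit-learn sketch of that kind of comparison, assuming synthetic imbalanced data in place of the job-ads features; scikit-learn's `GradientBoostingClassifier` stands in for xgboost here, and ROC AUC is used because plain accuracy is misleading when frauds are rare:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: roughly 10% positive ("fraudulent") class
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "one-hidden-layer NN": make_pipeline(StandardScaler(),
                                         MLPClassifier(hidden_layer_sizes=(32,),
                                                       max_iter=1000, random_state=0)),
}

# Same features, same binary target, different model families
scores = {name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: mean ROC AUC {score:.3f}")
```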
i will try that, thank you
Anyone know of a good way to deal with unique nan-values in a 2D data array? I wanna do SVD on the data, but since some rows have different nan values it complicates things.
I was thinking of simply setting them to a constant value, but I am unsure if this causes the outcome of the SVD to change
someone posted an interesting iterative method to approach this problem on stackoverflow: https://stackoverflow.com/questions/35577553/how-to-fill-nan-values-in-numeric-array-to-apply-svd
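The iterative idea from that thread, sketched in NumPy as an assumption about the general approach (often called SVD imputation or hard-impute): fill the NaNs with column means, take a low-rank SVD reconstruction, copy the reconstructed values back into the missing cells only, and repeat. You have to choose the rank; here the test matrix is exactly rank 2:

```python
import numpy as np

def svd_impute(X, rank, n_iter=500):
    """Fill NaNs by repeatedly replacing them with a rank-`rank` SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(missing, low_rank, X)  # observed entries are never changed
    return filled

rng = np.random.default_rng(0)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))  # exactly rank 2
X = true.copy()
mask = rng.random(true.shape) < 0.1
X[mask] = np.nan
filled = svd_impute(X, rank=2)
print(np.abs(filled[mask] - true[mask]).max())  # small reconstruction error
```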
Okay, so seems values can't just blindly be set constant. I'll look into that, thanks!
hello
so about chatbot api's
you need to find it
or better you can create your own chatbot api itself
do you have an idea of how chatbots work?
and what kind of chatbot's there are?
how? i have no clue how to make an API
no not much
You need to use Flask to create a microservice and host it
!pypi chatbot
Lol not this
So there are I guess 5 kinds of chatbot
id rather use an existing API
isn't using an existing API easier?
So chatbots are of many kinds
Question based chatbot
Rule based chatbot
Contextual based chatbot
NLP chatbots
there are not many good ones
and the good ones are paid
or not for public use
!pypi chatterbot
this is a chatbot engine which uses NLP and its self learning
!pypi chatterbot-corpus
so it's paid?
then i shall use it
but how do I get started with this API
no
NLP stands for natural language processing
oh
It is a combination of Rule based and Contextual chatbot
Now what is Rule based chatbot you may ask
Rule-based chatbots are chatbots that have rules and a database of prewritten answers
So a chatbot engine will take a message as input and then use its algorithm to choose the best response from its database
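A toy version of that selection step, purely illustrative and not ChatterBot's actual matching logic: a hypothetical response table keyed by trigger words, with the reply chosen by counting overlapping words and a fallback when nothing matches:

```python
import re

# Hypothetical "database" of prewritten answers, keyed by trigger words
RESPONSES = {
    "hello hi hey": "Hi there! How can I help?",
    "how are you": "I'm just a bot, but I'm doing fine.",
    "bye goodbye see you": "Goodbye!",
}
FALLBACK = "Sorry, I don't understand."

def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def respond(message):
    """Pick the stored answer whose trigger words overlap the message the most."""
    words = tokens(message)
    best_key, best_score = None, 0
    for key in RESPONSES:
        score = len(words & tokens(key))
        if score > best_score:
            best_key, best_score = key, score
    return RESPONSES[best_key] if best_key else FALLBACK

print(respond("hey bot"))        # Hi there! How can I help?
print(respond("How are you?"))   # I'm just a bot, but I'm doing fine.
print(respond("quantum physics"))  # Sorry, I don't understand.
```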
And it can even learn stuff
Yes, i need this
But when it comes to large and complex chatbots, Rule based chatbots dont work properly
i want anything that works
this is a chatbot ai and uses AIML
AIML is a chatbot language, you could say
like Html
you can create tags of responses
and make it learn responses too
and the best thing about this is that
the author and the creator has a prewritten large database of conversations
and responses
you can get easily
for eg
i want something purely ai that uses the API
wait why is this so complex
chatbots are complex
why can't i use a simple Api and get started
I told you the reason above
There is Kuki_ai
GPT-3
the best chatbots you can get your hands on
GPT-3 is so good that it can generate and read code
poems
etc etc
But GPT-3 is not open source
and not available to the public
GPT-3 has an API service that you can enroll for
but you need to wait months to get it
and it's not 100% guaranteed you'll get it
now you want a simple free api
there are a lot of apis
some random ml chatbot
bruhapi xyz
but they are bad
Even cleverbot
ehy
Yes i need anything that's simple and that can work
they give out random responses
you want a chatbot that gives out random responses half the time
bruh
that's not ai and that doesn't work
i want something that works good
Ofc it is tough and complicated
but chatbots at the end, when they work
the efforts are worth it
I can show you videos
of how to make a simple chatbot
let me get it wait
it's okay now, i have to go sleep
thank you for explaining me everything
I'll ping you tomorrow if i plan to work on this
Python Chat Bot Tutorial - Chatbot with Deep Learning (Part 1)
bruh
Ever wanted to create an AI Chat bot? This python chatbot tutorial will show you how to create a chatbot with python using deep learning .
Playlist: https://www.youtube.com/watch?v=wypVcNIH6D4&list=PLzMcBGfZo4-ndH9FoC4YWHGXG5RZekt-Q
Download JSON File: https://techwithtim.net/wp-content/uploads/2019/05/json-file.zip
Text-Based Tutorial: http...
this guy made a very good chatbot
and you can create your own response model
I didn't explain much to you today
but get some sleep
We'll discuss this tomorrow
okie
hello, I'm taking a course about data science at university. I need to do a data science project: I need to use a crawler/API (or both) to create a database and run models on the data, and I need to use machine learning after I crawl the data. I've been looking for a subject for 2 days and haven't found anything with enough information to crawl and use machine learning on that won't be a pain in the a@@ (sorry)... I would love to hear from you, boys and girls, if you have any subject for me and maybe some data websites
i would appreciate it if you DM me so i won't miss your answer
@oak violet don't worry too much about the topic, spend more time worrying about doing the work. just use the wikipedia API, you can "crawl" it by following links in the articles
@desert oar where can i find this wikipedia API?
@oak violet https://www.mediawiki.org/wiki/API:Main_page
wait, thats the wrong one i think
maybe i mis-remembered. either they have a read-only content API, or they are just willing to let you scrape the html site if you don't abuse their servers
it's somewhere in their ToS i think
alright, thanks a lot @desert oar
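For the Wikipedia suggestion, here is a sketch of a breadth-first "crawl" that follows article links through the MediaWiki Action API (`action=query`, `prop=links` on `https://en.wikipedia.org/w/api.php`) instead of scraping HTML. Assumptions: only the first batch of links per page is read (the API's `continue` paging is ignored), the User-Agent string is a placeholder to replace with your own contact details, and the `delay` between requests is an arbitrary polite default. `get_links` is injectable so the crawl logic can be tried offline.

```python
import json
import time
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"

def fetch_links(title):
    """Return article titles linked from `title` (first batch only, no paging)."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,  # only links to regular articles
        "pllimit": "max",
        "format": "json",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent: Wikimedia asks clients to identify themselves
    req = urllib.request.Request(
        url, headers={"User-Agent": "course-project-crawler/0.1 (you@example.com)"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    pages = data.get("query", {}).get("pages", {})
    return [link["title"] for page in pages.values() for link in page.get("links", [])]

def crawl(start, max_pages=50, get_links=fetch_links, delay=1.0):
    """Breadth-first crawl from `start`, returning {title: [linked titles]}."""
    graph, queue, seen = {}, deque([start]), {start}
    while queue and len(graph) < max_pages:
        title = queue.popleft()
        links = get_links(title)
        graph[title] = links
        for nxt in links:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
        if delay:
            time.sleep(delay)  # don't hammer the servers
    return graph
```

The resulting link graph (plus article text fetched the same way) is the kind of dataset you could then run models on.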
Anybody know of a python library that can extract MGDF from audio?
Hi all. College freshman interested in NLP and second language acquisition. I don't have any relevant experience with it. Ideally I'd like to learn how to develop/improve existing segmentation techniques for languages without spaces and how to detect multi-word expressions. I hear the Jurafsky book, Stanford CS224, Coursera Machine Learning, and Deep Learning thrown around a lot but have no idea where to begin. Ideally I'm looking for resources that will put me in the right spot as soon as possible without too much detour. Does anyone have any experience in this or suggestions? Again, I have no relevant experience in this. Thank you very much
@exotic robin so you're interested in how to identify word, sentence, and morpheme boundaries in languages that don't separate words with whitespace?
precisely
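A common baseline for segmenting text without spaces is greedy forward maximum matching: at each position, take the longest word found in a dictionary. The sketch below assumes a tiny hand-made dictionary; it is only a starting point, since max-match cannot resolve genuinely ambiguous splits, which is where statistical and neural segmenters come in.

```python
def max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching.

    At each position take the longest dictionary word; fall back to a
    single character when nothing in the dictionary matches.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, with the dictionary `{"我", "喜欢", "自然", "语言", "自然语言", "处理"}`, the string `我喜欢自然语言处理` is split as `我 / 喜欢 / 自然语言 / 处理`, preferring the longer `自然语言` over `自然` + `语言`.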
@exotic robin let me think on that and get back to you.
What language btw? Hindi? Arabic?
Japanese and Chinese primarily for now, but Arabic and Hindi would be a cool task later on
Look to see if spacy has models for either of those
Yes for Japanese and Chinese
No for Arabic and Hindi
I've tried working with spacy before and it doesn't really do what I want
At least for japanese it uses a 3rd party morphological analyser
I think I have to reinvent or improve the existing technologies at a fundamental level
I'm also interested not just in segmentation but also in semantics: recognizing a multi-word expression and saying, hey, this part of the sentence is a common colloquialism, but it doesn't actually have a grammatical breakdown
Japanese dependency stuff isn't very high accuracy at all, it fails even more in spoken language. Arabic especially with dialects is poor accuracy
Aren't the writing conventions for arabic dialects not really standardized?
That's a problem, yes
That was what I was told when I studied arabic
I suppose that's a way of putting it, although I haven't looked too much into the specific implications
So your overarching goal is to identify idioms?
My overarching goal is to develop a second language acquisition system that can intelligently, with the aid of NLP, AI, etc., recommend content to a user that is only slightly more difficult than what they can understand, with an internal grading system that keeps track of understood and known words using user feedback. In cases where known words or grammar structures sound or are spelled the same, the system keeps track of dependencies and usage types to see in which scenarios the user has seen the words before
I want to be able to extract and annotate information in this manner:
"Yeah, bro. Whatever floats your boat."
is comprised of
"Yeah (Interjection), bro (noun). Whatever (determiner) floats your boat (expression) ."
which can be further broken down into
Floats your boat -> comprised of Float (verb) your (pronoun) boat (noun)
So if the user knows the words (float, your, boat), the algorithm will suggest this sentence, to allow the user to develop another meaning of the words within an expression.
If an expression cannot be grammatically broken down (which is a specific type of multi-word expression (I can link a paper about this stuff in Japanese)), then the system knows that.
The system will ideally need to understand, perhaps with lots of training data and human-built categorization, how to distinguish between different grammatical contexts when the word is spelled/sounded the same way.
This sounds like a really cool project, actually
There are many details but I condensed for brevity
An example in english
"I want to be a clown for Halloween"
"When will supper be ready"
Are two different uses of the word be
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_roc_curve
import matplotlib.pyplot as plt
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf2 = SVC(kernel="linear", random_state=0)
clf2.fit(X_train, y_train)
plot_roc_curve(clf2, X_test, y_test)
plt.show()
I keep on getting the error ValueError: SVC should be a binary classifier in line plot_roc_curve(clf2, X_test, y_test), what's the problem here?
roc curve
looks at its arguments in sklearn
i don't remember it taking the classifier as an argument
i did already
import matplotlib.pyplot as plt
from sklearn import datasets, metrics, model_selection, svm
X, y = datasets.make_classification(random_state=0)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, random_state=0)
clf = svm.SVC(random_state=0)
clf.fit(X_train, y_train)
metrics.plot_roc_curve(clf, X_test, y_test)
plt.show()
is the official example (i literally copy pasted)
how many targets do you have?
target?
i just loaded the sklearn wine dataset
sorry im rly new to ml i have no idea what target means
In a classifying problem your target is what you're trying to classify
i also tried the iris set but same error
classification problem
yes because those are not binary datasets
ok think it like this
a binary target is
"WINE" "NOT WINE" the algorithm can only have 2 outcomes.
that's why it's binary
a multiclass (non binary) problem is when you have many targets. for example the iris dataset (setosa, versicolor, etc)
if the plot only accepts binary, yeah iris won't work
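To make the binary-vs-multiclass point concrete: one standard workaround is one-vs-rest ROC, where the labels are binarized and one curve is computed per class from the classifier's per-class scores. This is a sketch using the wine data from the question. Note also that `plot_roc_curve` was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of `RocCurveDisplay`, so the per-class `fpr`/`tpr` arrays below could be passed to `RocCurveDisplay(fpr=..., tpr=...)` for plotting.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
# stratify=y (added here, not in the original) keeps all 3 classes in the test half
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

clf = SVC(kernel="linear", decision_function_shape="ovr", random_state=0)
clf.fit(X_train, y_train)
scores = clf.decision_function(X_test)  # shape (n_samples, n_classes)

classes = np.unique(y)
y_bin = label_binarize(y_test, classes=classes)  # one 0/1 column per class

# One ROC curve per class: "this class" vs "all the other classes"
roc_aucs = {}
for i, c in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], scores[:, i])
    roc_aucs[int(c)] = auc(fpr, tpr)
print(roc_aucs)
```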
hey guys i am getting a weird thing here
yesterday i made a keras model and now i am following the same path
but in the output i am getting 1 outputs instead of 1
this way i am being unable to train with my dataset
Hey @sharp turret!
It looks like you tried to attach file type(s) that we do not allow (.svg). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
just a dumb question: is opencv related to this channel?
Hello guys!! Check out this article on the top Python libraries to automate exploratory data analysis:
do y'all recommend the google coursera courses for data analytics / cleanup / manipulation for a relatively new beginner?
do i post help questions for pandas here?
I have a data frame in Pandas that I would like to add a dynamic variable to every row of one of the columns. Please, could someone advise how this can be achieved?
Does anybody have a real time example of how to use FFT? Something like this: https://realpython.com/python-scipy-fft/ but in real time
Hey, I am trying to make a discord AI chatbot for general convo
Is there any good already made
or like an open source I can download to help me
If not, what is the best way to make one?
Is there any kind of solution out there which I could tweak a little bit according to my needs?
We're looking for a Lead Consultant - Data Scientist to join our Capgemini Invent team in Melbourne. This role will see you developing insightful models for various projects with a focus initially on risk modelling using Python and R.
You'll have 2+ years experience in Data Science with a history of successful data science implementations, ideally in a consulting environment.
dm me for more information, feel free to share!
how can i convert all my columns of type float64 to string with pandas?
i have 867 columns, doing it one by one would be an impossible task
You can use .dtypes on a dataframe
As I have this many columns, it doesn't display all the column types for me, so I want a command that will locate all columns of type float64 and then use .astype() to convert them to string
@inner estuary seems like select_dtypes is what you need? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html
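A short sketch of that suggestion: `select_dtypes` picks the float64 columns by dtype, so none of the 867 have to be listed by hand, and `.astype(str)` converts just those. The toy frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.5, 2.0],
    "b": [3, 4],        # int64, should be left alone
    "c": [0.1, 0.25],
})

# Select only the float64 columns, then convert them in one go
float_cols = df.select_dtypes(include="float64").columns
df[float_cols] = df[float_cols].astype(str)
print(df.dtypes)
```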
thanks man, helped a lot
How do I learn Python?
You can start at codecademy.com. That's how I got started.
ima learn through books...
codecademy is alright if you're an absolute beginner and don't mind the mind-numbingly slow pace. You also have to pay for the python 3 course.
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Honestly, I haven't been to codecademy in so many years, but damn, didn't know they charge for Python 3.
Just visited it. Lots of changes since I've been there. I can't believe Python 2 is still up to learn for free. smh
I'm working on a discord bot and I'd like to embed charts from pychartjs. Is anyone aware of a way to export chart objects straight to a .png or something? I've used matplotlib for it in the past but would like to try out pychartjs for this project. The documentation doesn't seem to mention it at all.
Never mind. I found out about quickchart.io
Information retrieval/NER?
oof. you can use some torch implementation to speed up FFT on CUDA
but it's pretty fast on CPU too in my experience
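For the real-time FFT question, a common pattern is to process the signal in fixed-size chunks as they arrive and run an FFT on each one. This NumPy sketch assumes the chunks come from some iterable; `simulated_stream` is a hypothetical stand-in for a live source such as an audio-input callback, and the Hann window is one common choice to reduce spectral leakage.

```python
import numpy as np

def dominant_frequencies(chunks, sample_rate):
    """Yield the strongest frequency (Hz) of each incoming chunk of samples."""
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        window = np.hanning(len(chunk))  # taper the edges to reduce leakage
        spectrum = np.abs(np.fft.rfft(chunk * window))
        freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
        yield freqs[np.argmax(spectrum)]

def simulated_stream(freq_hz, sample_rate, chunk_size, n_chunks):
    """Hypothetical stand-in for a live source: a pure sine split into chunks."""
    t = np.arange(chunk_size * n_chunks) / sample_rate
    signal = np.sin(2 * np.pi * freq_hz * t)
    for k in range(n_chunks):
        yield signal[k * chunk_size:(k + 1) * chunk_size]
```

The frequency resolution per chunk is `sample_rate / chunk_size` (here 8000 / 1024 = 7.8125 Hz), so the chunk size trades latency against resolution.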
recommend you do basic NLP first before jumping on a chatbot
what are the reasons that we use log-likelihood rather than likelihood, apart from underflow?
and what module do i need to use? nltk?
for what?
I need to make a chatbot as I said, any tutorial or useful article, i need to make a chatbot for discord
like a pretty basic one
if it's ultra basic and naive, a rule based one would be appropriate
or you could hard-code the responses too
anything more "Ai-ish" would be fine-tuning or using pre-trained models
I need like a bot that will keep a conversation with the user
Catch as much as the user says, and not fail to reply so easily
what kind of specifications is that?
I mean, all I need to make is a "smart" chatbot, keep a normal convo, nothing more special than that
well, you can use a pre-trained model then
OK where do I get that?
bruh, look it up
I see a lot of people in here asking how to make chat bots, but I think a lot of people underestimate how difficult that is and overestimate how useful one would actually be.
for context: making a chatbot that can have a "normal" conversation is still an open area of research and not something we actually know how to do, even with the huge amounts of research time and computing power available to the top researchers at tech companies
there are limited situations in which you can make kind-of functional chat bots (like SmarterChild from back in the AIM days...) using traditional AI techniques, as well as bring some deep learning to bear. but i agree, it's kind of a rabbit hole as far as hobby projects go.
I mean, I don't need to optimize it, I need to know how to do it
I don't need it for personal use, I have been asked to code it
And ofc I know it's not ez, that's why I am asking ;p
Thanks! Will have a look into this. Are there any python libraries you would recommend that deal with that? Or is there some kind of library? I just need some buzzwords to get me started :))
Spacy
Thank you!!
Hi
How can I use pyplot but add a title, a xlabel and ylabel, and the actual plot data all in one line?
```python
import numpy as np
import matplotlib.pyplot as plt

plt.title('Volume as a function of n')
plt.xlabel('dimension')
plt.ylabel('Volume')
plt.plot((D := np.arange(0,20,1)),[(2**i) * sum(np.sqrt(sum([val**2 for val in p])) < 1 for p in np.random.rand((10000*int(np.exp(i/5))),i))/(10000*int(np.exp(i/5))) for i in D])
plt.show()
```
in one line, if possible
is there any reason why you want it in one line? That code looks decently readable as it is
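If the goal is just fewer lines, one option is the object-oriented API: `Axes.set` accepts several properties in one call (`title`, `xlabel`, `ylabel`, ...). The sketch below also moves the Monte Carlo volume estimate into a named helper for readability; the helper name and the use of `numpy.random.default_rng` are choices made here, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np

def unit_ball_volume_mc(dim, n_points, rng):
    """Monte Carlo estimate of the volume of the unit ball in `dim` dimensions."""
    # Points uniform in [0, 1)^dim sample one of the 2^dim orthants of [-1, 1]^dim
    points = rng.random((n_points, dim))
    inside = np.linalg.norm(points, axis=1) < 1
    return 2**dim * inside.mean()

rng = np.random.default_rng(0)
dims = np.arange(0, 20)
volumes = [unit_ball_volume_mc(d, 10000 * int(np.exp(d / 5)), rng) for d in dims]

fig, ax = plt.subplots()
ax.plot(dims, volumes)
# Title and both axis labels in a single call
ax.set(title="Volume as a function of n", xlabel="dimension", ylabel="Volume")
```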
Please help me solve this error:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import accuracy_score
from xgboost import plot_tree

df = pd.read_csv('D:/Python/pima-indians-diabetes.csv')
x = df.drop(columns='Class variable (0 or 1)')
y = df['Class variable (0 or 1)']
model = XGBClassifier()
model.fit(x, y)
plot_tree(model, num_trees=5)
```
```
ExecutableNotFound                        Traceback (most recent call last)
<ipython-input-6-de2ad7aac029> in <module>
     27
     28 # score the model
---> 29 plot_tree(model, num_trees=5)

ExecutableNotFound: failed to execute ['dot', '-Kdot', '-Tpng'], make sure the Graphviz executables are on your systems' PATH
```
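As the error message says, `plot_tree` needs the Graphviz `dot` program itself; the `graphviz` pip package is only a Python wrapper around it. A small check like this (the helper name is hypothetical) can confirm whether `dot` is visible on PATH before plotting:

```python
import shutil

def graphviz_available():
    """True if the Graphviz `dot` executable can be found on PATH."""
    return shutil.which("dot") is not None

if not graphviz_available():
    print("Graphviz 'dot' not found on PATH: install the Graphviz binaries "
          "and add their bin folder to PATH, then restart the kernel.")
```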
if we do binary classification on animals (say cats), do we know which node of the NN is responsible for which learned part?
i mean if we train a black cat classifier and then apply it to white cats it won't work, so there's gotta be some nodes that are responsible for the color part. or is it that the probability at test time is the collective contribution of all nodes for all features, like shape, color, eyes, etc.?
seems like a great research topic, if we could know which node is contributing to what, if each node looks at different things in the image.
Hi
Can you help me with a matrix?
I need some help
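n = 6
mtx1 = []
print("Input elements as per rows: ")
for i in range(n):
    y = []
    for j in range(n):
        y.append(int(input("Input elements: ")))
    mtx1.append(y)
for i in range(n):
    for j in range(n):
        print(mtx1[i][j], end=" ")
    print()
def check(matrix):
    # Count the 1s in each row, then in each column, and report even/odd
    for i, row in enumerate(matrix):
        ones = row.count(1)
        print(f"row {i}: {'even' if ones % 2 == 0 else 'odd'}")
    for j in range(len(matrix[0])):
        ones = sum(1 for row in matrix if row[j] == 1)
        print(f"column {j}: {'even' if ones % 2 == 0 else 'odd'}")
check(mtx1)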
The user must input only 0s and 1s in every row.
It should display the matrix and print whether every row and every column has an even number of 1s or not.
I'm blank, completely. I have only written the code for making the matrix. It displays it, but I can't display whether a row or a column has an even number of 1s or not.
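For the row/column check, a vectorized NumPy sketch: summing a 0/1 matrix along `axis=1` counts the 1s per row, along `axis=0` per column, and `% 2 == 0` tests for evenness. The function name `even_ones_report` is hypothetical.

```python
import numpy as np

def even_ones_report(matrix):
    """Return (rows_even, cols_even): boolean arrays, True where the count of 1s is even."""
    m = np.asarray(matrix)
    if not np.isin(m, (0, 1)).all():
        raise ValueError("matrix must contain only 0s and 1s")
    rows_even = m.sum(axis=1) % 2 == 0  # number of 1s in each row
    cols_even = m.sum(axis=0) % 2 == 0  # number of 1s in each column
    return rows_even, cols_even
```

`rows_even.all() and cols_even.all()` then answers "does every row and every column have an even number of 1s?" in one expression.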
The problem is (as far as I remember) that most NN models are pretty much black boxes. There's a lot of ongoing research into making this more transparent
SHAP is a library that's good for ML explainability
Huggingface
Hi, could anyone help me with how to test my NLP model for sentiment analysis on newly added data? I have trained my model on a labeled dataset with 3 categories, and I have downloaded test data from Twitter for which I would like to predict a class. The test dataset does not have any labels. Both datasets were treated equally: I preprocessed them and trained word2vec models on them (BUT SEPARATELY), and I used the embeddings from the training dataset in the embedding layer of my classifier. I was considering transfer learning for this, but I do not know what I should do. I was thinking about replacing the weights of the pre-trained classifier model with the test embedding weights, because when I run predict on my actual sequences from the sentences there is always an error with indices. Or am I doing something wrong?
yeah i think a research topic on that would be gr8
we can already visualize the feature maps by different layers in a Network
I wouldn't go too deep, but if you look up "visualized attention maps" for CNNs, you can also see which part the NN 'focuses' on
label = label.to(device)
AttributeError: 'tuple' object has no attribute 'to'
do i need to transform label to a tensor to fix this error?
looks like you'll need to provide more context. label isn't something you can move to a device because it's a tuple
I assume you're trying to do GPU computation?
yeah
what library?
pytorch
you can move a tensor to a GPU.
why is the final clustering arrangement better than the first one for KMeans?
is there more context for this question?
So the question is "why is the second one better"?
already moved a tensor to a GPU, but i get same error
label = label.to(device='cuda')
AttributeError: 'tuple' object has no attribute 'to'
yh
Do you have any feelings about why the second one might be better so far?
is it because the intra-cluster sample scatter is smaller?
so label is still a tuple and not a tensor.
I've never heard it put that way, but that sounds good to me.
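That intra-cluster scatter is what k-means minimizes: the within-cluster sum of squared distances, which scikit-learn reports as `inertia_`. For the same K, the arrangement with lower inertia is the better k-means solution. This sketch on synthetic blobs (data and seeds chosen arbitrarily) computes it by hand for one random initialization and for the best of ten. Inertia is not meaningful for comparing different K, though, since it keeps falling as K grows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def within_cluster_ss(X, labels, centers):
    """Sum of squared distances from each sample to its assigned cluster center."""
    return float(((X - centers[labels]) ** 2).sum())

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# A single random initialization can get stuck in a poor arrangement;
# n_init=10 keeps the run with the lowest inertia out of 10.
one_run = KMeans(n_clusters=3, init="random", n_init=1, random_state=1).fit(X)
best_of_10 = KMeans(n_clusters=3, init="random", n_init=10, random_state=1).fit(X)

for name, km in [("1 init", one_run), ("10 inits", best_of_10)]:
    print(name, km.inertia_, within_cluster_ss(X, km.labels_, km.cluster_centers_))
```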
i.e. this
but i actually transform it to tensor
this is my transform:
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
This is a pretty obvious divide in the samples that the first image doesn't account for.
you wouldn't be getting AttributeError: 'tuple' object has no attribute 'to' if label wasn't a tuple.
also what are the reasons that we use log-likelihood rather than likelihood, apart from underflow?
true but i set K=3
it sounds like you're asking questions directly from an assignment. how much have you mulled them over?
These are not from an assignment but from a lab
but I think the negative log-likelihood has derivatives that are easier to calculate; what is the advantage of using it as a quality metric rather than the likelihood?
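A short derivation of the main reasons, assuming i.i.d. samples $x_1, \dots, x_n$ and a model $p(x \mid \theta)$: the log turns the product into a sum, leaves the maximizer unchanged because log is strictly increasing, and makes the gradient a sum of per-sample terms. The Gaussian case (known $\sigma$) shows how the log also cancels the exponential.

```latex
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)
\qquad\Longrightarrow\qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)

\hat{\theta} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \ell(\theta)
\quad \text{(log is strictly increasing)}

\nabla_{\theta}\, \ell(\theta) = \sum_{i=1}^{n} \nabla_{\theta} \log p(x_i \mid \theta)

% Gaussian with known \sigma:
\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,
\qquad
\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
\;\Rightarrow\; \hat{\mu} = \bar{x}
```

As a quality metric, the average negative log-likelihood $-\ell(\theta)/n$ is also comparable across datasets of different sizes, whereas the raw likelihood shrinks multiplicatively with every additional sample.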
hi
why is it helpful to use transpose on matrix things such as this:
```python
import numpy as np
x = np.random.normal(0, 1, 500)
y = np.random.normal(0, 1, 500)
X = np.vstack((x, y)).T
# sample covariance of two 1D arrays
def cov(x, y):
    xbar, ybar = x.mean(), y.mean()
    return np.sum((x - xbar)*(y - ybar))/(len(x) - 1)
# covariance matrix
def cov_mat(X):
    return np.array([[cov(X[0], X[0]), cov(X[0], X[1])],
                     [cov(X[1], X[0]), cov(X[1], X[1])]])
# calculate covariance matrix
cov_mat(X.T)  # (or with np.cov(X.T))
```
just wondering why it becomes fucked when you don't transpose
does plt.show() dispose figures and axes? I have nested loops, there's a scatter plot before the inner most loop, and inside it there are more plots and the call to show(), but at every iteration only the inner most plots are still visible, the scatter outside the loop is not, why?
can we put a title on a 3-d scatter plot ?
Is there a way to have multiple tensorflow versions installed (and ready to choose) on one system?
I have several conda envs and the only reliable thing is the pip version, as conda keeps doing weird things. The pip version installed is global (activating a conda env with, say, 2.3 still results in running on pip's 2.4.1)
I need multiple versions as tf produces weird bugs depending on code and version. Some things run in 2.3 which crash in 2.4.1. Some run in 2.3 but not in 2.1 or 2.4.1 and so on
not entirely sure what you're describing. show your code. plt.show() creates a new plot and shows it, so yes, it will discard anything created in a previous plot
I had to plot everything in the inner most loop
is there a way to make like layered plots?
layered meaning?
for example, keeping some scattered points even after calling plt.show(), without having to plot the points again
well
that depends on what you're trying to do.
with vstack each source array becomes a row of the result (it runs along axis 1)
by convention (np.cov's default rowvar=True), each row is a variable and covariance is calculated over axis 1
so an array of shape (2, 500) means you have 2 variables with 500 observations each
like if I could have independent axes on the same figure
transposing it gives shape (500, 2), which np.cov reads as 500 variables with 2 observations each
this, of course, affects the shape of the covariance matrix
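The shapes can be checked directly. NumPy's `np.cov` treats each row as a variable by default (`rowvar=True`), which is why the transpose matters; passing `rowvar=False` is the alternative to transposing. Random data as in the question, with a seeded generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
y = rng.normal(0, 1, 500)

X = np.vstack((x, y)).T  # vstack gives (2, 500); .T makes it (500, 2), one row per observation
print(X.shape)           # (500, 2)

# np.cov defaults to rowvar=True: each ROW is treated as a variable
print(np.cov(X.T).shape)  # (2, 2): 2 variables, 500 observations each
print(np.cov(X).shape)    # (500, 500): 500 "variables" with only 2 observations each

# Same result without transposing: declare that variables are columns
print(np.allclose(np.cov(X, rowvar=False), np.cov(X.T)))  # True
```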
do you know the difference
between the object-oriented approach (using Figure and Axes objects directly) and the state-based approach (everything through plt)
the latter of which I believe mimics the MATLAB style?
also, what environment are you running your code in?
I'll look into the OOP approach
Jupyter
look into interactive mode
wait
Jupyter LAB
or Jupyter NOTEBOOK?
fig, ax = plt.subplots()
?
you can overlay multiple things on the same Axis
as well as plot multiple axes on a grid
it wasn't working, and I didn't commit it
what "wasn't working" about it?
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.scatter(u, v)
for loop:
plt.scatter(...) # this kept disappearing
for loop:
plt.scatter(...)
plt.show()
fig, ax = plt.subplots(2, 2)
ax[0, 0].scatter(x, y)
ax[0, 0].scatter(u, v)
ax[1, 1].plot(a, b)
ax[1, 1].scatter(c, d)
etc.
what is the desired outcome? you want all of this on the same set of axes?
can you share your actual code?
it's the repo I shared, but as I said I didn't commit my attempt to use axes
ok then. in general: to plot on the same set of axes, use the underlying Axis object and its associated methods, instead of plt. Use plt.subplots to create a new axis, or multiple axes in a grid
Axes
Axis is something else
for loop:
# scatter star
for loop:
# scatter crosses
# plot black and yellow line
plt.show()
# plot green line
I wanted to keep the star, the crosses, and yellow and green lines (these have different lifetimes tho)
if you do show() inside the loop it will create a new plot at each iteration of the loop
are you trying to create an animation?
yeah that's the point
not in this case because it's a notebook
what's the relevance
okay okay that's how it is right now, thanks
it does
unless I could go back and forth for each iteration interactively, I want to take my time understanding each iteration
hm
you could
use Jupyter widgets
to interactively control the "flow of time" of your plot
that's probz what I would do
but you have to be comfortable working with that kind of stuff
interesting I'll look into that
added complexity
anyway the point is
in interactive mode
you don't call plt.show
or fig.show
you just make changes to the Figure/Axes
and things happen
but you need to modify the backend
MPL uses
%matplotlib notebook
it's been like a year+ since I worked with this though
okok gotta look into that, thanks
there should be a channel for plotting related stuff
this is it
doesn't the majority of that get lost between actual data science and AI stuff?
nevertheless
this is the best channel for MPL, plotly, etc.
also a bit niche
a channel just for plotting stuff...
if it matters to you, you can take it up @ #community-meta though
hahahaha it'd be dead lol
people also post in help channels, but you will see more relevant activity here
(and have more relevant help)
the "how do i chat bot" questions dont usually go anywhere anyway
guys
i done something
some of you may remember me talking about making a reinforcement learning project
i had built my own environment to work with retroarch
it still has many rough edges
but it is kinda working
unfortunately i am really bad at making agents and models, so it is taking a while before i can show you my results
and it needs a little setup, so it's not just a matter of posting it here and you're ready to try it yourself
but i am working on a way for you to easily build models for it with simplified parameters and working examples
but i am more focused on modular score reading, where with some keywords it will be easy to adapt it to other games
i will soon be launching it on GitHub
and i'm counting on your help to expand it with more score methods and more agents so we can hold AI competition tournaments on Twitch
if you're looping to create the axes, why not use the OOP approach, create all your figures, and call .show at the end?
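A sketch of that suggestion for the star/crosses/lines case: create a fresh `Figure`/`Axes` per iteration, redraw the persistent elements from stored data, and show everything at the end. The data below are made up, and which elements persist is an assumption based on the description above.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
star = (0.5, 0.5)                  # drawn in every frame
crosses_so_far = np.empty((0, 2))  # accumulates across iterations
figures = []

for step in range(3):
    # Hypothetical per-iteration data standing in for the real loop's results
    new_crosses = rng.random((2, 2))
    crosses_so_far = np.vstack([crosses_so_far, new_crosses])

    fig, ax = plt.subplots()       # one Figure/Axes per iteration
    ax.scatter(*star, marker="*", s=200, color="tab:red")
    ax.scatter(crosses_so_far[:, 0], crosses_so_far[:, 1], marker="x")
    ax.plot([0, 1], [step / 3, step / 3], color="gold")  # per-iteration line
    ax.set_title(f"iteration {step}")
    figures.append(fig)

# With the inline notebook backend, all figures appear when the cell finishes;
# in a script, call plt.show() once here rather than inside the loop.
```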
fuzz wuzzy?
@astral path close enough
nvm i solved it already
https://pypi.org/project/fuzzywuzzy everyone's favorite unmaintained implementation of one ad-hoc string comparison algorithm that works unreasonably well for how ugly it is
ah yes, they're still using the root logger, in delightfully flagrant disregard of good practices https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/process.py#L81
fuzzywuzzy/process.py line 81
```py
logging.warning(u"Applied processor reduces input query to empty string, "
```
edit distance is one of those things that i've found extremely easy to understand intuitively, but I can't code it for crap lol
they use other libraries for the actual edit distance implementation anyway
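For reference, the classic Levenshtein distance is a short dynamic program: each cell is the cheapest of a deletion, an insertion, or a substitution from neighbouring cells. This is a plain-Python sketch keeping only two rows of the table; it is not necessarily the exact measure that fuzzywuzzy's scorers use.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.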
How do I fix this?
UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure. plt.show()
I have python-tk installed
