#data-science-and-ml

proud wing Dec 30, 2023, 7:20 PM

#

i was just trying to come up with a general case but i figured it would be way slower.

#

well you mean splitting the array into sub arrays right?

topaz turtle Dec 30, 2023, 7:20 PM

#

proud wing well you mean splitting the array into sub arrays right?

well yes

#

i'm calling them tensors but ye they're essentially just n-dimensional arrays

proud wing Dec 30, 2023, 7:21 PM

#

thats just being efficient

#

i already played with that too 🙂

#

@serene scaffold Good day to you!

topaz turtle Dec 30, 2023, 7:21 PM

#

proud wing i already played with that too 🙂

the splitting into subtensors to fit into cache part?

proud wing Dec 30, 2023, 7:22 PM

#

yea

#

i started with a max_dimension of 32

#

to see if I could contract
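
For readers following along, the "split into sub-arrays so each piece fits in cache" idea can be sketched as a tiled matrix product. This is only an illustration of the loop order, assuming a block size of 32 like the max_dimension mentioned above; the function names are made up, and pure Python will not show the cache benefit that a C version would.

```python
def matmul_naive(a, b):
    """Reference triple loop over lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]
    return c


def matmul_blocked(a, b, block=32):
    """Same product, visited tile by tile so each tile can stay in cache."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Work on one (block x block) tile of each operand at a time.
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += aip * b[p][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0], [0, 1], [2, 2]]
print(matmul_blocked(a, b, block=2) == matmul_naive(a, b))
```

The same tiling generalises to higher-rank contractions by first folding the kept indices and the contracted indices into two flat axes.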

serene scaffold Dec 30, 2023, 7:27 PM

#

proud wing <@253696366952316929> Good day to you!

Good day, fellow intelligentleman

proud wing Dec 30, 2023, 7:29 PM

#

The maximum size of an array is limited by the sys.maxsize variable, which is 2^63 - 1 on 64-bit builds (2^31 - 1 only on 32-bit ones).
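
A quick way to see what sys.maxsize is on a given machine (a small sketch; it is the largest value a Py_ssize_t can hold, so it bounds container lengths and indices rather than being a NumPy setting):

```python
import sys

# sys.maxsize is 2**(bits - 1) - 1, where bits is the width of Py_ssize_t.
bits = sys.maxsize.bit_length() + 1   # 64 on most current builds, 32 on 32-bit ones
print(bits, sys.maxsize == 2 ** (bits - 1) - 1)
```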

#

If you're passing that up rn... you'd need an einstein and a chalk board with extensive proofs experience to prove the case beyond that... 😄

#

I'm a computer science lover with a nerdery for math so if the computer cant do it, I move on haha

topaz turtle Dec 30, 2023, 7:34 PM

#

proud wing If you're passing that up rn... you'd need an einstein and a chalk board with ex...

i mean sure, it's just that in my mind if i can come up with a way to do that for like 3 or 4 dimensional tensor contractions, i can easily generalise that to any arbitrary number of dimensions that i'd want my library to support

#

currently attempting to fish out of numpy developers how they acc did it lol

#

bc their C code is incomprehensible mess bc it essentially only functions to support the python wrapper

#

so i can't just take a look at it myself lol

#

😭

proud wing Dec 30, 2023, 7:35 PM

#

The buffer protocol, which is an abstraction basically

#

I was thinking if my method using TRIS works faster than numpy even for the case we're evaluating

#

that might be quite valuable

topaz turtle Dec 30, 2023, 7:37 PM

#

ye def

proud wing Dec 30, 2023, 7:37 PM

#

it would probably become adopted by a lot of people as it could speed up inference

topaz turtle Dec 30, 2023, 7:38 PM

#

i noticed that an arbitrary numpy tensor contraction takes about the same as a matrix multiplication with the same number of elements, so their tensor contraction is at the limit of matmul speed

#

so it would be quite astonishing if you made it faster
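
One likely reason for that observation is that a contraction can be rewritten as a single matrix product after reshaping. A minimal sketch, assuming NumPy is installed; the shapes and index names are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5, 6))   # indices i, j, k
b = rng.standard_normal((6, 7))      # indices k, l

# Contract over k with the general-purpose routine.
c_tensordot = np.tensordot(a, b, axes=([2], [0]))      # shape (4, 5, 7)

# Same contraction as a plain matrix product: fold (i, j) into one row index.
c_matmul = (a.reshape(4 * 5, 6) @ b).reshape(4, 5, 7)

print(np.allclose(c_tensordot, c_matmul))
```

When the contracted axes are not already adjacent in memory, a transpose and copy is needed first, which is where the extra cost tends to come from.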

proud wing Dec 30, 2023, 7:38 PM

#

Here's a useful trick with numpy btw. Use x.__array_interface__ to explore the array
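
The attribute is spelled with double underscores, `x.__array_interface__`. A short sketch of what it exposes, assuming NumPy is installed and a small int64 array:

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)

info = x.__array_interface__
print(info["shape"])     # (3, 4)
print(info["typestr"])   # byte order, kind and item size in one string
print(info["strides"])   # None means the array is C-contiguous
print(info["data"])      # (address of the first element, read-only flag)

# A transposed view shares the same buffer but reports explicit strides.
print(x.T.__array_interface__["strides"])   # (8, 32)
```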

#

We might need to use numpsys dispatch mechanism

#

numpys

#

To check if a NumPy function can be overridden via __array_ufunc__, you can use numpy.testing.overrides.allows_array_ufunc_override.

#

Looks like if you want to push numpy to its breaking point, there's some workaround.

#

I haven't explored it much though

#

If you go really low level with numpy, you get into some fun words like jiffies

#

@serene scaffold How many jiffies does it take to turn on a light bulb? :D

#

https://github.com/numpy/numpy/blob/v1.26.0/numpy/core/code_generators/generate_numpy_api.py shows examples how they wrote C to abstract the api

GitHub

numpy/numpy/core/code_generators/generate_numpy_api.py at v1.26.0 ·...

The fundamental package for scientific computing with Python. - numpy/numpy

#

They seem to have designed things fairly well to handle cases up to 3 dimensions

#

https://github.com/numpy/numpy/blob/d35cd07ea997f033b2d89d349734c61f5de54b0d/numpy/core/shape_base.py#L855

arctic wedgeBOT Dec 30, 2023, 7:59 PM

#

numpy/core/shape_base.py line 855

# It was found through benchmarking that making an array of final size

proud wing Dec 30, 2023, 8:00 PM

#

Also worth reading: https://github.com/numpy/numpy/blob/v1.26.0/numpy/core/einsumfunc.py

GitHub

numpy/numpy/core/einsumfunc.py at v1.26.0 · numpy/numpy

The fundamental package for scientific computing with Python. - numpy/numpy

#

I didn't know this but apparently _can_dot checks if we can use a BLAS (np.tensordot) call and whether it's beneficial to do so.

#

They wrote: "If the operation is BLAS level 1 or 2 and is not already aligned we default back to einsum as the memory movement to copy is more costly than the operation itself."

#

Also you can see their exploration of einsum and testing it with various algorithm pathings (eg. greedy) or pre-computing the optimal path and repeatedly applying it https://github.com/numpy/numpy/blob/d35cd07ea997f033b2d89d349734c61f5de54b0d/numpy/core/einsumfunc.py#L1334

arctic wedgeBOT Dec 30, 2023, 8:09 PM

#

numpy/core/einsumfunc.py line 1334

Chained array operations. For more complicated contractions, speed ups
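
The "pre-compute the path and reuse it" pattern mentioned above looks roughly like this. It is a sketch assuming NumPy is installed; the shapes are arbitrary and "greedy" is just one of the strategies `np.einsum_path` accepts.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16))
b = rng.standard_normal((16, 32))
c = rng.standard_normal((32, 4))

# Ask NumPy for a contraction order once...
path, report = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="greedy")
print(path)   # a list starting with 'einsum_path', then pairs of operand positions

# ...then reuse that path on every later call with same-shaped operands.
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)
print(np.allclose(out, a @ b @ c))
```

`report` is a printable summary of the chosen order and the estimated cost, which is handy when comparing strategies.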

proud wing Dec 30, 2023, 8:27 PM

#

Einops is a really big advancement in easing tensor manipulation

pearl barn Dec 30, 2023, 8:32 PM

#

guys how can i run this page locally on my anaconda

#

https://jovian.com/learn/data-analysis-with-python-zero-to-pandas/lesson/lesson-1-introduction-to-programming-with-python

Lesson 1 - Introduction to Programming with Python | Jovian

First steps with Python & Jupyter notebooks
Arithmetic, conditional & logical operators in Python
Quick tour with Variables and common data types

#

is this better or should i switch to alex the analyst videos because i spent a long time to only learn the basics and i still didn't finish learning branching with if else elif

proud wing Dec 30, 2023, 8:37 PM

#

@pearl barn

You mean like: conda create -n "your_environment" ?

#

Just do something like conda create -n testingenv python=3.10
conda activate testingenv

then just type python on the console or make a new python script

#

python on the local console will allow you to explore those functions

#

@pearl barn

pearl barn Dec 30, 2023, 9:06 PM

#

Didn't understand anything you are so good in this

exotic star Dec 30, 2023, 10:20 PM

#

how can u make a program in python that can scan 100 plants(a list of plants u made) and recognize them?
my big brother works on a project with a few other people. they need to hire a programmer for this and asked me if i can, although i'm only a few months in and idk much. my big brother is in nanotech so he doesn't know anything about programming and i'm curious about this

trim saddle Dec 30, 2023, 10:43 PM

#

pearl barn Didn't understand anything you are so good in this

1. Create an Anaconda environment (see @proud wing video, or google for how to do that)
2. Install JupyterLab with conda install -c conda-forge jupyterlab
3. Start JupyterLab and open the Jupyter notebook from your tutorial (you have to download the .ipynb file)

proud wing Dec 30, 2023, 10:58 PM

#

@exotic star There's already apps that do this on the app store. You'd need to train a classifier model on plants that are labeled, with an extensive existing database of plants. It's not easily done from scratch with limited experience.

exotic star Dec 30, 2023, 10:59 PM

#

proud wing <@858344229092589628> There's already apps that do this on the app store. You'd ...

If you were to hire a programmer, how much would it cost

#

to handle the programming only (not the design of the app)

proud wing Dec 30, 2023, 11:01 PM

#

You could check out fiverr or toptal and ask around there, you might get a quote from someone capable.

exotic star Dec 30, 2023, 11:01 PM

#

proud wing You could check out fiverr or toptal and ask around there, you might get a quote...

got it, ty for the help

covert finch Dec 31, 2023, 12:30 AM

#

hello everyone! I have been a stats programmer working (almost) exclusively with pandas for the last 7 years. I recently began a new job working out of a databricks environment, in which 95% of the notebooks are written in pyspark. I am looking to sharpen my knowledge of pyspark, but I am having trouble setting up an environment to practice on my local machine. Does anyone have any suggestions on how I can do a little pyspark studying outside of my org?

agile owl Dec 31, 2023, 1:53 AM

#

you have to install spark separately then install pyspark

covert finch Dec 31, 2023, 3:24 AM

#

yeah, Java too right? I just cant seem to get my environment correct

agile owl Dec 31, 2023, 3:25 AM

#

why dont you say what the actual issue is because there's a lot of guides out there on how to set it up including the one from the spark docs themselves

covert finch Dec 31, 2023, 3:29 AM

#

sure just give me one second if you dont mind

#

import pyspark
import os
os.environ["JAVA_HOME"] = r"C:\Program Files (x86)\Java\jre-1.8"
os.environ["SPARK_HOME"] = r"C:\Users\Vince\Downloads\spark-3.5.0-bin-hadoop3.tgz"
import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark import SQLContext

spark = SparkSession.builder.master("local[*]").getOrCreate()
df=spark.read.options(delimiter=",", header=True).csv("fake_ae1.csv")
df.show()

FileNotFoundError Traceback (most recent call last)
Cell In[3], line 10
7 from pyspark.sql import SparkSession
8 from pyspark import SQLContext
---> 10 spark = SparkSession.builder.master("local[*]").getOrCreate()

#

this was in jupyter notebook in a virtual env
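
One plausible cause, judging only from the snippet: SPARK_HOME points at the downloaded .tgz archive rather than an extracted folder, so the launcher script under it cannot be found. A small sanity check along these lines can help; `check_spark_env` is a hypothetical helper, not part of pyspark or findspark.

```python
import os


def check_spark_env(java_home, spark_home):
    """Return a list of human-readable problems with the two paths."""
    problems = []
    for name, path in (("JAVA_HOME", java_home), ("SPARK_HOME", spark_home)):
        if path.lower().endswith((".tgz", ".tar.gz", ".zip")):
            problems.append(f"{name} points at an archive, not an extracted folder: {path}")
        elif not os.path.isdir(path):
            problems.append(f"{name} is not an existing directory: {path}")
    return problems


# The two paths from the snippet above.
for problem in check_spark_env(
    r"C:\Program Files (x86)\Java\jre-1.8",
    r"C:\Users\Vince\Downloads\spark-3.5.0-bin-hadoop3.tgz",
):
    print(problem)
```

As far as I know, a pip-installed pyspark already ships its own Spark distribution, so in a virtual env it may be enough to set JAVA_HOME and drop SPARK_HOME and findspark entirely.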

pearl barn Dec 31, 2023, 7:26 AM

#

trim saddle 1. Create a anaconda environment, see <@1129507424886345879> video, or google fo...

How to download .ipynb it's already on jovian website but don't wanna use it

trim saddle Dec 31, 2023, 10:06 AM

#

pearl barn How to download .ipynb it's already on jovian website but don't wanna use it

Idk if there's an option on jovian, but worst case, you can just copy the cells from online into a new local notebook

pearl barn Dec 31, 2023, 10:51 AM

#

It shows run code locally but nothing happens after that

trim saddle Dec 31, 2023, 11:58 AM

#

What does it show in your jupyter notebook locally? Can you provide code/picture, what it shows and what you expected

potent pollen Dec 31, 2023, 2:45 PM

#

Guys I'm desperate

#

I've created a library to create and use neural networks, and one to make my IA play Tetris. I've been careful about the details that could cause a malfunction in my IA. The library I created is for unsupervised learning, and the problem is that after 500 generations of 124 agents, they don't play any better. I searched on the internet, and some say that Tetris cannot be learned with an unsupervised algorithm. It really bothers me, since I've been working on it for months: are you able to explain the fact that my IA doesn't learn, or is it just that it cannot learn Tetris by itself?

serene scaffold Dec 31, 2023, 3:14 PM

#

potent pollen I've created a library to create and use neural networks, and one to make my IA...

by unsupervised, do you mean reinforcement?

keen delta Dec 31, 2023, 3:33 PM

#

Hey guys, i am trying to do a machine learning project and I'm stuck in part and not able to figure out what to do. The project is about pdf question answering using llms. I need help

potent pollen Dec 31, 2023, 6:00 PM

#

serene scaffold by unsupervised, do you mean reinforcement?

Yes

serene scaffold Dec 31, 2023, 6:02 PM

#

potent pollen Yes

you can train a tetris playing agent with reinforcement learning.
Remember that artificial intelligence is AI in English, not IA.

serene scaffold Dec 31, 2023, 6:03 PM

#

keen delta Hey guys, i am trying to do a machine learning project and I'm stuck in part and...

how far did you get, and what's the current stumbling block?
Remember that using LLMs is quite challenging.

potent pollen Dec 31, 2023, 6:04 PM

#

serene scaffold you can train a tetris playing agent with reinforcement learning. Remember that ...

Ok so I'm just bad, I will work on it. Thank you !

serene scaffold Dec 31, 2023, 6:05 PM

#

potent pollen Ok so I'm just bad, I will work on it. Thank you !

You are great

potent pollen Dec 31, 2023, 6:11 PM

#

serene scaffold You are great

Thanks, you're right. I tried to do my AI with a RNN, is it compulsory to use a CNN in your opinion ?

serene scaffold Dec 31, 2023, 6:12 PM

#

potent pollen Thanks, you're right. I tried to do my AI with a RNN, is it compulsory to use a ...

I've never done anything with reinforcement learning, so idk

small wedge Dec 31, 2023, 6:16 PM

#

potent pollen Thanks, you're right. I tried to do my AI with a RNN, is it compulsory to use a ...

It usually takes way more time to use a CNN in my experience. Generally it's easier for a model to learn when the input space is smaller.

past meteor Dec 31, 2023, 6:16 PM

#

potent pollen Thanks, you're right. I tried to do my AI with a RNN, is it compulsory to use a ...

If you're operating on raw pixels for Tetris then a CNN might make more sense than an RNN

past meteor Dec 31, 2023, 6:18 PM

#

potent pollen I've created a library to create and use neural networks, and one to make my IA...

It seems like this isn't even reinforcement learning but it's actually a genetic algorithm, is this correct?

potent pollen Dec 31, 2023, 6:18 PM

#

past meteor If you're operating on raw pixels for Tetris then a CNN might make more sense th...

Well that's what I did, I entered 34 neurons with processed data, but no results. My algorithm is maybe just too simple, as I was saying I'm not using TensorFlow or anything

potent pollen Dec 31, 2023, 6:18 PM

#

past meteor It seems like this isn't even reinforcement learning but it's actually a genetic...

Yes, my bad

#

Well done

past meteor Dec 31, 2023, 6:19 PM

#

Why are you using a genetic algorithm for this? Any particular reason?

potent pollen Dec 31, 2023, 6:20 PM

#

It was just an algorithm that I knew how to code, I thought it would work

#

Is that the problem ?

past meteor Dec 31, 2023, 6:22 PM

#

potent pollen It was just an algorithm that I knew how to code, I thought it would work

It can work but it's a bit "wasteful". It's a black box optimiser (look this up) and there're options that can actually use more information of the game and learn in a more "directed" way, for instance deep Q Net (DQN)

#

A genetic algo might work still, mostly because Tetris isn't a super hard problem I suppose
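
To make "learn in a more directed way" concrete, here is the tabular Q-learning update that DQN approximates with a neural network instead of a table. This is a sketch only; the state and action names are invented for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99, done=False):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
actions = ["left", "right", "rotate", "drop"]
# Hypothetical transition: dropping a piece in state "s0" cleared a line (+1 reward).
q_update(q, "s0", "drop", reward=1.0, next_state="s1", actions=actions)
print(q[("s0", "drop")])   # 0.1
```

Every single move feeds back into the value estimates here, whereas a genetic algorithm only sees one fitness number per agent per game.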

potent pollen Dec 31, 2023, 6:24 PM

#

Ok then I will check on what you tell me, even though it would be a relief for me that my alg works

keen delta Dec 31, 2023, 6:42 PM

#

serene scaffold how far did you get, and what's the current stumbling block? Remember that using...

so i am trying to make a pdf question answering chatbot using google gemini's llm. I have made one with openai and falcon-7b llm and they're working perfectly however when i'm trying to make one with google gemini, i'm getting a lot of errors. I am using streamlit and the issue is that when i'm trying to run the code, i am getting no output

agile owl Dec 31, 2023, 6:42 PM

#

what can cause this type of results in reinforcement learning

#

I have a branch for the reward wonder if the branch and nonlinearity there is causing this

#

it's an accumulator and if it falls below 70% of its original balance I give it a heavy reward penalty

#

otherwise the reward is the net return divided by an estimate of the variance of the return
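
Written out, the branching reward described in the last two messages would look something like this. The penalty size and the small epsilon are placeholders, not values from the conversation.

```python
def step_reward(balance, initial_balance, net_return, variance_estimate,
                floor=0.7, penalty=-10.0, eps=1e-8):
    """Branching reward: heavy penalty below the floor, return/variance otherwise."""
    if balance < floor * initial_balance:
        return penalty                       # hard drop: the discontinuity in question
    return net_return / (variance_estimate + eps)


print(step_reward(100.0, 100.0, net_return=0.02, variance_estimate=0.04))
print(step_reward(69.0, 100.0, net_return=0.02, variance_estimate=0.04))   # -10.0
```

If the penalty is orders of magnitude larger than the typical per-step reward, it can dominate the value estimates, so the relative scale of the two branches is worth checking before removing the branch altogether.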

#

I'm using PPO with LSTM from sb3-contribs

#

the observations are the variance driver, a variance estimate, and a handful of exogenous state variables

#

Not sure how to design the reward to get what I want if I need to take out the branch for it to work properly

#

I played with genetic algorithms for a bit and they showed some promise at first but I think they are just variance machines

#

any information I gleaned from them that stood up out-of-sample was marginal

#

I ended up using reinforcement learning for my problem instead even though I had some fun messing around with evolutionary algos

#

for genetic algos to work I would actually have to be able to parameterize the underlying process in a way that aligns with what is causing variance in reality which is a nearly impossible task

#

reinforcement learning basically does that part for you

#

I've come to the conclusion after a couple months that genetic algos are basically trash for most things like zestar said

#

my reinforcement learner is trying to put the appropriate coefficient on the variance driver over time to form a series of products that optimize mean over variance over time but I have some special rules I want to add on top of that

#

I'd rather it do nothing than explore into spaces where it can end up in a certain state, basically like a game over for a video game

#

but even penalizing ending up in that state doesn't seem to prevent it reliably and it also has some really weird results on the learning curve

#

is that just the nature of the beast with reinforcement learning and I need to tinker with the weights of the reward if I want to branch it?

final kiln Dec 31, 2023, 7:18 PM

#

got my gpu quota request accepted, cloudwatch all setup, logs broadcast back to the master node which I just log.info so that it gets printed to the runner logs. two missing pieces are the runner workflow with secrets and finish implementing the fault tolerance in the training loop, it's actually already implemented and tested, I just need some way of having aws notify the program that the instance is gonna be terminated in 2 min, other than that signal its all done.

agile owl Dec 31, 2023, 7:40 PM

#

like what does this mean in reinforcement learning

#

when the rewards actually go down if you train it too long

#

if I keep letting it explore in theory should it eventually hit a new optimum

#

or is it "stuck"

worldly dawn Dec 31, 2023, 10:25 PM

#

potent pollen I've created a library to create and use neural networks, and one to make my IA...

There are quite a few dependencies on how you model the agents, how you combine/mutate/evolve your population, on how you measure fitness, etc.

potent pollen Dec 31, 2023, 10:26 PM

#

worldly dawn There are quite a few dependencies on how you model the agents, how you combine/...

Yeah I know, but I didn't have a choice but to create my own, since I wanted to make it run on my calculator

worldly dawn Dec 31, 2023, 10:27 PM

#

potent pollen Yeah I know, but I didn't have a choice but to create my own, since I wanted to ...

Sure, but there are entire books dedicated to genetic algorithms.
Without providing more info on your implementation, there won't be much that can be said

potent pollen Dec 31, 2023, 10:29 PM

#

I am just modifying the weights and biases of the network using a uniform function, so it's basically random modifications ig

worldly dawn Dec 31, 2023, 10:30 PM

#

oh so no selection or combination
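
For contrast with pure random modification, a genetic algorithm normally adds selection and recombination on top of mutation. A minimal sketch in plain Python, with no external libraries so it could be ported to constrained hardware; the toy fitness function stands in for a Tetris score and every hyperparameter is an arbitrary choice.

```python
import random


def evolve(fitness, genome_len, pop_size=40, generations=60, elite=4,
           mutation_rate=0.1, mutation_scale=0.3, seed=0):
    """Tiny genetic algorithm: keep the best, breed the rest from fit parents."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                  # truncation selection
        children = [g[:] for g in pop[:elite]]          # elitism: best survive unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)          # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [w + rng.gauss(0, mutation_scale) if rng.random() < mutation_rate else w
                     for w in child]                    # sparse Gaussian mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def toy_fitness(genome):
    """Higher is better: negative squared distance from an all-0.5 genome."""
    return -sum((w - 0.5) ** 2 for w in genome)


best = evolve(toy_fitness, genome_len=8)
print([round(w, 2) for w in best])
```

Without the sort and the parent pool, each generation is just a fresh random walk, which would match the "no improvement after 500 generations" symptom.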

echo mesa Dec 31, 2023, 10:30 PM

#

Hello guys I'm facing with a problem. I'm currently in secondary school and at the end of school we are gonna learn calculus, currently I'm doing machine learning compilers and so on which would require the knowledge of calculus what I am wondering about is whether I should learn calculus on my own right now or I should wait till I learn it in school, would learning calculus be a waste of time on my own if I'm gonna learn it in school anyways or should I pursue it?

proud wing Dec 31, 2023, 10:31 PM

#

@echo mesa why are you asking us?

#

@echo mesa You should learn what you are truly interested and excited to learn. Not what 'school' tells you to learn.

#

Some areas of math that will be useful for you if you want to do some really wild stuff with machine learning include linear algebra and tensor calculus.

final kiln Dec 31, 2023, 10:32 PM

#

echo mesa Hello guys I'm facing with a problem. I'm currently in secondary school and at ...

Yes

proud wing Dec 31, 2023, 10:33 PM

#

Hey @final kiln where did you get your quota increased at?

final kiln Dec 31, 2023, 10:33 PM

#

proud wing Hey <@935270247366271027> where did you get your quota increased at?

In the quota page. Just search quota on AWS console

proud wing Dec 31, 2023, 10:33 PM

#

I want to test my general approach for calculating tensor contractions in C vs Numpys... if its faster I will probably need to publish a paper on it

#

@final kiln oh i just meant which provider are you using 🙂

#

@final kiln I use Google cloud for most things cloud, except ML

final kiln Dec 31, 2023, 10:34 PM

#

proud wing <@935270247366271027> oh i just meant which provider are you using 🙂

I have funding for AWS. I co founded a 501c3 and we get tons of free stuff to do open source

final kiln Dec 31, 2023, 10:35 PM

#

proud wing I want to test my general approach for calculating tensor contractions in C vs N...

C is usually faster

#

What results did you get ?

iron basalt Dec 31, 2023, 10:36 PM

#

proud wing I want to test my general approach for calculating tensor contractions in C vs N...

If your C code is slower than Numpy then your C implementation is wrong.

final kiln Dec 31, 2023, 10:37 PM

#

That is usually how it goes

iron basalt Dec 31, 2023, 10:37 PM

#

(And/or compiler flags)

iron basalt Dec 31, 2023, 10:37 PM

#

iron basalt If your C code is slower than Numpy then your C implementation is wrong.

This goes for pretty much everything vs C, except maybe assembly if you try hard enough.

#

(Although you then usually are doing assembly inline in C via compiler extensions)

final kiln Dec 31, 2023, 10:38 PM

#

My impression is that you need to be some sort of wizard to beat the compiler

iron basalt Dec 31, 2023, 10:39 PM

#

Not really, compilers are good at repetitive optimizations that would take a human way too long to apply to the whole huge code base, but with time a human will always win.

#

It's like saying that ChatGPT writes better code (which is probably something people will start saying too).

#

It's about time spent doing it though.

#

But most of the time performance issues are from the 80% gains given by computational complexity (big O needs to be reasonable (but not best)) and not having the CPU do more work than needed in general (e.g. by choosing C over Python) for the same end result.

worldly dawn Dec 31, 2023, 10:41 PM

#

it gets muddy very quickly. There are all sorts of optimizations, including those that benefit from observing the behavior of the program

iron basalt Dec 31, 2023, 10:43 PM

#

iron basalt But most of the time performance issues are from the 80% gains given by computat...

(And knowing the general stuff like the CPU cache / are you compute bound or memory bound?)

final kiln Dec 31, 2023, 10:45 PM

#

Uhm, all I know is avoid branches, keep data local and compact, inline stuff.

#

Has served me quite well

iron basalt Dec 31, 2023, 10:46 PM

#

final kiln Uhm, all I know is avoid branches, keep data local and compact, inline stuff.

The gist is that your CPU can operate on data so fast that what often really matters is getting the data to the CPU in the first place. The CPU has its own local memory (the cache) to make this faster. To help the CPU you want to make memory access predictable and also it fetches memory in chunks, so contiguous memory.

#

(e.g. loading a single integer from RAM takes so long that in that time your CPU can do hundreds of additions / multiplications)
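
A rough way to see why contiguous access matters is to count how often consecutive accesses land on a different cache line. This is a toy model assuming 64-byte lines and 8-byte elements in a row-major array; real hardware adds prefetching and multiple cache levels.

```python
def cache_line_switches(rows, cols, order, itemsize=8, line_bytes=64):
    """Count how often consecutive accesses land on a different cache line.

    The array is assumed row-major (C order); `order` is the traversal order.
    """
    if order == "row":
        indices = ((r, c) for r in range(rows) for c in range(cols))
    else:
        indices = ((r, c) for c in range(cols) for r in range(rows))
    switches, previous = 0, None
    for r, c in indices:
        line = ((r * cols + c) * itemsize) // line_bytes
        if line != previous:
            switches += 1
            previous = line
    return switches


print(cache_line_switches(256, 256, "row"))      # 8192
print(cache_line_switches(256, 256, "column"))   # 65536
```

Row-order traversal uses all eight elements of each fetched line before moving on, while column-order traversal touches a new line on every single access.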

final kiln Dec 31, 2023, 10:47 PM

#

iron basalt The gist is that your CPU can operate on data so fast that what often really mat...

I am aware. RAM fetches r slow

#

Like some hundred cycles or something like that

worldly dawn Dec 31, 2023, 10:47 PM

#

also pipelines have an impact

iron basalt Dec 31, 2023, 10:47 PM

#

Since your CPU has SIMD now and multiple threads and more, it can do a ridiculous amount of stuff in that same amount of time.

#

For that to become the bottleneck you need to do a ridiculous amount of stuff that touches the same memory (already in cache then in that case) over and over, e.g. matrix multiplication of large matrices.

hollow sentinel Dec 31, 2023, 11:11 PM

#

so i got this data from this website: https://catalog.data.gov/dataset/national-student-loan-data-system-722b0

#

i'm kinda overwhelmed

#

i downloaded all their spreadsheets

#

i had a hypothesis that students from certain demographic groups (age, gender, ethnicity) might exhibit different loan default rates.

#

the problem is that the data that i'm looking at does not show any demographics

hollow sentinel Dec 31, 2023, 11:59 PM

#

https://nces.ed.gov/programs/digest/d22/tables/dt22_331.95.asp?current=yes i switched to this dataset

Percentage of undergraduate degree/certificate completers who ever ...

The primary purpose of the Digest of Education Statistics is to provide a compilation of statistical information covering the broad field of American education from prekindergarten through graduate school. The Digest includes a selection of data from many sources, both government and private, and draws especially on the results of surveys and ac...

#

trying to get clean data off this is not fun

unique summit Jan 1, 2024, 1:33 AM

#

anyone want to spend the new year helping me find where the issue is with a gesture recognizer calculation:)

agile owl Jan 1, 2024, 3:51 AM

#

It's very easy to pin a CPU with anything that's inherently iterative and if it's something that can be parallelized you can easily pin a server chip of arbitrary size with a large enough problem

#

i disagree with this idea that cpus are not a bottleneck you have to worry about, it depends on what you're trying to do

#

there's a lot of problem spaces that can't be vectorized to a GPU, at least not very easily

#

if you are doing evolutionary algos you can easily pin a cpu for hours and end up with garbage because it's a variance machine

#

reward = self.sharpe_ratio - prev_sharpe_ratio
if self.initial_margin > 0.2 * current_portfolio_value:
    reward = min(reward, 0)

so things like these in reinforcement learning, is it right to make them part of the reward return or should I make them deterministic constraints such that if the agent runs into them it will always do a corrective action
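
The "deterministic constraint" alternative could be sketched as a thin rule applied to the agent's action before it reaches the environment. The function name, the 0.2 limit reuse and the corrective action encoding are assumptions for illustration, not part of sb3 or the code above.

```python
def apply_margin_rule(proposed_action, initial_margin, portfolio_value,
                      limit=0.2, corrective_action=0):
    """Override the agent's action when the margin limit is breached.

    Returns (action_to_execute, was_overridden). corrective_action=0 is a
    placeholder for "reduce exposure"; the real encoding depends on the env.
    """
    if initial_margin > limit * portfolio_value:
        return corrective_action, True
    return proposed_action, False


print(apply_margin_rule(3, initial_margin=25.0, portfolio_value=100.0))   # (0, True)
print(apply_margin_rule(3, initial_margin=15.0, portfolio_value=100.0))   # (3, False)
```

With a rule like this the reward can stay smooth, at the cost of the agent never learning what happens beyond the limit.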

worldly dawn Jan 1, 2024, 4:25 AM

#

agile owl if you are doing evolutionary algos you can easily pin a cpu for hours and end u...

I would rather describe them as meta heuristic.

Whether that runs on GPU or CPU would depend more on the problem to be solved

#

And whether or not you end up with garbage is more linked to how they are used, no different than any other method.

agile owl Jan 1, 2024, 4:31 AM

#

I think EA are more prone to overfitting than anything else

#

unless you happen to know the problem has a certain structure and you're just solving for the parameters it's like shooting in the dark that the parameters you are solving for actually mean anything

worldly dawn Jan 1, 2024, 4:33 AM

#

not really

#

That said, feature engineering is an important step for any ML/AI related problem

agile owl Jan 1, 2024, 4:34 AM

#

engineering the features is different from engineering a parameterization of them

#

for RL you engineer features and the algorithm handles the structure

#

for EA you have to also provide a structure to the problem

#

and if it doesn't match reality, then you get garbage

#

which is easy to have happen I think from my experience with them but maybe other people have had better experiences

worldly dawn Jan 1, 2024, 4:36 AM

#

In my experience, if you get garbage with EA, it's because you are doing something really wrong

agile owl Jan 1, 2024, 4:37 AM

#

let's say it's an agent problem like playing a game or optimizing some return over time

#

for RL you have to define the game and the features, sure, but you don't have to tell it what to do as a result of the features being some value or another etc.

#

if you want to use EA you have to have an idea of the causal relationship between the features and whatever the reward is

#

and learn whatever weights/parameters you apply to them using the EA

#

to me it seems easy to get that part wrong

worldly dawn Jan 1, 2024, 4:40 AM

#

take NEAT for instance

#

you don't model the structure

#

you have your input and output

#

like any other ML model

#

but it could be anything else you think is appropriate

#

You don't need any causal relationship. That's not how EA work

agile owl Jan 1, 2024, 4:41 AM

#

EA is just a way to solve for parameters right

#

the genes that solve a problem the best

worldly dawn Jan 1, 2024, 4:42 AM

#

What matters is that you need to make it friendly to your mutation/selection/combination operators

agile owl Jan 1, 2024, 4:43 AM

#

what I got was something that fit in-sample time series data extremely well

#

and generalized very poorly out of sample

#

whereas with RL I get comparable performances

#

that's why I called it pejoratively a variance machine

#

I have no doubt that EA could reproduce what I'm doing with RL if I understood the right way to use the genes

#

but if I knew that why would I be using ML methods to begin with

worldly dawn Jan 1, 2024, 4:47 AM

#

Right. And similarly, had you gotten great results with EA and terrible results with RL (even for a single typo), you might be stating the opposite.

What I am reacting to is not that EA should be the solution to all the problems. Far from that. I am reacting to calling it garbage. It's a great tool which works very well for what it's made

agile owl Jan 1, 2024, 4:47 AM

#

I understood it to just be a way to fit parameters for experimental models

#

things where you aren't quite sure how to solve it using something like differentiation and gradients

worldly dawn Jan 1, 2024, 4:47 AM

#

It's great when you are ok with approximate solutions, when it's very expensive to derive a model, or when you have to deal with non-linear stuff

#

And also it has a cool factor

#

I mean, it's a meta heuristic after all

#

not a random generator

agile owl Jan 1, 2024, 4:49 AM

#

I think unless there's a clearer way to control overfitting then it's of limited use to complex problems like trading

#

and even if there was one, and I spent a lot of time thinking about it and talking with people about it, you still end up stuck if you find out you are overfitting and you don't know how to change the model to generalize better

#

it's just a lot more work

#

also even though you can parallelize it I don't think you can vectorize it in general

#

EA is only a partial solution I guess is the point and the other part of the solution can be arbitrarily complex and something like RL can handle that better if you can't start to put into words how you should use the features you have available to solve the problem

#

you have to be searching in the right places to start with

#

I'd like something like NEAT for RL though

worldly dawn Jan 1, 2024, 5:06 AM

#

agile owl I think unless there's a clearer way to control overfitting then it's of limited...

overfitting is no different than other methods

agile owl Jan 1, 2024, 5:07 AM

#

I think the whole point of EAs is it depends on what you are using the genes for

#

it seems like with NEAT the genes themselves are weights in a neural network that will handle the structure of the problem for you

#

but then it's really a neural network but you are using EA as the optimization methodology instead of gradient descent

#

so EA is just an optimization method

#

you still need something else to provide the structure to the problem

#

anyway feel like I'm just repeating myself at this point not sure what our difference in understanding is

#

like, how do you use EA?

worldly dawn Jan 1, 2024, 5:12 AM

#

agile owl it seems like with NEAT the genes themselves are weights in a neural network tha...

the contribution of NEAT is that you aren't just modifying weights, but also the whole structure

worldly dawn Jan 1, 2024, 5:13 AM

#

agile owl like, how do you use EA?

I have used it in many contexts, from generative arts, agents in games, to generating code in b2b products

agile owl Jan 1, 2024, 5:13 AM

#

right, where the structure is represented by the genes

#

you are providing the neural network and the ways its structure can be represented as a way to abstract away the structuring of the problem though

#

I was thinking of EA with freeform modeling by hand and solving for parameters

worldly dawn Jan 1, 2024, 5:15 AM

#

I mean, only the game agent was using NEAT. The rest was using other structures

agile owl Jan 1, 2024, 5:17 AM

#

I am curious how they are able to do both the structures and the weights with EA though

#

given that the number of weights you'd have to solve for would be different depending on the structural parameters

#

so the individuals would be of different sizes, wouldn't they?

worldly dawn Jan 1, 2024, 5:17 AM

#

I would recommend reading the NEAT paper. It's a pretty cool paper and an easy read too

#

And if you are in the mood, you could take it further with CPPNs

agile owl Jan 1, 2024, 5:18 AM

#

I'm having some success with sb3 is there a library like that for NEAT

#

and is it as documented and easy to use

worldly dawn Jan 1, 2024, 5:19 AM

#

there are some python neat libraries

#

I have been using java though for the NEAT stuff

agile owl Jan 1, 2024, 5:20 AM

#

that's often scary because it means there's a diffusion and duplication of efforts

#

and none of them end up being obviously the right choice

paper aurora Jan 1, 2024, 11:33 AM

#

Hi there, looking for someone as an intern who is good at image processing and knows how to deal with PTZ/Network cameras.
Should be available for next 3 months as an intern. Hit me up!!

final kiln Jan 1, 2024, 11:38 AM

#

donzies

#

gotta test it on a gpu machine tho

#

should work the same if the os is the same and the drivers are pre installed

#

its going for 100 epochs, so I got a lot of time for testing

#

im gonna see if I can get aws to send the spot notice of termination to verify that the fault tolerance is working

#

ah no way to do it, I'd have to mock it

final kiln Jan 1, 2024, 12:26 PM

#

mlflow is goated

#

it's backing up my models automatically to s3, providing fault tolerance and facilitating master-slave communication between my actions workflow and spot instance

#

goated

#

yep, like everything in the internet

#

works quite well tho, what happened

#

don't see how this can possibly substitute an ML Eng, it's more like a logbook or a database

#

im a bit confused tho

#

looking at their website they dont seem to claim that it is a substitute

#

it's possible they offer some sort of consulting service to maintain ml infra

#

well in any case, im super happy with my setup

#

I can run the experiments right from actions

#

then see the stats on mlflow, and if I dont like it I can redo it by setting the same params and some other adjustment

#

all with spot instances, which is super cost effective

#

and I can always change the model in code, test it locally with cpu, and then go on to gpu via the actions workflow

final kiln Jan 1, 2024, 1:10 PM

#

I think this is a good point to stop and move on to the Shakespeare dataset, gpu is not fully figured out, but I'm sure it's some detail that I can handle once I need it, I also got quotas for the expensive gpus only so Im gonna have to wait a bit more on that

past meteor Jan 1, 2024, 1:59 PM

#

I use MLFLOW a bit myself at work. I do it on-premise though, it's a fine tool.

cosmic plover Jan 1, 2024, 2:11 PM

#

hello, I want to change my field to bioinformatics. can anyone guide me on where to start?

left tartan Jan 1, 2024, 3:46 PM

#

paper aurora Hi there, looking for someone as an intern who is good in image processing and k...

!rule 9 6 No recruiting here, please.

arctic wedgeBOT Jan 1, 2024, 3:46 PM

#

Rules

6. Do not post unapproved advertising.

9. Do not offer or ask for paid work of any kind.

final kiln Jan 1, 2024, 4:42 PM

#

Even with this setup, Im looking at 200-300 dollars just to reproduce the smallest gpt 2, so I think I'm gonna stop at the Shakespeare dataset

wide tendon Jan 1, 2024, 5:01 PM

#

What algorithm would I want to use for beating a platformer: DQN, DDPG or PPO?

potent pollen Jan 1, 2024, 8:08 PM

#

worldly dawn oh so no selection or combination

I have 20% of parents (so ai of past generation), 60% of mutations and then some random ones

worldly dawn Jan 1, 2024, 10:04 PM

#

potent pollen I have 20% of parents (so ai of past generation), 60% of mutations and then some...

yeah, in this configuration it feels more like random exploration than an EA.
I would suggest sticking with more classical approaches and using a library. See https://www.geeksforgeeks.org/genetic-algorithms/

For instance, the top performers from the previous generations are fine being only 2-5% of the new generations. and 60% mutations is pretty high.

Note also that the operators do matter. How do you select individuals, through tournaments, roulette wheel, other?

worldly dawn Jan 1, 2024, 10:10 PM

#

potent pollen I have 20% of parents (so ai of past generation), 60% of mutations and then some...

Here is an example of parameters from a random paper on using EA to control cars:

potent pollen Jan 1, 2024, 10:19 PM

#

worldly dawn yeah, in this configuration it feels more like random exploration than an EA. I ...

You guys are so handsome TYSM
I also think I need to use a library, but it would be a shame to have created my own for nothing
For the selection of individuals, I let them play a game and rate it based on how many lines they made, how many blocks they displayed, the bumpiness of their grid, the number of holes, the mean height of their columns...
I will try modifying the criteria of selection, you're right, but if that was the problem, I guess they would have evolved a bit, at least they would have been slightly better, which is not the case

worldly dawn Jan 1, 2024, 10:21 PM

#

For the selection of individuals, I let them play a game and rate it based on how many lines they made, how many blocks they displayed, the bumpiness of their grid, the number of holes, the mean height of their columns...

This is describing your fitness function, not the selection process.
Let's say you have 100 individuals, each with their fitness score as established by your fitness function.
Let's say you need to select 10 individuals out of this population, what is the process?

potent pollen Jan 1, 2024, 10:25 PM

#

I take the 10 individuals with the best fitness

worldly dawn Jan 1, 2024, 10:25 PM

#

I am asking this because it is important to apply pressure to the evolution.
For instance, if you select the top 10, it's a start, but it also means that they are considered the same regardless of their success.
In natural selection / evolution, the most successful entity is the one that tends to reproduce the most since it's the most fit for the environment. And as such, it would make sense to select multiple times the top individual if its fitness is multiple times better than the rest

#

That's why I would suggest to look into the "roulette wheel" selection operator or the "tournament" one
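For reference, a minimal sketch of both operators, assuming the population is a list of `(individual, fitness)` pairs with non-negative fitness where higher is better. The names `roulette_select` and `tournament_select` and the sample population are made up for illustration and do not come from any particular library.

```python
import random

def roulette_select(scored, k, rng=random):
    """Pick k individuals with probability proportional to fitness (with replacement)."""
    individuals = [ind for ind, _ in scored]
    weights = [fit for _, fit in scored]
    # choices() samples with replacement, so a very fit individual
    # can be selected several times
    return rng.choices(individuals, weights=weights, k=k)

def tournament_select(scored, k, tournament_size=3, rng=random):
    """Pick k individuals; each pick is the fittest of a small random group."""
    winners = []
    for _ in range(k):
        contestants = rng.sample(scored, tournament_size)
        best, _ = max(contestants, key=lambda pair: pair[1])
        winners.append(best)
    return winners

# Hypothetical population: "d" is far fitter than the rest
population = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 50.0)]
rng = random.Random(0)
parents_roulette = roulette_select(population, k=10, rng=rng)
parents_tournament = tournament_select(population, k=10, rng=rng)
```

With roulette wheel selection the pressure scales with the fitness values themselves, while tournament selection only depends on rank, and `tournament_size` controls how strong the pressure is.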

potent pollen Jan 1, 2024, 10:26 PM

#

Yeah right, I will go check on that one thanks !

#

But I still think there is another problem to it

#

Maybe it's just me who messed up a thing on my code, though

worldly dawn Jan 1, 2024, 10:28 PM

#

potent pollen Yeah right, I will go check on that one thanks !

Tangential, it looks like your homemade library still has some way to go before implementing the basics of GA. Coupled with the tough RL-type problem you are trying to solve, that means mixing two different and very complex problems.

I would suggest pausing your problem for now and focusing on something simpler until you get your GA library working with the basics of GA

#

Once you've got your GA library sorted, it will be easier to focus on your game AI problem

potent pollen Jan 1, 2024, 10:29 PM

#

You are so impressive

worldly dawn Jan 1, 2024, 10:29 PM

#

lol no

potent pollen Jan 1, 2024, 10:30 PM

#

Thank you for your tips, I will modify this, can I call you afterwards if it still doesn't work (or if it does to thank you)?

worldly dawn Jan 1, 2024, 10:30 PM

#

np, have fun!

crisp salmon Jan 1, 2024, 11:09 PM

#

Ive uninstalled and reinstalled h5py multiple times now. Im trying to use tf 2.9 with nvidia cuda and cudnn so that I can utilize my gpu. Initially I did this project with my mac book and on its cpu. but after the first epoch I keep getting this error. I dont know what to do anymore and im beginning to get pretty frustrated

#

I think it has something to do with version compatibility, but I've found nothing that says which version of h5py I need to use with tf 2.9, or anything for that matter

#

Im contemplating redoing this entire project in pytorch at this point

paper aurora Jan 2, 2024, 3:46 AM

#

left tartan !rule 9 6 No recruiting here, please.

My bad!!

agile owl Jan 2, 2024, 8:51 AM

#

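    # Option 1: pick a random period on every reset
    def reset(self, seed = 42):
        # randint's upper bound is exclusive, so pass len(self.periods) to be able to draw the last period
        self.period = np.random.randint(0, len(self.periods))

    # Option 2: step through the periods in order and loop back to 0 at the end
    def reset(self, seed = 42):
        self.period = self.period + 1 if self.period < len(self.periods) - 1 else 0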

What are the pros and cons of each of these approaches to iterating over instances of games/periods in RL training data (picking a random game/period each time you reset the environment vs iterating over them sequentially and looping when you hit the end)
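One way to compare the two, plus a middle ground, is sketched below. `PeriodSampler` and its `mode` names are hypothetical helpers, not part of any RL library. The comments state the usual trade-offs as I understand them: independent random draws can repeat or skip periods, sequential order covers everything evenly but always in the same sequence, and a shuffled pass gives even coverage in a fresh order.

```python
import random

class PeriodSampler:
    """Hypothetical helper that hands out period indices in one of three orders."""

    def __init__(self, n_periods, mode="shuffled", seed=42):
        self.n_periods = n_periods
        self.mode = mode
        self.rng = random.Random(seed)
        self.queue = []
        self.cursor = -1

    def next_period(self):
        if self.mode == "random":
            # independent draw on each reset: coverage is uneven, periods may repeat
            return self.rng.randrange(self.n_periods)
        if self.mode == "sequential":
            # fixed order: even coverage, but always the same sequence
            self.cursor = (self.cursor + 1) % self.n_periods
            return self.cursor
        # "shuffled": every period once per pass, reshuffled at the start of each pass
        if not self.queue:
            self.queue = list(range(self.n_periods))
            self.rng.shuffle(self.queue)
        return self.queue.pop()

sampler = PeriodSampler(n_periods=5, mode="sequential")
order = [sampler.next_period() for _ in range(7)]
```

In an environment this would be called from `reset`, e.g. `self.period = self.sampler.next_period()`, assuming the environment keeps such a sampler around.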

indigo moth Jan 2, 2024, 10:27 AM

#

hi guys, I came across this old exercise I did last year during my studies, but couldn't understand a part of the code:

a = np.random.randint(100, size=10)
print(a)
a = [*a]
a[a.index(max(a))] = None
print(a)

anyone can tell me what I meant with a = [*a]? it doesn't seem to do anything I can't remember what I did here

P.S: another example of why one should always comment code lol

mild dirge Jan 2, 2024, 10:50 AM

#

indigo moth hi guys, I came across this old exercice I did last year during my studies, but ...

It makes a list of the elements of a

#

* in that context means unpacking of an iterable (in this case a numpy array)

#

If the numpy array a has the elements [1 2 3 4] and you write [*a] that is the same as writing [1, 2, 3, 4]

#

Better to write list(a), a lot more understandable
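A small sketch of the equivalent spellings, using a made-up array. The last line is presumably why the conversion was there in the first place: a plain list has `.index()` and accepts `None`, which an integer NumPy array does not.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

unpacked = [*a]        # unpack the array's elements into a new list
as_list = list(a)      # same result, easier to read
native = a.tolist()    # also a list, but holding plain Python ints

# Replace the largest element with None, as in the original exercise
as_list[as_list.index(max(as_list))] = None
```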

#

@indigo moth

final kiln Jan 2, 2024, 11:14 AM

#

today is the day I teach a machine to write poetry

brittle storm Jan 2, 2024, 12:24 PM

#

how to make a command that would work like this:

if subject_code == "0417" and ranses == "w" and paper_number == "2" or "3" and ranyear == "2019":


i want it to proceed only if all the conditions are accepted

#

elif subject_code == "0417" and ranses == "w" and paper_number == "2" or "3" and ranyear == "2019":
    sesh = "November"
    qpv = f"{paper_number}"
    msv = f"{paper_number}"
    qp = f"https://edupapers.store/wp-content/uploads/simple-file-list/CIE/{programme}/{subject_name}-{subject_code}/{ranyear}/{sesh}/{subject_code}_{ranses}{ranyear[2:5]}_qp_{qpv}.pdf"
    ms = f"https://edupapers.store/wp-content/uploads/simple-file-list/CIE/{programme}/{subject_name}-{subject_code}/{ranyear}/{sesh}/{subject_code}_{ranses}{ranyear[2:5]}_ms_{msv}.pdf"
    print(qp)
    print(ms)

mild dirge Jan 2, 2024, 12:34 PM

#

The only thing you need to change is the or

#

!or-gotcha

arctic wedgeBOT Jan 2, 2024, 12:34 PM

#

The or-gotcha

When checking if something is equal to one thing or another, you might think that this is possible:

# Incorrect...
if favorite_fruit == 'grapefruit' or 'lemon':
    print("That's a weird favorite fruit to have.")

While this makes sense in English, it may not behave the way you would expect. In Python, you should have complete instructions on both sides of the logical operator.

So, if you want to check if something is equal to one thing or another, there are two common ways:

# Like this...
if favorite_fruit == 'grapefruit' or favorite_fruit == 'lemon':
    print("That's a weird favorite fruit to have.")

# ...or like this.
if favorite_fruit in ('grapefruit', 'lemon'):
    print("That's a weird favorite fruit to have.")

mild dirge Jan 2, 2024, 12:34 PM

#

@brittle storm

brittle storm Jan 2, 2024, 12:41 PM

#

mild dirge @brittle storm

but now.. i asked it to print a random paper that has these:
subject_code == "0417" and ranses == "w" and paper_number == "2" or paper_number == "3" and ranyear == "2019":

it prints this now.. https://edupapers.store/wp-content/uploads/simple-file-list/CIE/IGCSE/Information-and-Communication-Technology-0417/2022/November/0417_**s**22_ms_2.pdf.

which is invalid cuz s is wrong

#

it is supposed to get on w

mild dirge Jan 2, 2024, 12:42 PM

#

I'm not sure about the priority of the and and or

#

Try adding brackets

#

subject_code == "0417" and ranses == "w" and (paper_number == "2" or paper_number == "3") and ranyear == "2019":
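On the priority question: in Python `and` binds more tightly than `or`, so the unbracketed version is grouped as `(A and B and C) or (D and E)`. A quick check with made-up values (a summer paper 3 from 2019, which should be rejected):

```python
subject_code, ranses, paper_number, ranyear = "0417", "s", "3", "2019"

# Without brackets, Python groups this as (A and B and C) or (D and E)
without_brackets = (
    subject_code == "0417" and ranses == "w" and paper_number == "2"
    or paper_number == "3" and ranyear == "2019"
)

# With brackets, all four conditions must hold
with_brackets = (
    subject_code == "0417" and ranses == "w"
    and (paper_number == "2" or paper_number == "3")
    and ranyear == "2019"
)

# Same check written with a membership test
with_membership = (
    subject_code == "0417" and ranses == "w"
    and paper_number in ("2", "3") and ranyear == "2019"
)

print(without_brackets, with_brackets, with_membership)
```

The unbracketed condition passes because the right-hand side of the `or` never looks at `ranses`.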

brittle storm Jan 2, 2024, 12:48 PM

#

mild dirge `subject_code == "0417" and ranses == "w" and (paper_number == "2" or paper_numb...

i did that and now it doesn't do anything

#

it printed once just now

#

and i ran the command again

#

it didn't print

#

@mild dirge

mild dirge Jan 2, 2024, 1:12 PM

#

brittle storm @mild dirge

I'm not sure, the syntax corresponds to the logic I think you want to implement. I don't know where the problem lies.

neon field Jan 2, 2024, 2:21 PM

#

Anyone just give me a whole machine learning project for social good already please!!!
I don't wanna do this 😭

odd meteor Jan 2, 2024, 2:23 PM

#

neon field Anyone just give me a whole machine learning project for social good already ple...

Have you checked online for possible suggestions? If yes, what did you find?

agile owl Jan 2, 2024, 3:04 PM

#

how many more timesteps would you run this before you consider it converged

#

(this is PPO btw)

final kiln Jan 2, 2024, 3:09 PM

#

that looks like a fractal time series

agile owl Jan 2, 2024, 3:12 PM

#

well it's convolved with a window of 50 so it could look even more like a fractal if I didn't smooth it but that's how it looks when it converges slowly I think
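For anyone following along, this kind of smoothing can be sketched as a moving average with `np.convolve`. This is an assumption about how the curve was smoothed, not the actual plotting code, and the data here is a placeholder.

```python
import numpy as np

def moving_average(values, window=50):
    """Smooth a 1-D series by convolving it with a uniform window."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only points where the window fully overlaps the data
    return np.convolve(values, kernel, mode="valid")

rewards = np.arange(100, dtype=float)   # placeholder data, not real rewards
smoothed = moving_average(rewards, window=50)
```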

final kiln Jan 2, 2024, 3:13 PM

#

uhm, the overall trend seems to be a linear function, so no convergence to a given value

#

tho I dont really know the context here

agile owl Jan 2, 2024, 3:16 PM

#

It looks like diminishing returns have set in to me

#

but hard to tell where that last oscillation will end up

#

I was trying to avoid running it for 2e6 because I will probably iterate on it again anyway

#

but I guess that's how I will tell for sure

final kiln Jan 2, 2024, 3:18 PM

#

not 100% sure what you're doing, but I'd be tempted to run several experiments with different seeds and place the graphs on top of each other

agile owl Jan 2, 2024, 3:23 PM

#

that's a good idea

#

this is what I'm doing, big business

versed gulch Jan 2, 2024, 3:44 PM

#

I have a text file which looks like this:
number_of_nodes: 9
842 2578 0
842 2578 1
842 2578 2
842 2578 3
842 2578 4
842 2578 5
843 2579 6
843 2579 7
843 2579 8
number_of_nodes: 4
926 2206 0
927 2205 0
927 2204 0
927 2203 0

Lines like number_of_nodes: 4 say how many coordinates follow. What I want to do is read this file in and get a list of nested lists containing these coordinates, i.e. [[...], [(926, 2206, 0), ..., (927, 2203, 0)]].
Any help will be much appreciated
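A minimal sketch of one way to do this, assuming the file always starts with a `number_of_nodes:` header and every other non-empty line holds whitespace-separated integers. `parse_node_groups` is a made-up name, and the declared count is ignored rather than validated.

```python
def parse_node_groups(lines):
    """Group coordinate lines under each 'number_of_nodes: N' header."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("number_of_nodes:"):
            groups.append([])          # a header starts a new group
        else:
            groups[-1].append(tuple(int(part) for part in line.split()))
    return groups

sample = """number_of_nodes: 2
842 2578 0
843 2579 8
number_of_nodes: 1
926 2206 0
"""
groups = parse_node_groups(sample.splitlines())
print(groups)
```

To read from disk, an open file object can be passed in directly, since iterating over a file yields its lines.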

agile owl Jan 2, 2024, 4:05 PM

#

this is what it looked like with another seed, much more convincing diminishing returns

#

although the policies it came up with are quite different

final kiln Jan 2, 2024, 4:07 PM

#

uhm, looks like you have a random walker

#

one way to look at this

#

look at the y axis, and see the difference in y from each step i to the next step i + 1

#

if you histogram it, you will likely see the gaussian distribution
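A rough sketch of that check, using a synthetic random walk as a stand-in for the reward curve (the real data is not shown here, so `curve` is a placeholder). For an actual curve, a mean step clearly away from zero relative to the spread would hint at a trend rather than a pure random walk.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder curve: a pure random walk built from Gaussian steps
curve = np.cumsum(rng.normal(loc=0.0, scale=1.0, size=10_000))

steps = np.diff(curve)                      # y[i + 1] - y[i]
counts, bin_edges = np.histogram(steps, bins=50)

print("mean step:", steps.mean(), "std of steps:", steps.std())
```

Note that if the curve was smoothed with a moving average first, neighbouring steps are no longer independent, so it is safer to run this on the raw values.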

agile owl Jan 2, 2024, 4:10 PM

#

reinforcement learning is supposed to do that

#

to some extent

#

it's exploring

#

it doesn't monotonically increase

#

the trend is definitely higher up to a point though

#

importantly it gets into positive rewards territory

#

so you could say it "passes" the problem

#

it could be arbitrarily better though

#

I'm going to do some feature engineering next

#

especially since this is an on-policy learner

#

it only can learn from what it does

#

other methods try to figure out what the best action is instead of trying to make the one it's doing the best one possible if that makes any sense so they have nicer learning curves most of the time but they often fail to actually solve the problem

final kiln Jan 2, 2024, 4:15 PM

#

again I don't really know the context, just gathering from the graph alone haha

#

shakespeare is coming along

boreal gale Jan 2, 2024, 4:16 PM

#

agile owl

i have not followed what you are doing so far. but

1. what is reward here?
2. isn't it bad that it's an on-policy learner? the reward still flip flops and sometimes goes into the <0 reward region, isn't that bad for when you are actually trading?
3. will you be validating your agent against a dataset that it hasn't explicitly learnt from? i would be super anxious about overfitting here

agile owl Jan 2, 2024, 4:17 PM

#

boreal gale i have not followed what you are doing so far. but 1. what is reward here? 2. i...

You only use the saved model from the best reward timestep

#

reward is change in portfolio value / value at risk

#

yes I am doing the out-of-sample evaluation but I am only looking at one year period out of sample whereas I'd like to have more to have a "distribution" of out-of-sample performance

#

importantly the on-policy learner is the only one that actually gets to positive rewards

#

the off-policy methods overleverage themselves

#

and blow up

#

namely PPO vs SAC

#

I'm looking at PPO and PPO with an LSTM architecture in the policy network vs SAC

#

and SAC gives a lot nicer training chart but i can't get it to stop overleveraging itself even if I try to penalize it in the reward

#

the on-policy results are much better out-of-sample too

boreal gale Jan 2, 2024, 4:22 PM

#

that's cool 👍
reinforcement learning is not my strong suit.

what asset class are you trading if you don't mind me asking? hopefully you are getting some nice out of sample sharpe/calmar ratio already 😄

agile owl Jan 2, 2024, 4:23 PM

#

this is actually doing the thing that everyone says is a fool's errand

#

I figured if I can trade 10y duration then I can find a model that can trade almost anything

#

I've got 1.5 sharpe ratio on a combination of 10y, 2s10s curve and usdjpy models

#

the correlations were low and the correlation between 10y and 2s10s was even negative

#

I figure if I do some feature engineering I can get the sharpes convincingly above 2

boreal gale Jan 2, 2024, 4:25 PM

#

that would be sweet 😉 good luck!

agile owl Jan 2, 2024, 4:26 PM

#

right now I have a pretty simple model where it's a natural gradient boosting fair value model based on macro variables, a garch-like variance estimate using Light GBM, and some indicators for economic events like NFP and CPI days

#

I figure if I add some more of the inputs I have on a daily frequency directly to the agent's observations that alone might improve performance

#

right now I have a lot of inputs that just go into the fair value model without being exposed directly to the RL agent

#

what I'd like and don't have is historical consensus estimates so it could try to learn a response to actual economic releases in a meaningful way

#

there's like one company that sells it and it's a ridiculous amount of money

boreal gale Jan 2, 2024, 4:30 PM

#

what kind of estimates?

#

like CPI? estimated by a panel of analysts?

agile owl Jan 2, 2024, 4:32 PM

#

yeah

#

banks put out estimates for those things like earnings

#

and they get compiled into a consensus number

#

so for each release of significance there is a number to compare it to

#

and say "is this higher or lower than expected"

#

that delta is what drives the market reaction not the value itself

#

so if CPI goes down and everyone expected it to go down your model shouldn't react to that like CPI is low

boreal gale Jan 2, 2024, 4:33 PM

#

yeah, the expected value is all baked into the price already

agile owl Jan 2, 2024, 4:33 PM

#

it needs to know what the expectation was before it makes a judgment

final kiln Jan 2, 2024, 4:33 PM

#

couldn't you train a giant language model to do this stuff, feed it a bunch of documents and google searches and have it come up with predictions

#

have it decide what is important

agile owl Jan 2, 2024, 4:34 PM

#

I've thought about that but the market isn't democratic

final kiln Jan 2, 2024, 4:34 PM

#

well

agile owl Jan 2, 2024, 4:34 PM

#

you could maybe get economic sentiment in general from that

#

but I'm sure it's noisy as hell

final kiln Jan 2, 2024, 4:34 PM

#

from what I know it does require domain knowledge to be predicted

#

like, people who dont know about it are discouraged to participate in the first place

#

thus those who stay know their stuff

#

https://en.wikipedia.org/wiki/Prediction_market

Prediction market

Prediction markets, also known as betting markets, information markets, decision markets, idea futures or event derivatives, are open markets that enable the prediction of specific outcomes using financial incentives. They are exchange-traded markets established for trading bets in the outcome of various events. The market prices can indicate wh...

agile owl Jan 2, 2024, 4:37 PM

#

You could maybe in theory extract historically significant economic releases from news stories

#

from places like Bloomberg and WSJ

final kiln Jan 2, 2024, 4:38 PM

#

I say feed it the entire internet and have it converge to a decision, or at least, a slice of the internet, and during training you give it the internet

agile owl Jan 2, 2024, 4:38 PM

#

I would actually like to set up a sentiment model for different assets using bloomberg news stories

#

the reaction of the internet and the reaction of the market are not similar enough to just use the entire internet

#

financial media is the way to go

final kiln Jan 2, 2024, 4:39 PM

#

right, the model would filter out what's not relevant

#

like, training a language model is a compression procedure

agile owl Jan 2, 2024, 4:39 PM

#

when people talk about economic data in general it's much more likely to be about politics than actual economics for instance

#

since economic data gets politicized

final kiln Jan 2, 2024, 4:41 PM

#

yeah, I think economics as a field seems to interplay a lot with politics, but idk much about either of those topics

#

still goin

agile owl Jan 2, 2024, 4:48 PM

#

nice

#

so what does it do again?

final kiln Jan 2, 2024, 4:59 PM

#

well

#

at the moment it spits out non-sense coolcry

#

To be or not to be, that is'###--/00.033300--///>3<:!3030..<=/::.++0O3..3.:L0

#

the part that is not nonsense is the prompt

#

im gonna let it run for longer ig

agile owl Jan 2, 2024, 5:02 PM

#

what is it meant to do, generate from shakespeare?

final kiln Jan 2, 2024, 5:02 PM

#

yes

#

should write poetry

#

YES

#

I was just sampling it incorrectly

#

To be or not to be chopped,
He cannot tell thee to come to have, and we will play him?
If everlastingly king to deny straight: therefore, methinks you,
Must myself against you hear his presence.

LADY CAPULE

#

it's beautiful 🥲

#

im gonna have endless fun with this thing

hollow sentinel Jan 2, 2024, 5:24 PM

#

i'm a bit confused

#

type                                    object
unit                                    object
creationdate                            object
startdate                               object
enddate                                 object
value                                  float64
HKMetadataKeyHeartRateMotionContext    float64
HKMetadataKeySyncVersion               float64
HKMetadataKeySyncIdentifier             object

#

most of these columns are objects

#

i'd like to convert creationdate, startdate, enddate to datetime

agile owl Jan 2, 2024, 5:26 PM

#

pd.to_datetime

hollow sentinel Jan 2, 2024, 5:26 PM

#

heart_df["creationdate"] = pd.to_datetime(heart_df["creationdate"])

agile owl Jan 2, 2024, 5:27 PM

#

is it telling you it's an invalid format or something?

hollow sentinel Jan 2, 2024, 5:27 PM

#

i tried this and tried testing again to see if it would change to datetime

#

UserWarning: Could not infer format, so each element will be parsed individually, falling back to dateutil. To ensure parsing is consistent and as-expected, please specify a format.

agile owl Jan 2, 2024, 5:27 PM

#

well, is there a consistent format?

hollow sentinel Jan 2, 2024, 5:27 PM

#

5/27/2022 8:02:00 AM

agile owl Jan 2, 2024, 5:27 PM

#

did you check and try specifying it?

#

is it consistent though?

#

sounds like it might not be from that warning

hollow sentinel Jan 2, 2024, 5:28 PM

#

is there any way to check if it's consistent besides scrolling through the entire file?

agile owl Jan 2, 2024, 5:29 PM

#

you can do some string processing

#

how big is the file

hollow sentinel Jan 2, 2024, 5:29 PM

#

31.4 mb

agile owl Jan 2, 2024, 5:29 PM

#

sounds like you would probably need to come up with a way to figure out using python string processing methods

#

maybe regex

#

or

#

you can just try to specify that format

#

and see if it tells you there's an invalid value given that format

#

that might give you some hints

hollow sentinel Jan 2, 2024, 5:35 PM

#

agile owl sounds like you would probably need to come up with a way to figure out using py...

any way i can call datetime.strptime on a pandas column?

#

no, incompatible types.

#

i can try using a .apply function

agile owl Jan 2, 2024, 5:47 PM

#

there might be a way with the dt accessor

#

I mean that's what to_datetime is supposed to be doing

boreal gale Jan 2, 2024, 5:50 PM

#

!d pandas.to_datetime

arctic wedgeBOT Jan 2, 2024, 5:50 PM

#

pandas.to_datetime

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=_NoDefault.no_default, unit=None, ...)
Convert argument to datetime.

This function converts a scalar, array-like, [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) or [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)/dict-like to a pandas datetime object.

boreal gale Jan 2, 2024, 5:50 PM

#

use the format argument here just like how you would in datetime.strptime
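A minimal sketch, assuming the column really is month/day/year with a 12-hour clock like the `5/27/2022 8:02:00 AM` sample. The DataFrame below is made up, with one bad value thrown in to show how inconsistent rows can be found without scrolling through the file.

```python
import pandas as pd

heart_df = pd.DataFrame(
    {"creationdate": ["5/27/2022 8:02:00 AM", "5/27/2022 1:15:30 PM", "not a date"]}
)

fmt = "%m/%d/%Y %I:%M:%S %p"   # assumes month/day/year with a 12-hour clock

# errors="coerce" turns values that don't match the format into NaT
parsed = pd.to_datetime(heart_df["creationdate"], format=fmt, errors="coerce")

# rows that failed to parse, i.e. the inconsistent ones
bad_rows = heart_df.loc[parsed.isna(), "creationdate"]
print(bad_rows)
```

If `bad_rows` comes back empty, the format is consistent and `parsed` can be assigned back to the column.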

final kiln Jan 2, 2024, 7:29 PM

#

letting it go for 500 epochs

agile owl Jan 2, 2024, 7:30 PM

#

I figured out why the off-policy learners were sucking

final kiln Jan 2, 2024, 7:30 PM

#

there's a lot of things im not doing there that could improve this

agile owl Jan 2, 2024, 7:31 PM

#

I forgot to tell the agent what its var was so even though it was using it in the reward function it had no way of directly observing it or using it to constrain its behavior

#

that's why they kept overleveraging

#

I'm amazed PPO worked anyway

#

engineered some new features and our rewards are higher than ever

final kiln Jan 2, 2024, 7:41 PM

#

final kiln letting it go for 500 epochs

now that I have a first model that writes poems, I'm gonna set up a more serious infra to experiment with this, I'm mostly gonna copy what I did until now for the gpt array sorter, might need to adapt it a bit for gpu. After that it's time to explore my idea of using metric tensors for performing self attention and compare the results. I mean, tbh I kinda wanna do that now, I'm gonna run a quick loop to see what happens

#

it was following it a little bit too closely, so I restarted the run, got a print in there just to be certain the mod is there

#

but it seems that it was right

#

kinda weird that there's no difference, I expect that at least it takes less time training

#

for my last unsuccessful prediction, the newer version will start plateauing at a higher value

#

nvm I was right, mine is gonna end up being a bit faster

#

if this stays like this I'm gonna start getting excited. But in all likelihood it's gonna converge to a higher loss value because there's less parameters around and the values they can take are constrained

agile owl Jan 2, 2024, 8:12 PM

#

much better

#

I like reinforcement learning because it's optimistic

#

it wants rewards

final kiln Jan 2, 2024, 8:15 PM

#

That does look better, but you should probably use a fixed y-axis size

final kiln Jan 2, 2024, 8:16 PM

#

agile owl

This one doesn't reach 10

#

So it is better

#

I'm assuming higher = better

agile owl Jan 2, 2024, 8:16 PM

#

I am only interested in the shape really

#

of course the value too

#

but this chart is for the shape

#

if I were to plot them all to compare then it would be normalized obviously

final kiln Jan 2, 2024, 8:17 PM

#

Yeah the other one looked like a random walker on the y axis

agile owl Jan 2, 2024, 8:17 PM

#

it is supposed to random walk a bit

#

but I wasn't giving it enough information to reliably solve the problem that was my bad

#

I also switched to an off-policy method

final kiln Jan 2, 2024, 8:24 PM

#

agile owl it is supposed to random walk a bit

Right, but I'm guessing that a pure random walker would mean something is wrong

#

I remember studying this class of graphs in college. Even made a bunch of sims that generate them

agile owl Jan 2, 2024, 8:27 PM

#

it wasn't a pure random walker

#

it had a definite up trend in the beginning

#

then it random walked around like 1

#

but it became definitely positive from negative

#

so there was something, and it was only with ppo because ppo is risk averse

#

compared to the off-policy learners

final kiln Jan 2, 2024, 8:29 PM

#

The only way to confirm would've been to run several experiments. A random walker can produce any of the observed patterns via pure chance.

final kiln Jan 2, 2024, 8:29 PM

#

agile owl much better

This one is much more clear

agile owl Jan 2, 2024, 8:29 PM

#

I ran it again with a different seed and got the same pattern

final kiln Jan 2, 2024, 8:29 PM

#

The other ones looked random

agile owl Jan 2, 2024, 8:29 PM

#

nah that was just exploration

#

I'm sure if you looked at the stats that the moving average change was significant

#

difference is due to a couple factors including changing the model

final kiln Jan 2, 2024, 8:31 PM

#

agile owl

This is super random walker behaviour imo.

agile owl Jan 2, 2024, 8:32 PM

#

the moving average definitely changes from negative to positive

#

and then it flatlines

final kiln Jan 2, 2024, 8:32 PM

#

agile owl Jan 2, 2024, 8:33 PM

#

I ran it again with another seed and found a similar shape

#

it was two things: this one is actually truncated in the x-axis

#

and the model is different

#

and I didn't give it as much information

#

I mean the y -axis not x-axis sorry

#

it didn't start out as negative

#

that is just chance

final kiln Jan 2, 2024, 8:34 PM

#

I'd fit it to gaussian + constant value

#

No

agile owl Jan 2, 2024, 8:34 PM

#

the optimal value won't change

#

but the starting point did

final kiln Jan 2, 2024, 8:34 PM

#

Uhm, + x

#

c

#

Giving it a bias to go upwards

agile owl Jan 2, 2024, 8:36 PM

#

well, it performed out-of-sample

#

just not as well

#

as the new one

#

reinforcement learning is different from supervised learning

#

there's a bit of "random walking" by design

#

it's changing the strategy to try to find new things to do to ultimately get to a higher end point

#

the on-policy ones tend to do that more aggressively

#

because they have limited information compared to off-policy

#

so they need to move around more to find a better policy

#

the other chart I showed more recently where it looked more like an inverse of your loss chart was from an off-learning policy

#

off-policy learner*

final kiln Jan 2, 2024, 8:40 PM

#

That sounds inefficient tho

#

Trying out random stuff til it gets it right

agile owl Jan 2, 2024, 8:42 PM

#

in a lot of cases they give more robust solutions

#

they also have better convergence criteria

#

even though you said it looks like a random walk

#

the other ones just became more negative

final kiln Jan 2, 2024, 8:44 PM

#

It is one, you can even check that its statistical properties are invariant under scale changes

#

Making it a fractal
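One way to check the scale-invariance claim is to look at how the variance of displacement grows with lag. A minimal sketch, assuming an idealized walk of independent +1/-1 steps (not the actual reward curves discussed here): for such a walk the variance over a lag of k steps is about k, so quadrupling the lag should roughly quadruple the variance.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Idealized random walk: cumulative sum of independent +1/-1 steps.
position = 0
walk = [position]
for _ in range(20_000):
    position += random.choice((-1, 1))
    walk.append(position)


def displacement_variance(path, lag):
    """Variance of path[t + lag] - path[t] over all valid t."""
    diffs = [path[t + lag] - path[t] for t in range(len(path) - lag)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


var_4 = displacement_variance(walk, 4)
var_16 = displacement_variance(walk, 16)
ratio = var_16 / var_4  # expected to be close to 16 / 4 = 4 for a pure random walk
print(f"lag 4: {var_4:.2f}, lag 16: {var_16:.2f}, ratio: {ratio:.2f}")
```

A series whose steps are positively correlated (momentum) would tend to give a ratio above 4, and a mean-reverting one a ratio below 4, though a single run is weak evidence either way.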

agile owl Jan 2, 2024, 8:44 PM

#

it goes through cycles of exploration and exploitation

#

but if you insist sure it was a pure random walk

#

I'm just glad I got the off-policy learner working

pearl barn Jan 2, 2024, 8:48 PM

#

guys is maven analytics course for excel is good ??

odd meteor Jan 2, 2024, 10:17 PM

#

pearl barn guys is maven analytics course for excel is good ??

I thought you were using Jovian to learn Python or so... You ditched it for excel now? How was your experience with Jovian?

agile owl Jan 3, 2024, 12:05 AM

#

so here's another example

#

this is SAC (off-policy)

#

This is PPO (on-policy)

#

#

you might look at these and say wow PPO sucks in comparison but in reality it's not that clear on out-of-sample data

#

this is what they did respectively out of sample

#

the one on the right is a lot more stable and predictable in behavior

#

might want to run both and put a bigger weight on the less risky one

agile owl Jan 3, 2024, 12:25 AM

#

I'm going to try to tune it to take less risk but the on-policy one takes less risks by default

agile owl Jan 3, 2024, 4:22 AM

#

we've broken 2 sharpe it's time to hook it up to paper trading and demo it

keen delta Jan 3, 2024, 7:00 AM

#

hey guys has anyone worked with llms here? i kinda need a small help

lapis sequoia Jan 3, 2024, 8:58 AM

#

Hi, I'm excited to join here.
I'm a professional machine learning developer and have 8+ years of experience of developing data science, image processing, optimization projects.
Image preprocessing, deep learning, time series processing, dimension reduction and optimization are my major.
Nowadays I'm looking for employment opportunities and willing to do full-time/part-time.
Please DM me if any Employer is interested about me. thanks...

final kiln Jan 3, 2024, 9:05 AM

#

lapis sequoia Hi, I'm excited to join here. I'm a professional machine learning developer and...

read the rules

whole egret Jan 3, 2024, 9:08 AM

#

keen delta hey guys has anyone worked with llms here? i kinda need a small help

what are you trying to do?

lapis sequoia Jan 3, 2024, 9:24 AM

#

i wanna solve image processing or time series tasks...

final kiln Jan 3, 2024, 10:02 AM

#

toda

#

my keyboard is\ going cra rn, the caps\ l.ock is\ del.eteing chars\

#

It's restarting, I think I'm gonna have to check for malware.

#

Anyway, today I'm gonna setup the rest of the infra for training GPT on a GPU machine without supervision and with fault tolerance.

#

It's mostly copy paste from the previous workflow + some corrections on a couple things I missed in the model code

potent pollen Jan 3, 2024, 10:44 AM

#

worldly dawn np, have fun!

I've implemented a GA, with elitism, crossover and mutation of parents, and some random agents. I've made a generation of 400 agents, with 8 elitists, 146 parents, 146 children and 100 random. Mutation probability is at 12.5% because I have my data on 8 bits. After ~60 generations, I could still not witness any upgrades.
Though, learning about GA and programming it was a fun task so thank you! But I still think the problem is something more simple...

spark nimbus Jan 3, 2024, 10:52 AM

#

I'm noticing pycharm has intelligent schema autocomplete with Spark with read statements, is there a way to inform it about schemas of dataframes passed to functions?

final kiln Jan 3, 2024, 12:35 PM

#

the first self attention module of the metric tensor network

final kiln Jan 3, 2024, 1:00 PM

#

#

senshrug

pearl barn Jan 3, 2024, 1:31 PM

#

How can I get Maven Analytics courses for free or by someone who can share their Udemy username and password

long locust Jan 3, 2024, 1:54 PM

#

pearl barn How can I get Maven Analytics courses for free or by someone who can share their U...

Do not request other people's credentials, that is not appropriate

pearl barn Jan 3, 2024, 1:58 PM

#

Learning with high quality should be for everyone

spark nimbus Jan 3, 2024, 1:58 PM

#

and teachers deserve to get paid

long locust Jan 3, 2024, 1:58 PM

#

Asking for someone's login credentials is not appropriate, not sure how that is related

tidal bough Jan 3, 2024, 2:05 PM

#

"maven" sure is a confusing name for an analytics platform, I was like "wtf, an entire course on, what, doing analytics on Maven downloads?"

stark bay Jan 3, 2024, 3:34 PM

#

What is the best way to smoothen out or make good predictive accuracy score of linear regression model.... how to eliminate fluctuations/noise in a data that has a lot of hips and hops due to being in micro scale

simple plinth Jan 3, 2024, 3:35 PM

#

from where should i start machine learning? i know python basics, im c++ programmer and im good at maths and logic building

stark bay Jan 3, 2024, 3:36 PM

#

simple plinth from where should i start machine learning? i know python basics, im c++ program...

Datacamp and kaggle have a lot of courses for such... thats a good point

simple plinth Jan 3, 2024, 3:38 PM

#

stark bay Datacamp and kaggle have a lot of courses for such... thats a good point

can you please share the links?

elfin sinew Jan 3, 2024, 3:38 PM

#

Hello

#

Can anyone suggest me any YouTube channel for python ?

stark bay Jan 3, 2024, 3:39 PM

#

simple plinth can you please share the links?

Link

Google

Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.

elfin sinew Jan 3, 2024, 3:40 PM

#

@stark bay ?

stark bay Jan 3, 2024, 3:40 PM

#

elfin sinew <@637598340154130432> ?

Yes?

elfin sinew Jan 3, 2024, 3:41 PM

#

Plzz can you suggest me any YouTube channel for python ?

stark bay Jan 3, 2024, 3:41 PM

#

I dont use yt

elfin sinew Jan 3, 2024, 3:41 PM

#

Did you learn py or advanced py ?

#

I really need some suggestions

stark bay Jan 3, 2024, 3:43 PM

#

elfin sinew I really need some suggestions

I think u will get more relevant info here since i am not the best person to ask for that

#

What is the best way to smoothen out or make good predictive accuracy score of linear regression model.... how to eliminate fluctuations/noise in a data that has a lot of hips and hops due to being in micro scale

simple plinth Jan 3, 2024, 3:45 PM

#

stark bay [Link](https://www.google.com)

it is redirecting me to google.com

elfin sinew Jan 3, 2024, 3:45 PM

#

stark bay What is the best way to smoothen out or make good predictive accuracy score of l...

Do you work something ?

#

I can help you if you want

#

BeCause i want to learn something

#

I just want any experience how to work or what to work

#

Sorry if i am disturbing you

junior spruce Jan 3, 2024, 3:58 PM

#

Hello, I am a high school student interested in artificial intelligence. I know some Python. What can I do at this stage without entering the field? I mean I need a strong foundation to enter it (skills). What are these skills and what is the foundation that I need?

simple plinth Jan 3, 2024, 3:59 PM

#

junior spruce Hello, I am a high school student interested in artificial intelligence. I know ...

wanna start together?

junior spruce Jan 3, 2024, 4:00 PM

#

simple plinth wanna start together?

Yes

serene scaffold Jan 3, 2024, 4:00 PM

#

junior spruce Hello, I am a high school student interested in artificial intelligence. I know ...

Take the most advanced math courses that you can and get into a CS university program with an AI concentration

#

And do well in school in general. But in STEM especially

junior spruce Jan 3, 2024, 4:02 PM

#

serene scaffold Take the most advanced math courses that you can and get into a CS university pr...

Well, sir, I will definitely take the mathematics programs for those in my country who never accept high school students in universities

simple plinth Jan 3, 2024, 4:03 PM

#

junior spruce Well, sir, I will definitely take the mathematics programs for those in my count...

are u indian?

junior spruce Jan 3, 2024, 4:04 PM

#

simple plinth are u indian?

No iam algerian maybe you heard about it or maybe you didn't

simple plinth Jan 3, 2024, 4:05 PM

#

junior spruce No iam algerian maybe you heard about it or maybe you didn't

IK algeria very well, its a north african country more closely related with arabs

junior spruce Jan 3, 2024, 4:06 PM

#

simple plinth IK algeria very well, its a north african country more closely related with arab...

Haha, you seem to know it, but there are many of its residents who are not Arabs, including me

simple plinth Jan 3, 2024, 4:07 PM

#

junior spruce Haha, you seem to know it, but there are many of its residents who are not Arabs...

cooool so you are african?

mossy sable Jan 3, 2024, 4:08 PM

#

anyone know the best way to locally host an llm cuz i dont have a openai llama thingy

odd meteor Jan 3, 2024, 4:09 PM

#

junior spruce Hello, I am a high school student interested in artificial intelligence. I know ...

Get a major in CS or Statistics while you build on your python skill gradually.

junior spruce Jan 3, 2024, 4:09 PM

#

simple plinth cooool so you are african?

You can say it

agile owl Jan 3, 2024, 4:09 PM

#

trading USDJPY

mossy sable Jan 3, 2024, 4:10 PM

#

agile owl trading USDJPY

live trading bot?

agile owl Jan 3, 2024, 4:10 PM

#

junior spruce Jan 3, 2024, 4:10 PM

#

odd meteor Get a major in CS or Statistics while you build on your python skill gradually.

Ok sir thank u so much for this

agile owl Jan 3, 2024, 4:10 PM

#

it's in development

#

I'm hooking it up to start paper trading now

mossy sable Jan 3, 2024, 4:10 PM

#

agile owl it's in development

what training api u using for trading platform or r u

#

im assuming ur not running it on a live account

agile owl Jan 3, 2024, 4:11 PM

#

what do you mean training api

#

it's a soft actor-critic (SAC) model from sb3

mossy sable Jan 3, 2024, 4:11 PM

#

to actually make the trades are u using a sim api

agile owl Jan 3, 2024, 4:11 PM

#

oh ibkr

mossy sable Jan 3, 2024, 4:11 PM

#

that has actual prices but not actual money

agile owl Jan 3, 2024, 4:12 PM

#

they have paper trading

mossy sable Jan 3, 2024, 4:12 PM

#

k

odd meteor Jan 3, 2024, 4:15 PM

#

mossy sable anyone know the best way to locally host an llm cuz i dont have a openai llama t...

Is local your last resort? Why not try some cloud options? Some of them even have a free tier plan.

Heroku, Streamlit Cloud, Cerebrium, etc.

Check out https://www.cerebrium.ai/

Cerebrium - Serverless GPU infrastructure for Machine learning

A platform that makes it easy to build and deploy machine learning models scalably and performantly. We run GPUs serverlessly so you only pay for the compute that you use. Bring your Python code and we take care of all the infrastructure. Typically customers experience a 40%+ cost saving as opposed to AWS or GCP.

odd meteor Jan 3, 2024, 4:25 PM

#

junior spruce Ok sir thank u so much for this

You're welcome sir ✌️. If you'll be at least 18 by the third quarter of 2024, I'd recommend applying to attend Deep Learning Indaba 2024.

All the best in your endeavour

stark bay Jan 3, 2024, 4:35 PM

#

What is the best way to smoothen out or make good predictive accuracy score of linear regression model.... how to eliminate fluctuations/noise in a data that has a lot of hips and hops due to being in micro scale

agile owl Jan 3, 2024, 4:37 PM

#

you can try rescaling

#

using log scale or something like that

#

note that you will have to exponentiate the prediction from the log-scale model to get the prediction in normal terms
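The log-scale suggestion can be sketched as follows. This is a minimal illustration on made-up numbers, using a hand-written least-squares fit so it needs no libraries; `fit_line` and `predict` are hypothetical helper names, not part of any package.

```python
import math

# Hypothetical toy data: y roughly doubles each time x increases by 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 7.9, 16.2, 31.8]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Fit the model on log(y) instead of y.
a, b = fit_line(xs, [math.log(y) for y in ys])


def predict(x):
    """Predict on the log scale, then exponentiate to return to original units."""
    return math.exp(a + b * x)


print(predict(2.0), predict(5.0))
```

Under the usual assumptions, the exponentiated value is closer to a median than a mean prediction, so it can sit slightly below the average of noisy data.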

hollow magnet Jan 3, 2024, 5:42 PM

#

Hi ! Does anyone has any tips about good books to learn and deepen the knowledge in data science

long locust Jan 3, 2024, 5:44 PM

#

hollow magnet Hi ! Does anyone has any tips about good books to learn and deepen the knowledge...

there should be a few in the pinned messages in this channel

hollow magnet Jan 3, 2024, 5:45 PM

#

long locust there should be a few in the pinned messages in this channel

didn't know it was a thing, ty

real goblet Jan 3, 2024, 5:49 PM

#

Hey buds. So I have excel spreadsheet named data.xlsx

df = pd.read_excel("data.xlsx")

df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)

I try to do this but there is still freaking indexes

#

could someone fix it please

hollow magnet Jan 3, 2024, 6:05 PM

#

@long locust Have you read the books that are recommended in the pin message ?

#

For a little question

#

Or did someone ?

long locust Jan 3, 2024, 6:07 PM

#

hollow magnet <@698273827448291379> Have you read the books that are recommended in the pin me...

I have not gone through them, what question do you have?

hollow magnet Jan 3, 2024, 6:08 PM

#

Well if I read 1 or 2 of them in their entirety, I'm good to go right ?

#

Because I read that the best way to learn is to apply the knowledge, but in data science, I literally have 0 idea of what exercise to do to apply and learn

odd meteor Jan 3, 2024, 6:14 PM

#

hollow magnet Because I read that the best way to learn is to apply the knowledge, but in data...

Sure. Just focus on one book at a time, If you can successfully finish https://mml-book.github.io/ and https://www.statlearning.com/ you're good to go.

Mathematics for Machine Learning

An Introduction to Statistical Learning

hollow magnet Jan 3, 2024, 6:15 PM

#

odd meteor Sure. Just focus on one book at a time, If you can successfully finish https://...

Thanks !

odd meteor Jan 3, 2024, 6:17 PM

#

The D2L book (Dive into Deep Learning), written mostly by Amazon researchers, is also a piece of art. https://d2l.ai/chapter_introduction/index.html

serene scaffold Jan 3, 2024, 6:24 PM

#

real goblet Hey buds. So I have excel spreadsheet named data.xlsx ```python df = pd.read_ex...

df = pd.read_excel("data.xlsx")
df = df.drop(df.columns[df.columns.str.contains('unnamed', case=False)].tolist(), axis=1)

Try this.
Also, just never use in-place

real goblet Jan 3, 2024, 6:25 PM

#

serene scaffold ```py df = pd.read_excel("data.xlsx") df = df.drop(df.columns[df.columns.str.con...

There is still freaking index

#

df = pd.read_excel("data.xlsx")
df = df.drop(df.columns[df.columns.str.contains('unnamed', case=False)].tolist(), axis=1)

df = df[['Game Name', 'Region', 'Group Override']]

        Game Name   Region Group Override

0 Dead Space Ukraine No
1 Rust Russia No
2 Lethal Company Russia Yes
3 Grand Theft Auto V Ukraine No

serene scaffold Jan 3, 2024, 6:26 PM

#

real goblet There is still freaking index

by "freaking index", do you mean the 0, 1, 2, 3 on the left? because there must be an index. you can never stop having it no matter what.

real goblet Jan 3, 2024, 6:27 PM

#

serene scaffold by "freaking index", do you mean the 0, 1, 2, 3 on the left? because there *must...

Yes

#

why cant I have it

serene scaffold Jan 3, 2024, 6:27 PM

#

every row always has an index no matter what. you can print it without showing the index, but it's still there.

#

do print(df[['Game Name', 'Region', 'Group Override']].head().to_string(index=False))

real goblet Jan 3, 2024, 6:28 PM

#

serene scaffold every row always has an index no matter what. you can print it without showing t...

So there is no sense in dropping index?

serene scaffold Jan 3, 2024, 6:28 PM

#

real goblet So there is no sense in dropping index?

"dropping the index" is impossible.

#

that's like saying "I don't want the columns to be labeled or numbered"

#

how would you get the first column of the dataframe if the column doesn't have a label, or a number?

tidal bough Jan 3, 2024, 6:31 PM

#

perhaps you want to do .set_index("Game Name", drop=True)? having that column be the index would be a reasonable choice.
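The two options mentioned in this exchange can be compared side by side. A small sketch, assuming the four rows shown earlier are typed in by hand instead of being read from `data.xlsx`:

```python
import pandas as pd

# Rows copied from the printed output above; in the real case they come from read_excel.
df = pd.DataFrame({
    "Game Name": ["Dead Space", "Rust", "Lethal Company", "Grand Theft Auto V"],
    "Region": ["Ukraine", "Russia", "Russia", "Ukraine"],
    "Group Override": ["No", "No", "Yes", "No"],
})

# Option 1: keep the default 0..n-1 index and simply hide it when printing.
text_without_index = df.to_string(index=False)
print(text_without_index)

# Option 2: make "Game Name" the index, so each row is labelled by its game.
by_game = df.set_index("Game Name")
print(by_game)
print(by_game.loc["Rust", "Region"])
```

With the second option the 0, 1, 2, 3 labels disappear from the display because the game names have taken their place; an index still exists either way.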

pearl barn Jan 3, 2024, 6:40 PM

#

What are the best courses on Udemy to learn Python data analysis the basics and fundamentals to NumPy and pandas?

agile owl Jan 3, 2024, 6:40 PM

#

I think what he wants is to set the index to game name

final kiln Jan 3, 2024, 6:40 PM

#

by tomorrow I'll have a full variation on the transformer architecture, how do I go about making a fair comparison between them ?

agile owl Jan 3, 2024, 6:41 PM

#

so it displays without the series index

whole lichen Jan 3, 2024, 6:43 PM

#

who wants to practice matplotlib with me?

worthy jewel Jan 3, 2024, 6:52 PM

#

Hello everyone,
Tyler here I’m a Comp Sci student currently on a data science internship. I also have my own startup McCarthy & Brogan Solutions.. we’ve been setup for around a year now and starting to get invited into various factories around the UK looking at how our services (primarily focused on maintenance & repair in this case) can increase efficiencies using AI amongst other things. We also have a subsidiary SmartFormAI with which we have just built a document automation application utilising LLM, OCR and GAR. I’d love to get to know some of you, please drop me a DM!

serene scaffold Jan 3, 2024, 6:53 PM

#

worthy jewel Hello everyone, Tyler here I’m a Comp Sci student currently on a data science in...

This server is not for advertising or self-promotion

worthy jewel Jan 3, 2024, 7:00 PM

#

It’s an introduction mate chill out

#

Others have done the same?

final kiln Jan 3, 2024, 7:06 PM

#

I believe you are breaking the rules

agile owl Jan 3, 2024, 7:06 PM

#

@serene scaffold can I promote you?

serene scaffold Jan 3, 2024, 7:06 PM

#

agile owl <@253696366952316929> can I promote you?

No

agile owl Jan 3, 2024, 7:07 PM

#

dang

worthy jewel Jan 3, 2024, 7:15 PM

#

Apologies! Must have missed that

final kiln Jan 3, 2024, 7:17 PM

#

    def forward(self, in_sequence_bwc: Tensor) -> Tensor:
        batch, words, coordinates = in_sequence_bwc.size()
        k_dimension = coordinates // self.NUMBER_OF_HEADS
        pre_metric_tensors_nww = self.pre_metric_tensors_nww.masked_fill(self.MASK_ww[:,:,:words,:words] == 0, 0)
        metric_tensors_nww = pre_metric_tensors_nww @ pre_metric_tensors_nww.transpose(-1, -2)  # ensures symmetry and positive definiteness


        all_projections_bwc = self.projections_cc(in_sequence_bwc)
        all_projections_bnwk = all_projections_bwc.view(batch, words, self.NUMBER_OF_HEADS, k_dimension).transpose(1, 2)

        all_dot_products_bnww = all_projections_bnwk.transpose(-1, -2) @ metric_tensors_nww @ all_projections_bnwk
        all_dot_products_bnww = all_dot_products_bnww / math.sqrt(k_dimension)
        all_dot_products_bnww = all_dot_products_bnww.masked_fill(self.MASK_ww[:,:,:words,:words] == 0, 0)

        nudged_vectors_bnwk = all_dot_products_bnww @ all_projections_bnwk
        nudged_vectors_bwnk = nudged_vectors_bnwk.transpose(1, 2).contiguous()
        nudged_vectors_bwc = nudged_vectors_bwnk.view(batch, words, coordinates)

        out_sequence_bwc = self.projection_cc(nudged_vectors_bwc)

        return out_sequence_bwc

this is my proposed self attention mechanism

#

every head has a projection matrix that compresses the embeddings, and a metric tensor that is used to calculate the dot product between all the elements in the sequence

#

and that's pretty much it, each projection gets scaled according to the dot products and then it's concatenated and mixed

agile owl Jan 3, 2024, 8:19 PM

#

hey everyone I'd like to promote the Greenest Admin

#

he's very Green and very much an admin

serene scaffold Jan 3, 2024, 8:49 PM

#

@agile owl don't shitpost in our wonderful data science chat

ruby grail Jan 3, 2024, 10:00 PM

#

Hello. I have a question, I don't exactly understand what this X_embedded means or how to use it. Can anyone please give a hint?

scikit-learn

1.6. Nearest Neighbors

sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably m...

worldly dawn Jan 3, 2024, 10:12 PM

#

potent pollen I've implemented a GA, with elitism, crossover and mutation of parents, and some...

Mixing elitists and parents does not look normal.
There are also too many pieces. I would still suggest to either use an off the shelf library or to validate your library on something simpler.

shut slate Jan 4, 2024, 4:42 AM

#

Can someone please explain vectorization in Pandas and why do we not have to do a for loop? And how would I know which methods can be used for vectorization?

agile owl Jan 4, 2024, 4:55 AM

#

it's setting up calculations to be applied across an axis in parallel instead of sequentially that's why you don't need a loop

#

in general if something is autoregressive or recursive it can't be vectorized

#

anything inherently sequential can't be vectorized

#

or at least not very easily afaik
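A small sketch of the difference, on made-up numbers. One hedge on the explanation above: in pandas the speed-up usually comes from the loop running inside compiled NumPy code instead of the Python interpreter, which is not necessarily the same as running in parallel.

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 9.0, 15.0])

# Loop version: one Python-level multiplication per element.
doubled_loop = pd.Series([p * 2 for p in prices])

# Vectorized version: a single expression; the loop happens in compiled code.
doubled_vec = prices * 2

# A recursive calculation: each output depends on the previous output,
# so it cannot be written as one elementwise expression.
alpha = 0.5
ema = [prices.iloc[0]]
for p in prices.iloc[1:]:
    ema.append(alpha * p + (1 - alpha) * ema[-1])

# Some common recursions do have a built-in compiled equivalent, such as this one.
ema_builtin = prices.ewm(alpha=alpha, adjust=False).mean()

print(doubled_vec.tolist(), ema, ema_builtin.tolist())
```

As a rule of thumb, arithmetic operators, comparisons, the `.str` and `.dt` accessors, and aggregation methods on a Series or DataFrame are vectorized, while `apply` with a Python function and explicit `for` loops are not.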

heady sierra Jan 4, 2024, 6:16 AM

#

Hello. I am working on a reinforcement learning model using a gym-anytrading environment. I am having this bug but I couldn't find a solution anywhere. Can someone help?

agile owl Jan 4, 2024, 7:18 AM

#

the error seems pretty straightforward to me

#

you need to have a certain observation shape

pastel lake Jan 4, 2024, 7:18 AM

#

hello guyzz, i learned python and want to learn machine learning, can anyone share some advice or a roadmap to give a great start to my machine learning journey

heady sierra Jan 4, 2024, 7:24 AM

#

agile owl you need to have a certain observation shape

Apparently while debugging the code, I forgot to rerun the cell above so that fixed the issue. But now I am trying to add some indicators to a custom stock env. It takes gym-anytrading's StockEnv. I am getting this error:

#

agile owl Jan 4, 2024, 7:35 AM

#

sounds like it needs a data attribute. did you read the environment implementation or is there a specification for how to subclass it?

#

I just make my own envs so I can't help you there

#

I only need to implement step and reset

#

i'm guessing the data attribute just needs to be some dataframe with certain column headers

heady sierra Jan 4, 2024, 7:39 AM

#

I think I understand the error better. I need to check the documentation again.

#

Thanks for the help

gloomy parrot Jan 4, 2024, 8:48 AM

#

hey everyone, i just want to ask on how can i stitch multiple images of receipt?

#

Has anyone tried it before?

woeful fossil Jan 4, 2024, 10:56 AM

#

sup everyone, im new to this chat and Ai/data science. i know some basic stuff about Ai and data science but im wondering if someone would be willing to help me learn more on this topic.

lapis sequoia Jan 4, 2024, 1:46 PM

#

Does anyone actually use feature mapping as opposed to kernels/gram matrix in SVM?

#

Cornell says that it's more efficient in lower dimensional feature maps but wouldn't you have to calculate the inner product anyway? So how is it more efficient when the kernel can do this without the explicit mapping

stark ermine Jan 4, 2024, 2:30 PM

#

Good morning, gentlemen,

I'm writing to you because I can't find the solution to my problem after much research.

I'm running a log-linear regression on the adjusted price of a stock (dependent/target variable = Y).
in order to make the relationship between my variables (price/date) more linear (especially if the distribution doesn't really follow a normal distribution, but I'm not telling you anything...).

On the other hand, I'd like to display the "true price" and the true standard deviations (68/95/99.7%).

But I have no idea...

I use python with yfinance, plotly and streamlit.

Thanks for your help in advance : ) !
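One possible way to get back to real prices, sketched on made-up numbers: fit the line on log(price), measure the residual spread in log space, then exponentiate the fitted line and the line shifted by ±k standard deviations. The price list and the `band` helper are hypothetical, and the 68/95/99.7% reading only holds if the log-space residuals are roughly normal.

```python
import math

# Hypothetical adjusted closing prices on consecutive trading days.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 113.0]
days = list(range(len(prices)))
log_prices = [math.log(p) for p in prices]

# Least-squares fit of log(price) = intercept + slope * day.
n = len(days)
mean_x = sum(days) / n
mean_y = sum(log_prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, log_prices))
         / sum((x - mean_x) ** 2 for x in days))
intercept = mean_y - slope * mean_x

# Residual standard deviation, measured in log space.
fitted_log = [intercept + slope * x for x in days]
residuals = [y - f for y, f in zip(log_prices, fitted_log)]
sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))


def band(k):
    """Lower and upper curves at k standard deviations, in price units."""
    lower = [math.exp(f - k * sigma) for f in fitted_log]
    upper = [math.exp(f + k * sigma) for f in fitted_log]
    return lower, upper


center = [math.exp(f) for f in fitted_log]  # fitted line in price units
lower2, upper2 = band(2)  # roughly the 95% band
print(center[-1], lower2[-1], upper2[-1])
```

In price units the bands are multiplicative, not additive: the upper curve is the fitted price times exp(k·sigma), so the band widens as the price rises. The three lists can be passed to plotly as ordinary line traces.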

zenith wing Jan 4, 2024, 2:35 PM

#

Any one has pc recommendation for data analytics??
I'm thinking ryzen 5 7600 and 6750xt with 32gb ram
Please advise; my current amd laptop is quite old and has 4gb ram, which is hardly available half the time, so I saved some money to buy a good pc to last a few years

stark ermine Jan 4, 2024, 2:45 PM

#

zenith wing Any one has pc recommendation for data analytics?? I'm thinking ryzen 5 7600 and...

Your planned setup with a Ryzen 5 7600 CPU, Radeon RX 6750XT GPU, and 32GB of RAM seems quite decent for handling data analytics tasks.

zenith wing Jan 4, 2024, 3:04 PM

#

Alright just one question should I go with 13500 in intel but i also prioritize power consumption because cant pay too high electricity bills right

buoyant vine Jan 4, 2024, 3:05 PM

#

If power is something you care about

#

I would not use Intel; AMD's newer chips generally have much better power efficiency per unit of compute

#

Also, in general, unless you're using 100% of the CPU all the time, your power bill is probably not going to be noticeably different regardless of what CPU you go with

#

other than maybe if you put a Threadripper or server CPU in it 😅

#

If you really care about the power efficiency per compute and can afford it, the Ryzen x3D chips are incredible CPUs and have by far some of the best efficiency of the modern CPUs on the market rn.

#

But again, I don't think you will notice much difference in the power bill department

wooden sail Jan 4, 2024, 3:31 PM

#

do note that you'll have to fiddle with your BLAS backend if you use AMD and want to do computations on cpu

#

and that AMD gpu's are not super well supported for any gpu computations

#

sadly history has favored intel and nvidia greatly in the area of scientific computing

#

check whether your target modules support ROCm instead of cuda, and read around about MKL vs openBLAS vs BLIS

lapis sequoia Jan 4, 2024, 4:05 PM

#

Hello, does anyone have experience with Python and AI/Machine Learning etc? or good and working tutorials for predicting stock prices? I have data in CSV

hollow sentinel Jan 4, 2024, 4:12 PM

#

#

import pandas as pd 
import numpy as np 
import warnings
from datetime import datetime
import matplotlib.pyplot as plt 
import matplotlib.dates as dates

heart_df = pd.read_csv("/Users//Desktop/Apple Watch Data/HKQuantityTypeIdentifierHeartRate.csv")
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
print(heart_df.head(5))


heart_df["creationdate"] = pd.to_datetime(heart_df["creationdate"], format = "%m/%d/%y %H:%M")

heart_df["startdate"] = pd.to_datetime(heart_df["startdate"], format = "%m/%d/%y %H:%M")

heart_df["enddate"] = pd.to_datetime(heart_df["enddate"], format= "%m/%d/%y %H:%M")
print(heart_df.dtypes)


plt.plot(heart_df["startdate"], heart_df["value"], linestyle = "dotted")
# Add title and axis labels
plt.title('Time Series Plot')
plt.xlabel('Time')
plt.ylabel('Heart Rate')
plt.xticks(rotation=45)
plt.show()

#

is there anything else i can do with this visualization?

#

bc i can't really make sense of this, there's no underlying trend here

#

maybe combine it with something else?

final kiln Jan 4, 2024, 5:37 PM

#

    def forward(self, in_sequence_bwc: Tensor) -> Tensor:

        batch, words, coordinates = in_sequence_bwc.size()
        # slice (not index) the mask so only the lower triangle of each k x k matrix is kept
        pre_metric_tensors_nkk = self.pre_metric_tensors_nkk * self.MASK_11ww[0, :, :self.K_DIMENSION, :self.K_DIMENSION]
        metric_tensors_nkk = pre_metric_tensors_nkk @ pre_metric_tensors_nkk.transpose(-1, -2)  # ensures symmetry and positive definiteness

        all_projections_bwc = self.projections_cc(in_sequence_bwc)

        all_projections_bnwk = all_projections_bwc.view(batch, words, self.NUMBER_OF_HEADS, self.K_DIMENSION).transpose(1, 2)
        # all_projections_bnwk = F.normalize(all_projections_bnwk, p=2, dim=-1)

        all_dot_products_bnww = all_projections_bnwk @ metric_tensors_nkk @ all_projections_bnwk.transpose(-1, -2)
        all_dot_products_bnww = all_dot_products_bnww / math.sqrt(self.K_DIMENSION)
        all_dot_products_bnww = all_dot_products_bnww.masked_fill(self.MASK_11ww[:,:,:words,:words] == 0, float('-inf'))
        all_dot_products_bnww = F.softmax(all_dot_products_bnww, dim=-1)

        # all_dot_products_bnww = all_dot_products_bnww * self.MASK_11ww[:,:,:words,:words]

        nudged_vectors_bnwk = all_dot_products_bnww @ all_projections_bnwk
        nudged_vectors_bwnk = nudged_vectors_bnwk.transpose(1, 2).contiguous()
        nudged_vectors_bwc = nudged_vectors_bwnk.view(batch, words, coordinates)

        out_sequence_bwc = self.mixer_cc(nudged_vectors_bwc)

        return out_sequence_bwc

don't know what's up with that softmax stuff, but unlike Q, K, V, it seems to be essential. I've trimmed down Q, K, V to a lower triangular matrix (pre_metric_tensors_nkk, of which only half gets updated during training) and a projection matrix (projections_cc). results so far are very similar to Q, K, V, which result from Wq, Wk and Wv and total a larger number of params

potent pollen Jan 4, 2024, 5:44 PM

#

Is there a problem to give some negative values in the input layer of a RNN ? I give a distance to the AI, and I'm wondering whether the fact that it's negative will slow down the learning process, since I'm using ReLU for the hidden layers. Should I modify my way of counting the distance ?

agile cobalt Jan 4, 2024, 6:25 PM

#

strictly speaking "distance" is always a non-negative metric, you probably should be abs()'ing it anyway

agile cobalt Jan 4, 2024, 6:26 PM

#

potent pollen Is there a problem to give some negative values in the input layer of a RNN ? I ...

but there is no problem in giving negative inputs, and in some cases you are even recommended to transform positive inputs into negative inputs as part of normalization - see https://datascience.stackexchange.com/questions/54296/should-input-images-be-normalized-to-1-to-1-or-0-to-1 for example

Data Science Stack Exchange

Should input images be normalized to -1 to 1 or 0 to 1

Many ML tutorials are normalizing input images to value of -1 to 1 before feeding them to ML model. The ML model is most likely a few conv 2d layers followed by a fully connected layers. Assuming

potent pollen Jan 4, 2024, 6:29 PM

#

agile cobalt but there is no problem in giving negative inputs, and in some cases you are eve...

Ok thanks !

final kiln Jan 4, 2024, 7:17 PM

#

this is the loss graph from the new transformer variant, which I'm calling metric tensor network (mtn), maybe I'm biased but it looks exactly like the transformers loss graph that I've been sharing here so far

#

the real advantage will come from the fact that I can double the amount of attention heads (since the metric tensor is symmetrical, half the space is being wasted rn) and still have less parameters than the transformer that produces this

frigid owl Jan 4, 2024, 7:57 PM

#

Im looking for a tool to visualize training results. For now im just using matplotlib but are there any better libraries to get the job done?

final kiln Jan 4, 2024, 8:05 PM

#

frigid owl Im looking for a tool to visualize training results. For now im just using matpl...

I've been using MLFlow, and it has been a blessing

frigid owl Jan 4, 2024, 8:07 PM

#

Thanks a lot

devout creek Jan 4, 2024, 8:07 PM

#

someone can help me with this ? : https://colab.research.google.com/drive/10g0pY3vv-mBu5t9yVJ2dQADoI-luOPa9

Google Colaboratory

frigid owl Jan 4, 2024, 8:08 PM

#

Cant get in, No access

devout creek Jan 4, 2024, 8:08 PM

#

devout creek someone can help me with this ? : https://colab.research.google.com/drive/10g0pY...

i want to output in srt format, someone know how to do this ?

devout creek Jan 4, 2024, 8:08 PM

#

frigid owl Cant get in, No access

what ?

frigid owl Jan 4, 2024, 8:09 PM

#

your Colab notebook is private

#

make it public

devout creek Jan 4, 2024, 8:09 PM

#

okay

#

https://colab.research.google.com/drive/10g0pY3vv-mBu5t9yVJ2dQADoI-luOPa9?usp=sharing

Google Colaboratory

#

now it's public

#

@frigid owl

frigid owl Jan 4, 2024, 8:11 PM

#

👍

devout creek Jan 4, 2024, 8:11 PM

#

frigid owl 👍

so can you help me or no ?

frigid owl Jan 4, 2024, 8:12 PM

#

i dont really understand the problem

#

can you explain again please

devout creek Jan 4, 2024, 8:12 PM

#

do you know what is insanely fast whisper ?

frigid owl Jan 4, 2024, 8:12 PM

#

nah

devout creek Jan 4, 2024, 8:14 PM

#

do you know what is whisper ?

frigid owl Jan 4, 2024, 8:14 PM

#

yes

#

speech recognition model

devout creek Jan 4, 2024, 8:15 PM

#

so insanely fast whisper is way way faster

#

but i don't know how to make srt file output instead of json file

frigid owl Jan 4, 2024, 8:19 PM

#

have no idea how to do it

devout creek Jan 4, 2024, 8:19 PM

#

okay

frigid owl Jan 4, 2024, 8:19 PM

#

maybe go ask in #1035199133436354600

devout creek Jan 4, 2024, 8:20 PM

#

no one knows here
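
For reference, a minimal sketch of turning timestamped transcription JSON into SRT. It assumes the JSON has a top-level "chunks" list whose items look like {"timestamp": [start, end], "text": ...} with times in seconds; that layout is an assumption, so check it against the actual output file. The sample data and the helper names are made up for illustration.

```py
import json

def to_srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def chunks_to_srt(chunks):
    """Build SRT text from a list of {"timestamp": [start, end], "text": ...} dicts."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # assumed: the end time may be missing on the last chunk
            end = start
        blocks.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical sample standing in for the tool's JSON output
sample = json.loads(
    '{"chunks": [{"timestamp": [0.0, 2.5], "text": " Hello there."},'
    ' {"timestamp": [2.5, 61.04], "text": " General Kenobi."}]}'
)
srt_text = chunks_to_srt(sample["chunks"])
print(srt_text)
```

Writing `srt_text` to a file with an `.srt` extension is then a plain `open(...).write(...)`.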

past meteor Jan 4, 2024, 9:00 PM

#

frigid owl Im looking for a tool to visuilaze training results. For now im just using matpl...

New(er) versions of matplotlib have a "plot training curve thing"

frigid owl Jan 4, 2024, 9:04 PM

#

huh

#

can you please send a link to the documentation

#

i looked it up but didnt find anything

#

ik that sklearn has this but never seen something like that in matplotlib

past meteor Jan 4, 2024, 9:26 PM

#

Oh sorry, I meant newer versions of sklearn

faint cape Jan 4, 2024, 9:39 PM

#

Hi

#

I'm a junior python developer that do web scraping and data analysis

#

I'm new

dusty valve Jan 5, 2024, 2:23 AM

#

hello, i wrote a mandelbrot zoom code and it pretty much zooms in anywhere.
The issue is, as it zooms it requires more and more iterations to get a clear image (as is intended). but i dont know exactly how many are required. so i just perform iterations=frame_n*2.
This works fairly well, but is extremely slow as the video progresses.
Is there any formula or way to approximate the iterations required as i know the zoom in speed and point?

#

when it turns all white, the iterations are not high enough to get a clear frame. and later, it fuzzes out
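
A rule of thumb sometimes used for this is to let the iteration budget grow with the logarithm of the magnification. The sketch below assumes `zoom` is the magnification relative to the first frame; `base`, `per_doubling` and the 5% zoom schedule are made-up tuning values, not derived ones, and regions close to the set boundary may need more.

```py
import math

def estimate_max_iter(zoom, base=100, per_doubling=20):
    """Rule-of-thumb iteration budget for a given magnification (zoom >= 1)."""
    doublings = math.log2(max(zoom, 1.0))
    return int(base + per_doubling * doublings)

# Hypothetical zoom schedule: magnification grows by 5% per frame
zoom_per_frame = 1.05
for frame_n in (0, 100, 500):
    zoom = zoom_per_frame ** frame_n
    print(frame_n, estimate_max_iter(zoom))
```

With a constant zoom factor per frame, log(zoom) is proportional to the frame number, so iterations=frame_n*2 is already of this form. What remains is tuning the slope and the starting value for the chosen zoom point.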

grave ledge Jan 5, 2024, 9:25 AM

#

I have a task at hand. I have to write a Python script that parses log files into JSON format. The log files are taken from a MacBook. Ultimately I have to feed the result to an LLM so that it can detect issues in the log files and summarise them.
Can anyone point me to a good resource or help me understand how I can build a good log parser?
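
A minimal sketch of one way to start, assuming classic syslog-style lines of the form "Mon D HH:MM:SS host process[pid]: message". That layout is an assumption (other macOS log sources are formatted differently), and the sample lines are invented for illustration.

```py
import json
import re

LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\]:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match the assumed layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    if record["pid"] is not None:
        record["pid"] = int(record["pid"])
    return record

def parse_log(lines):
    """Parse all lines, keeping unmatched ones so nothing is silently dropped."""
    records = []
    for line in lines:
        record = parse_line(line)
        records.append(record if record is not None else {"raw": line.rstrip("\n")})
    return records

# Hypothetical sample lines
sample_lines = [
    "Jan  5 09:25:01 MacBook-Pro kernel[0]: AppleKeyStore: operation failed",
    "Jan  5 09:25:02 MacBook-Pro com.apple.xpc.launchd[1]: Service exited with abnormal code: 1",
    "    continuation line without a header",
]
print(json.dumps(parse_log(sample_lines), indent=2))
```

Keeping unmatched lines under a "raw" key makes it easy to see which formats the pattern does not cover yet.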

final phoenix Jan 5, 2024, 11:28 AM

#

Hi
Please help me extract 30-second time intervals from a column called created_time (e.g. 08:00:00 format) in Python.
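
One reading of this is "assign each time to the 30-second bucket it falls in". A minimal sketch under that assumption, treating the column values as HH:MM:SS strings; the function name and sample values are hypothetical.

```py
from datetime import datetime

def floor_to_interval(time_text, seconds=30):
    """Round an HH:MM:SS string down to the start of its `seconds`-long bucket."""
    t = datetime.strptime(time_text, "%H:%M:%S")
    since_midnight = t.hour * 3600 + t.minute * 60 + t.second
    start = since_midnight - since_midnight % seconds
    hours, rest = divmod(start, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

created_time = ["08:00:00", "08:00:29", "08:00:30", "08:01:47"]  # hypothetical values
print([floor_to_interval(value) for value in created_time])
```

If the data lives in a pandas DataFrame, the same function can be applied with `df["created_time"].map(floor_to_interval)`.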

indigo moth Jan 5, 2024, 3:13 PM

#

mild dirge It makes a list of the elements of a

Ohhhhh I see ! Thanks a lot sir.

woeful wren Jan 5, 2024, 5:01 PM

#

how do i plot the graph of a function that i calculated

serene scaffold Jan 5, 2024, 5:01 PM

#

woeful wren how do i plot the graph of a function that i calculated

Hello, please show your code as text

#

!code

arctic wedgeBOT Jan 5, 2024, 5:01 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

woeful wren Jan 5, 2024, 5:02 PM

#

i have these functions

#

fig    = plt.figure(figsize=(6, 6))
ax     = fig.add_subplot()

xx     = np.linspace(-5., 10., 1000)
# ------- Vul verder aan ------- 
vgl = opl.subs(a, 1/2)
vgl1 = vgl.subs(b, 2)
vgl2 = vgl.subs(b,4)
display(vgl1)
display(vgl2)

plt.show()

#

oh thats the wrong thing let me fix

#

ye so this is my code so far

serene scaffold Jan 5, 2024, 5:03 PM

#

and this is with sympy?

woeful wren Jan 5, 2024, 5:04 PM

#

serene scaffold and this is with sympy?

uh matplotlib and sympy

#

the exam on monday is a sympy exam and this is a thing they gave us to prep

#

ax.plot(x, vgl1)

#

this is what i gotta use i think but it doesnt work

#

this is the error i get if i use that code

serene scaffold Jan 5, 2024, 5:07 PM

#

import sympy as sp
x = sp.symbols('x')  # symbols lives in the sympy namespace
sp.plotting.plot(sp.cos(x), (x, -10, 10))

This worked for me

#

#

from your example, I don't know what opl is, so I can't make sense of what happens after that

woeful wren Jan 5, 2024, 5:09 PM

#

serene scaffold ```py import sympy as sp x = symbols('x') sp.plotting.plot(sp.cos(x), (x, -10, 1...

ye alright but there is given code

#

sorry i forgot to mention the first part of the code is given

#

fig    = plt.figure(figsize=(6, 6))
ax     = fig.add_subplot()

xx     = np.linspace(-5., 10., 1000)
# ------- Vul verder aan -------

#

this is given and the plt.show as well

serene scaffold Jan 5, 2024, 5:10 PM

#

woeful wren ``` fig = plt.figure(figsize=(6, 6)) ax = fig.add_subplot() xx = np....

that's fine, but you don't define opl here, and then you use it in the next line. good examples have every variable defined, or follow common conventions for variable naming

#

(and maybe "opl" is a common name for something, but idk what)

tidal bough Jan 5, 2024, 5:11 PM

#

serene scaffold ```py import sympy as sp x = symbols('x') sp.plotting.plot(sp.cos(x), (x, -10, 1...

either that or evaluate the function on some points and then plot them normally:

import sympy as sp, numpy as np
x = sp.symbols('x')
f = sp.cos(x)

X = np.linspace(-10,10,1000)
y = sp.lambdify(x,f)(X)

# plt.plot(X,y) or whatever

woeful wren Jan 5, 2024, 5:11 PM

#

vgl = opl.subs(a, 1/2)
vgl1 = vgl.subs(b, 2)
vgl2 = vgl.subs(b,4)
display(vgl1)
display(vgl2)

#

all this part of the code does is calculate the functions

woeful wren Jan 5, 2024, 5:12 PM

#

woeful wren i have these functions

these functions

#

opl stands for oplossing which is dutch for solution and opl was the solution of my differential equation and then i sub a,b in the equation with the values that they tell us to sub

woeful wren Jan 5, 2024, 5:14 PM

#

tidal bough either that or evaluate the function on some points and then plot them normally:...

yo this works thankss

soft lantern Jan 5, 2024, 5:17 PM

#

python for data analysis, a good book to start with?

serene scaffold Jan 5, 2024, 5:19 PM

#

soft lantern python for data analysis, a good book to start with?

"data science from scratch" second edition

soft lantern Jan 5, 2024, 5:36 PM

#

serene scaffold "data science from scratch" second edition

im not really a beginner, ive learned numpy and pandas on free code camp, but that hasnt equipped me with problem solving

serene scaffold Jan 5, 2024, 5:36 PM

#

soft lantern im not really a beginer, ive learned numpy and pandas on free code camp, but tha...

then you can skip a few chapters

soft lantern Jan 5, 2024, 5:37 PM

#

you still recommend that book?

serene scaffold Jan 5, 2024, 5:37 PM

#

ya

soft lantern Jan 5, 2024, 5:37 PM

#

um what makes you suggest that very book

serene scaffold Jan 5, 2024, 5:38 PM

#

I've read it and it's a good overview of the space and doesn't have shitty code examples

soft lantern Jan 5, 2024, 5:42 PM

#

ty admin

dusty valve Jan 5, 2024, 6:00 PM

#

serene scaffold ```py import sympy as sp x = symbols('x') sp.plotting.plot(sp.cos(x), (x, -10, 1...

Bruh spy cam plot too?

#

sympy*

#

mpl still top tho

final kiln Jan 5, 2024, 7:32 PM

#

started drafting an explanation thing, if I get good results I'll expand it into a paper

valid crow Jan 5, 2024, 8:11 PM

#

I'm relatively new to this field, but I'm eager to learn about data science and AI. If any of you have recommendations for the best resources to study data science and AI, I would greatly appreciate it if you could kindly share those details with me.

serene scaffold Jan 5, 2024, 8:22 PM

#

valid crow I'm relatively new to this field, but I'm eager to learn about data science and ...

!resources data science

arctic wedgeBOT Jan 5, 2024, 8:22 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

dense yarrow Jan 5, 2024, 9:02 PM

#

how can i interpret this graph? the total is a column that combines the original "drug_recode" column with "drugs_imputated" column

#

the original drug recode column has 1 for often/sometimes use of drugs. 0 for never use of drugs

tidal bough Jan 5, 2024, 9:03 PM

#

imputation increases drug use? 🥴

dense yarrow Jan 5, 2024, 9:03 PM

#

does this mean imputing is not necessary for this dataset?

tidal bough Jan 5, 2024, 9:03 PM

#

i feel like it's bad (suggests problems with imputation) if imputed points are so different from the real ones, but unsure.

dense yarrow Jan 5, 2024, 9:03 PM

#

tidal bough imputation increases drug use? 🥴

it seems that it increases the value in the drug_recode column

dense yarrow Jan 5, 2024, 9:04 PM

#

tidal bough i feel like it's bad (suggests problems with imputation) if imputed points are s...

i was thinking so too, but i wanted to get opinions from those more familiar with the topic

#

this is my first time working with imputation

dense yarrow Jan 5, 2024, 9:25 PM

#

this is what i ended up writing: This graph shows the effect of imputation on the drug_recode value. As we see, the imputed values are very different from the reported (non-imputed) values. The combined values better fit the reported values. As next steps, we would revisit the imputation method for significant error or bias as well as investigate how the structure of the data contributed to the accuracy of the imputation.

tidal bough Jan 5, 2024, 9:31 PM

#

The combined values better fit the reported values
i wouldn't write this because this is literally always true

#

an average of datasets X and Y will always be more similar to X than Y is
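
A small numeric illustration of that point, in terms of means: the mean of the pooled data is a weighted average of the two group means, so it cannot be further from the reported mean than the imputed mean is. The 0/1 values below are invented for the example.

```py
from statistics import mean

reported = [0, 0, 1, 0, 1, 0, 0, 0]  # hypothetical observed 0/1 values
imputed = [1, 1, 1, 0, 1, 1]         # hypothetical imputed 0/1 values
combined = reported + imputed

gap_imputed = abs(mean(imputed) - mean(reported))
gap_combined = abs(mean(combined) - mean(reported))
print(gap_combined <= gap_imputed)
```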

dense yarrow Jan 5, 2024, 9:34 PM

#

okay!

#

thanks!

half mountain Jan 5, 2024, 11:24 PM

#

Hi, I want to use a machine learning model to predict future performance of players. I am using Quarterback stats from the NFL. The stats are week to week and player by player. I wanted to use past games to determine future games of a player. I decided to use a Random Forest Model. I will try to predict touchdowns using Features like Pass_YDs, Interceptions, and Pass Attempts.

What I am doing does not seem right. I will never have my Features (Pass_YDs, Interceptions, and Pass Attempts) before a game is played so I cannot predict Pass Touchdowns with those. Features seem to me like ideas I know before a game is played like opponent, Home or Away game, etc. What I am trying to do is predict a players future performance in games based on past performance. Can you help with ideas on how I would do this and if I am on the right track?

Thanks! 🙂

agile owl Jan 5, 2024, 11:46 PM

#

i would probably regress each statistic using a model that ensures positivity on a vector of the past statistics of the player, their team, and the opposing team's defense

#

like the past 5 games or something

#

then i would fit that across quarterbacks in general so you have enough data

#

there's a lot of options for how to do the regression

#

I think with things like sports though the variance is huge

#

hard to model all the factors
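
The "past 5 games" idea can be sketched as follows: for every game, the features are averages over the same player's previous games only, so nothing from the game being predicted leaks in. The field names and sample rows are hypothetical, and team or opposing-defense features are left out to keep it short.

```py
from collections import defaultdict, deque

STATS = ("pass_yds", "interceptions", "pass_attempts")

def add_lagged_features(games, window=5):
    """For each game, average each stat over the player's previous `window` games."""
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for game in sorted(games, key=lambda g: (g["player"], g["week"])):
        past = history[game["player"]]
        if past:  # skip a player's first game: nothing is known beforehand
            features = {
                f"avg_{stat}_prev": sum(p[stat] for p in past) / len(past)
                for stat in STATS
            }
            rows.append({"player": game["player"], "week": game["week"],
                         **features, "target_pass_td": game["pass_td"]})
        past.append(game)  # only now does this game count as "past"
    return rows

# Hypothetical weekly stats for one quarterback
games = [
    {"player": "A", "week": 1, "pass_yds": 250, "interceptions": 1, "pass_attempts": 30, "pass_td": 2},
    {"player": "A", "week": 2, "pass_yds": 310, "interceptions": 0, "pass_attempts": 38, "pass_td": 3},
    {"player": "A", "week": 3, "pass_yds": 190, "interceptions": 2, "pass_attempts": 27, "pass_td": 1},
]
rows = add_lagged_features(games)
print(rows)
```

The resulting rows (lagged averages as inputs, that week's touchdowns as the target) can then be fed to a random forest or any other regressor.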

half mountain Jan 5, 2024, 11:55 PM

#

Okay, that should get me in the right direction. It is just an early project for a portfolio I am putting together. I am a data analyst so just practicing.

#

Thank you for the input!

agile owl Jan 5, 2024, 11:57 PM

#

np

#

where are you getting the data?

#

trading usdjpy out of sample

#

hows shakespeare coming along

half mountain Jan 6, 2024, 1:25 AM

#

agile owl where are you getting the data?

ESPN

agile owl Jan 6, 2024, 1:33 AM

#

do they have a way to download it easily or do you have to pull it from their summaries

half mountain Jan 6, 2024, 2:32 AM

#

@agile owl I pull it from their summaries

pale hemlock Jan 6, 2024, 3:09 AM

#

anyone interested in discussing the topic of conversation?

agile owl Jan 6, 2024, 4:06 AM

#

trading the yield curve

#

trading the 10y treasury

#

add these all together and the sharpe ratio is well in excess of 2

#

I need to write the variance weighting code I've just been doing it by hand on the renders in excel

barren fable Jan 6, 2024, 4:31 AM

#

I learned lots of different models for regression and classification in machine learning, like linear, polynomial, and SVR for regression and logistic, SVM, and kernel SVM for classification. But all of these were intuitive explanations that didn't cover, for instance, gradient descent and convergence, Ridge, Lasso, or the math behind each model. So I don't want to dive into deep learning without understanding machine learning. So first, what's your opinion? Should I dig deeper into machine learning, or am I ready to move on to deep learning? Second, some people recommended some machine learning playlists to me.

Krish Naik: https://www.youtube.com/watch?v=kEmnkUw0NTs&list=PLZoTAELRMXVPMbdMTjwolBI0cJcvASePD&index=3

Andrew NG: https://www.youtube.com/watch?v=vStJoetOxJg&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI

and some more, so is there a good resource explaining the models in depth in a simple way, like code implementation and projects? Thanks!

YouTube

Krish Naik

Machine Learning Series- Universe of Data Science #Day1

Materials And Dashboard access after the video
https://ineuron.ai/course/Machine-Learning-Community-Class

Starting a new series on ML community ssessions :). In this video we will learn about the Universe OF Data Science

Join iNeuron's Data Science Masters Course with Job Guaranteed Starting From April 3rd 2023
https://ineuron.ai/course/Full-S...

YouTube

DeepLearningAI

#1 Machine Learning Specialization [Course 1, Week 1, Lesson 1]

The Machine Learning Specialization is a foundational online program created in collaboration between DeepLearning.AI and Stanford Online. This beginner-friendly program will teach you the fundamentals of machine learning and how to use these techniques to build real-world AI applications.

This Specialization is taught by Andrew Ng, an AI visi...

wispy thorn Jan 6, 2024, 6:15 AM

#

Hi everyone, I'm trying to install an older version of matplotlib, i.e < 3.3.0 but everytime I try to install it, it's not able to gather the requirements to build the wheel

#

Anyone got any heads up as how this problem will be resolved?

final kiln Jan 6, 2024, 7:24 AM

#

You probably need to switch to a different Python version (for a matplotlib release that old, most likely an earlier one).

vapid garden Jan 6, 2024, 8:51 AM

#

How can I generate classification dataset using either gaussian distribution or Poisson distribution.

#

...?

past meteor Jan 6, 2024, 2:18 PM

#

vapid garden How can I generate classification dataset using either gaussian distribution or ...

You can do this using Numpy and Pandas

#

I've done this a couple of times in the past. Whenever I volunteer to give a data science training I make my own datasets and I do it with Numpy. I pick a case that seems interesting to the audience, define some variables and think about how they're generated (e.g., maybe age is gamma distributed, maybe the relationship between age and cost is a multimodal gaussian etc)

#

Maybe there's tools that do this in a more automated fashion but I do it manually 🙂

#

I don't respond to DMs sorry

vapid garden Jan 6, 2024, 2:24 PM

#

Like a dataset of 10k rows where each row is only Poisson distributed, can be any number of features

#

Can u give me code on how to do it

past meteor Jan 6, 2024, 2:25 PM

#

I'm not a fan of giving you the full solution because then you learn the least

#

https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.poisson.html <---- with this you can generate variables following a Poisson distribution

desert oar Jan 6, 2024, 2:46 PM

#

vapid garden How can I generate classification dataset using either gaussian distribution or ...

think about how classification models work and do that in reverse: construct a continuous response/output/label, and convert that to discrete classes
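
A minimal sketch of that recipe with NumPy: draw Poisson features, build a continuous score from them, then cut the score into two classes. The rates, weights, noise level and the median cut-off are all made-up choices for illustration.

```py
import numpy as np

rng = np.random.default_rng(seed=0)
n_rows, n_features = 10_000, 4

# Each feature is Poisson-distributed with its own (hypothetical) rate
rates = np.array([2.0, 5.0, 1.0, 8.0])
X = rng.poisson(lam=rates, size=(n_rows, n_features))

# Continuous latent score: a made-up linear signal plus Gaussian noise
weights = np.array([0.8, -0.3, 1.2, 0.1])
score = X @ weights + rng.normal(loc=0.0, scale=1.0, size=n_rows)

# Convert the continuous score into two classes at its median
y = (score > np.median(score)).astype(int)
print(X.shape, y.mean())
```

More noise makes the classes harder to separate, and cutting at a different quantile gives imbalanced classes.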

final kiln Jan 6, 2024, 4:23 PM

#

final kiln started drafting an explanation thing, if I get good results I'll expand it into...

the larger scale test is taking a while, I'm almost done refactoring the repo for it. I'm going to search through these guys

https://paperswithcode.com/method/strided-attention
https://paperswithcode.com/method/fixed-factorized-attention
https://paperswithcode.com/method/dot-product-attention
https://paperswithcode.com/method/scaled

to try to see how I can evaluate the performance of my architecture, at least two of them seem to benchmark on language translation tasks, so I might have to adapt for that, the dot product attention seems to be where the 2017 paper ended coming from, I only read it for a bit, they don't seem to be using a lot of gpu which is a relief

#

ig my thing fits in as a generalization of the scaled dot product attention, or a simplification ig

elfin swan Jan 6, 2024, 4:35 PM

#

bgf = BetaGeoFitter(penalizer_coef=0.001)

bgf.fit(rfm['frequency'],
rfm['recency_weekly_p'],
rfm['T_weekly'])

ConvergenceError:
The model did not converge. Try adding a larger penalizer to see if that helps convergence.

Please help me with this, I am getting this error

serene scaffold Jan 6, 2024, 5:31 PM

#

elfin swan bgf = BetaGeoFitter(penalizer_coef=0.001) bgf.fit(rfm['frequency'], r...

can you show the import statement for BetaGeoFitter?

heady sierra Jan 6, 2024, 6:45 PM

#

Hello. I am training a reinforcement learning model using stablebaselines3 but every time I train in vscode, I am using my cpu even though I have a rtx 4060. How can I make it use my gpu instead?

twilit dove Jan 6, 2024, 6:56 PM

#

Hi guys! I've recently started my journey into data science and machine learning in general and one tip for exploring different applications I keep hearing is to read research papers and attempt to replicate the models created in the papers. Are there any websites/journals that ML researchers generally publish their papers on, or are ML research papers on more generic science journals. Thanks

serene scaffold Jan 6, 2024, 7:04 PM

#

heady sierra Hello. I am training a reinforcement learning model using stablebaselines3 but e...

Can you show some code?

serene scaffold Jan 6, 2024, 7:06 PM

#

twilit dove Hi guys! I've recently started my journey into data science and machine learning...

I actually don't recommend that. Papers are about very specific developments, and are challenging to read, even for experienced practitioners. You won't build foundational knowledge in DS/ML by reading papers.

heady sierra Jan 6, 2024, 7:07 PM

#

serene scaffold Can you show some code?

Its in vscode notebook so which part do you want to see?

serene scaffold Jan 6, 2024, 7:07 PM

#

heady sierra Its in vscode notebook so which part do you want to see?

the part where you create the model. By the way, I won't look at screenshots.

heady sierra Jan 6, 2024, 7:11 PM

#

class MyCustomEnv(StocksEnv):
    _process_data = signals

env2 = MyCustomEnv(df= data, window_size= 15, frame_bound=(15, 90))

env_maker = lambda: env2

env = DummyVecEnv([env_maker])

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500000)

while True:
    obs = obs[np.newaxis, ...]
    action, _states = model.predict(obs)
    obs, rewards, terminated, truncated, info = env.step(int(action))

    done = terminated or truncated
    if done:
        print("info", info)
        break

#

This is the output:

Using cpu device

| rollout/ | |
| exploration_rate | 0.993 |
| time/ | |
| episodes | 4 |
| fps | 16755 |
| time_elapsed | 0 |
| total_timesteps | 348 |

| rollout/ | |
| exploration_rate | 0.987 |
| time/ | |
| episodes | 8 |
| fps | 16891 |
| time_elapsed | 0 |
| total_timesteps | 696 |

| rollout/ | |
| exploration_rate | 0.98 |
| time/ | |
| episodes | 12 |
| fps | 16301 |
...
| learning_rate | 0.0001 |
| loss | 1.3e+05 |
| n_updates | 112431 |

serene scaffold Jan 6, 2024, 7:12 PM

#

heady sierra ``` class MyCustomEnv(StocksEnv): _process_data = signals env2 = MyCustomEn...

Can you add the import statements for everything used here?
Also, please use python highlighting.
```py

twilit dove Jan 6, 2024, 7:13 PM

#

serene scaffold I actually don't recommend that. Papers are about very specific developments, an...

hmm ok then. Thanks!

heady sierra Jan 6, 2024, 7:14 PM

#

import gymnasium as gym
import gym_anytrading
from gym_anytrading.envs import StocksEnv

from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3 import DQN

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import torch

class MyCustomEnv(StocksEnv):
    _process_data = signals

env2 = MyCustomEnv(df= data, window_size= 15, frame_bound=(15, 90))

env_maker = lambda: env2

env = DummyVecEnv([env_maker])

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500000)

while True:
    obs = obs[np.newaxis, ...]
    action, _states = model.predict(obs)
    obs, rewards, terminated, truncated, info = env.step(int(action))

    done = terminated or truncated
    if done:
        print("info", info)
        break

serene scaffold Jan 6, 2024, 7:15 PM

#

modify these two lines accordingly

# before
model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500000)

# after: pass the device to the constructor (SB3 algorithms have no .to() method)
model = DQN("MlpPolicy", env, verbose=1, device="cuda")
model.learn(total_timesteps=500000)

then see if that moves it to the GPU

elfin swan Jan 6, 2024, 7:17 PM

#

serene scaffold can you show the import statement for BetaGeoFitter?

import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_period_transactions

serene scaffold Jan 6, 2024, 7:19 PM

#

elfin swan bgf = BetaGeoFitter(penalizer_coef=0.001) bgf.fit(rfm['frequency'], r...

The short answer is that that model can't learn your rfm data with the penalizer_coef hyperparameter that you set.

#

btw @heady sierra, you might need to redefine obs as a tensor (it appears to be an array currently) and move that to the GPU as well.

heady sierra Jan 6, 2024, 7:21 PM

#

ok got it. Thanks

desert oar Jan 6, 2024, 8:02 PM

#

final kiln ig my thing fits in as a generalization of the scaled dot product attention, or ...

yeah if anything your technique is a specialization of it

#

great that you're following up on it. i'm looking forward to the benchmark results

final kiln Jan 6, 2024, 8:08 PM

#

desert oar yeah if anything your technique is a _specialization_ of it

Yes indeed, it's a special case of it

final kiln Jan 6, 2024, 8:08 PM

#

desert oar great that you're following up on it. i'm looking forward to the benchmark resul...

As long as I keep getting good results I'll keep digging

#

I've experimented with several designs now, and it seems that the network doesn't really care what you do as long as you calculate scores.

#

The exciting thing is the parameter reduction

#

And the design philosophy of forcing the network to do geometry to make decisions

#

I checked the hardware used on the 2017 paper and I'll be able to reproduce a good chunk of their table

#

These scores are already from the new thing

#

And its output is similar to the transformer

The meaning of life is to shake others.
With old only make an oath shallker thoughts; and says so defend:
but the fault out of thick-morrow, to ruin;
And holy clergymen must be needful
The younger slanders to be more than the hollow service
Known and defars; crave heaven,
Even as it would fear'd keep the time
Of love can admitges change;
So much lenity of soldiers,
Then thieves conn'd poison'd blanks,
Cry

#

So at least on a very small dataset it works fine and on par w/ the transformer

agile owl Jan 6, 2024, 8:20 PM

#

Not sure if that means anything or not

#

lol

final kiln Jan 6, 2024, 8:21 PM

#

Me either, I just ask gpt if it looks coherent

agile owl Jan 6, 2024, 8:21 PM

#

I don't think shallker is a word?

#

then again I think shakespeare invented words so hey

final kiln Jan 6, 2024, 8:21 PM

#

Wait

#

Yes it exists, it's an archaic word

agile owl Jan 6, 2024, 8:22 PM

#

what about thick-morrow

final kiln Jan 6, 2024, 8:23 PM

#

Ah, no it doesn't

#

Gpt hallucinated

desert oar Jan 6, 2024, 8:23 PM

#

final kiln These scores are already from the new thing

how do those compare to the attention matrices from the original?

#

not that i'd expect all 8 heads to find the same thing

#

but maybe we'd expect their sum/average to evaluate to something similar

#

or i suppose just comparing the final output distribution over tokens (rather than individual tokens that were chosen)

final kiln Jan 6, 2024, 8:25 PM

#

desert oar how do those compare to the attention matrices from the original?

I didn't graph the other scores, but I can do it tomorrow, same code practically

desert oar Jan 6, 2024, 8:25 PM

#

that is, does your model generate quantitatively similar distributions over tokens (e.g. sum of squared differences akin to brier score)?
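
A sketch of what such a comparison could look like: for each position, sum the squared differences between the two models' next-token distributions, then average over positions. The function name and the tiny 3-token distributions are hypothetical.

```py
def mean_squared_gap(dists_a, dists_b):
    """Average over positions of the summed squared difference between two distributions."""
    total = 0.0
    for p, q in zip(dists_a, dists_b):
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total / len(dists_a)

# Hypothetical next-token distributions over a 3-token vocabulary at 2 positions
baseline = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
variant = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2]]
print(mean_squared_gap(baseline, variant))
```

A value of 0 means the two models assign identical probabilities at every position. The maximum per position is 2, reached when each model puts all its mass on a different token.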

#

your projects have been very inspiring!

final kiln Jan 6, 2024, 8:26 PM

#

agile owl Not sure if that means anything or not

But the point is that the output is the same non sense as the transformer

final kiln Jan 6, 2024, 8:26 PM

#

desert oar that is, does your model generate quantitatively similar distributions over toke...

That's a good idea actually

desert oar Jan 6, 2024, 8:28 PM

#

i'm also really curious if this works just as well on bert-like models as it does on gpt-like models

final kiln Jan 6, 2024, 8:28 PM

#

desert oar or i suppose just comparing the final output distribution over tokens (rather th...

So like, feeding a token and directly comparing the output ?

desert oar Jan 6, 2024, 8:28 PM

#

final kiln So like, feeding a token and directly comparing the output ?

yeah, but instead of reducing the output to a token or sequence of tokens, leave it as a (sequence of) probability distribution(s) over tokens

final kiln Jan 6, 2024, 8:29 PM

#

Right, ig it would make sense for them to be similar since they're both approximating the same probability law

desert oar Jan 6, 2024, 8:29 PM

#

yeah hopefully

#

although you need to collapse it down to a single token to generate more than one token... so maybe just start with a single token prediction like you said

#

maybe walk through an existing valid english-language document and compare the output at each token?

#

that way you know the inputs make sense

#

averaging a bunch of one-step-ahead forecasts rather than averaging an entire forecasting procedure, if that makes any sense

final kiln Jan 6, 2024, 8:31 PM

#

Yes it's a good idea. I'm gonna take note. Right now I'm preparing the repo for a series of larger scale and more systematic experiments. After that I'll go through a data analysis phase where I do these comparisons

final kiln Jan 6, 2024, 8:32 PM

#

desert oar i'm also really curious if this works just as well on bert-like models as it doe...

I didn't know bert was different

desert oar Jan 6, 2024, 8:33 PM

#

final kiln I didn't know bert was different

i'm sure there are other details involved, but my understanding of the main difference is that bert doesn't do masking, so it's considered an "encoder-only" model compared to the gpt "decoder-only" design

#

that is, in bert every token can attend to every other token in the sequence without restriction, whereas in gpt tokens can only attend to tokens earlier in the sequence (which is why the upper triangle of the attention matrix in your output is all 0)
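
The masking difference can be sketched in a few lines of NumPy. This is a toy illustration and not either model's actual code: the scores are all zeros so that the effect of the mask is easy to see.

```py
import numpy as np

def attention_weights(scores, causal):
    """Row-wise softmax of raw scores, optionally hiding later positions."""
    scores = np.array(scores, dtype=float)  # copy so the input is not modified
    if causal:
        seq_len = scores.shape[0]
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores[future] = -np.inf  # exp(-inf) = 0, so future tokens get zero weight
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

raw = np.zeros((4, 4))  # hypothetical scores: all pairs equally similar
gpt_like = attention_weights(raw, causal=True)    # upper triangle is zero
bert_like = attention_weights(raw, causal=False)  # every token sees every token
print(gpt_like)
print(bert_like)
```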

#

https://datascience.stackexchange.com/a/65242/1156

Data Science Stack Exchange

Why is the decoder not a part of BERT architecture?

I can't see how BERT makes predictions without using a decoder unit, which was a part of all models before it including transformers and standard RNNs. How are output predictions made in the BERT

agile owl Jan 6, 2024, 8:37 PM

#

what are the best models for sentiment extraction these days

#

I'm thinking about adding a new input stream to my reinforcement learner based on sentiment analysis

desert oar Jan 6, 2024, 8:37 PM

#

interestingly it looks like microsoft was attempting to use a bert-like architecture for language generation, which i guess turned out to be a dead end since i've never heard anything about it (and gpt of course turned out to be a tremendous success) https://arxiv.org/abs/1905.02450

arXiv.org

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Pre-training and fine-tuning, e.g., BERT, have achieved great success in language understanding by transferring knowledge from rich-resource pre-training task to the low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tas...

final kiln Jan 6, 2024, 8:40 PM

#

desert oar https://datascience.stackexchange.com/a/65242/1156

Wait so you literally just don't mask it and have a special char marking a word as empty ?

#

The output is generated non-autoregressively (every token at the output is computed at the same time, without any self-attention mask), conditioning on the non-masked tokens, which are present in the same input sequence as the masked tokens.

#

So this is why it's so much faster

shadow viper Jan 6, 2024, 8:46 PM

#

Good day everyone, is there a quicker way to train a model?
it takes too much time and the power outage affects it a lot

proud wing Jan 6, 2024, 10:31 PM

#

@final kiln Thats a method... masking versus padding

#

It's just not supported by all the different tuning approaches out there.

#

Just finished one of the stages for my model ocr code dataset training-generator.

Model image detection:
[1] CPLUZPLUZ.png
[2] pyth0n.png
[3] ruzt.png
[4] c55.png
Auto-Selecting *.png
Auto-preprocess: (yes/no): yes
File Save Strategy: (txt/pdf/auto): auto
CPLUZPLUZ.png saved as CPLUZPLUZ-OCR.cpp (Cpp detected)
pyth0n.png saved as pyth0n-OCR.py (Py detected)
ruzt.png saved as ruzt-OCR.rs (Rs detected)
c55.png saved as c55-OCR.css (Css detected)

slow vigil Jan 6, 2024, 10:43 PM

#

Anyone here use Polars? I'm having a weird issue. I have a program that is using multi-processing and threading in each process, and I'm trying to intermittently write a dataframe to a csv for each process once the dataframe reaches a certain height. The height check gets triggered and by printing the dataframe just before the call to df.write_csv(filename) I can see that the dataframe has data in it, but when I look in the file that gets written out it is only writing out the headers of the dataframe and doesn't contain any data

past meteor Jan 6, 2024, 10:46 PM

#

slow vigil Anyone here use Polars? I'm having a weird issue. I have a program that is using...

Can I ask why you've structured your program like that?

slow vigil Jan 6, 2024, 10:47 PM

#

There's a lot of data being gathered and I wanted to safeguard it in case the program fails at any point

#

I was being CPU bottlenecked so I set it up to run in multiple processes

#

Also, by flushing the dataframes periodically it keeps the in-mem size down

past meteor Jan 6, 2024, 10:48 PM

#

Personally seems like a strange way to structure it, I think if you did it in a more orthodox way it wouldn't be as big of an issue

slow vigil Jan 6, 2024, 10:48 PM

#

I would if I could

past meteor Jan 6, 2024, 10:48 PM

#

Especially since Polars lets you stream data from 1 source to another

slow vigil Jan 6, 2024, 10:49 PM

#

I had to do it this way

past meteor Jan 6, 2024, 10:49 PM

#

So you can effectively work with larger-than-memory datasets

slow vigil Jan 6, 2024, 10:49 PM

#

It was going to take 30 hours to run without multiprocessing

#

now it takes 4

past meteor Jan 6, 2024, 10:49 PM

#

Polars uses multiple cores by default

slow vigil Jan 6, 2024, 10:49 PM

#

Please just trust me lol

proud wing Jan 6, 2024, 10:50 PM

#

Have you tried inspecting the dataframe prior to it reaching the height?

past meteor Jan 6, 2024, 10:50 PM

#

I don't know what your specific issue is, you could have a race condition somewhere

slow vigil Jan 6, 2024, 10:50 PM

#

I print the dataframe right before trying to write and it is normal. Full of data

proud wing Jan 6, 2024, 10:51 PM

#

It sounds to me like the way you are writing the data is the issue

past meteor Jan 6, 2024, 10:51 PM

#

Seems likely since you have multiple processes trying to write to the same file

proud wing Jan 6, 2024, 10:51 PM

#

not the dataframe itself, since it's printing without any issue

#

You might have a race condition if you're trying to write to the same file

#

You can use a write queue

#

to ensure they don't compete for writing
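
A minimal sketch of the write-queue idea, assuming threads inside one process and using the stdlib `csv` module as a stand-in for the dataframe library. The names `writer`, `worker`, `run` and `STOP` are made up for illustration. Worker threads never touch the file; they hand batches of rows to a queue, and one writer thread owns the file.

```python
import csv
import queue
import threading

STOP = object()  # sentinel that tells the writer thread to exit


def writer(q, path, header):
    """Single consumer: the only thread that ever touches the file."""
    with open(path, "w", newline="") as f:
        out = csv.writer(f)
        out.writerow(header)
        while True:
            batch = q.get()
            if batch is STOP:
                break
            out.writerows(batch)
            f.flush()  # push each batch out so a crash loses little data


def worker(q, worker_id, n_rows):
    """Producer: builds rows and queues them instead of writing directly."""
    q.put([(worker_id, i) for i in range(n_rows)])


def run(path, n_workers=10, n_rows=5):
    q = queue.Queue()
    w = threading.Thread(target=writer, args=(q, path, ["worker", "row"]))
    w.start()
    workers = [
        threading.Thread(target=worker, args=(q, i, n_rows))
        for i in range(n_workers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    q.put(STOP)  # all producers are done, so the writer can finish
    w.join()
```

Calling `run("output.csv")` writes one header line plus every queued row, in whatever order the batches arrived. A `queue.Queue` only works between threads of the same process; sharing across processes would need something like `multiprocessing.Queue` instead.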

slow vigil Jan 6, 2024, 10:52 PM

#

if main_df.height >= 10:
    if os.path.exists(csv_name):
        existing = pl.read_csv(csv_name)
        main_df = pl.concat([existing, main_df])
    print(main_df)
    main_df.write_csv(csv_name)
    main_df = main_df.clear()

I used a height of 10 here just for testing purposes
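
For comparison, a rough sketch of an append-style flush that avoids re-reading the whole file on every write. It uses the stdlib `csv` module rather than Polars, and `flush_rows` is a hypothetical helper, so treat it as an illustration of the pattern rather than a drop-in fix for the snippet above.

```python
import csv
import os


def flush_rows(rows, path, header):
    """Append buffered rows to path, writing the header only for a new or empty file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        out = csv.writer(f)
        if is_new:
            out.writerow(header)
        out.writerows(rows)
    # Clear the list in place so the caller's buffer is emptied too;
    # rebinding a local name inside a function would not do that.
    rows.clear()
```

Each flush then costs time proportional to the batch rather than to the file, which matters more as the file grows.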

#

But yeah it might be because of the threading

#

Maybe I'll just have to make it with bulletproof error handling and then return the df from each process

#

and do the write after it returns

#

pretty inconvenient though tbh

#

oh

proud wing Jan 6, 2024, 10:55 PM

#

import os
import pandas as pd
import threading
import random
import time

def generate_test_data(thread_num):
    return pd.DataFrame({'Thread': [thread_num], 'Height': [random.randint(1, 15)]})

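def process_and_write(main_df, csv_name, thread_num):
    # Hold the lock for the whole read/append cycle so only one thread touches the file at a time.
    with write_lock:
        if os.path.exists(csv_name):
            # Caution: the file is opened in append mode below, so concatenating the
            # existing rows here writes them out again on every call (rows get duplicated).
            existing = pd.read_csv(csv_name)
            main_df = pd.concat([existing, main_df])
        print(f"Thread {thread_num} - Data put in write queue: {main_df}")
        # Append to the file; the header is written only when the file does not exist yet.
        main_df.to_csv(csv_name, mode='a', index=False, header=not os.path.exists(csv_name))
        print(f"Thread {thread_num} - Data written to the file: {main_df}")
        # Rebinds the local name only; the caller's dataframe is unchanged.
        main_df = main_df.iloc[0:0]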

def data_generation_thread(csv_name, thread_num):
    while True:
        main_df = generate_test_data(thread_num)
        if main_df.iloc[0]['Height'] >= 10:
            process_and_write(main_df, csv_name, thread_num)
        time.sleep(1)  


write_lock = threading.Lock()
num_threads = 3
threads = []
csv_name = 'output.csv'

for i in range(num_threads):
    thread = threading.Thread(target=data_generation_thread, args=(csv_name, i+1))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

#

hope that helps.

slow vigil Jan 6, 2024, 10:56 PM

#

hmmmm write lock

past meteor Jan 6, 2024, 10:56 PM

#

Can you try writing to different files and just concatenating at the end?

proud wing Jan 6, 2024, 10:56 PM

#

try that script ^

#

it will do what you basically described.

slow vigil Jan 6, 2024, 10:56 PM

#

I'm writing to a separate csv for each process. I did plan to combine them at the end
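
A small sketch of that combine step, assuming every per-process file shares the same header. It is stdlib-only, and `combine_csv_parts` is a made-up name. The header is written once and the data rows of each part are appended in order.

```python
import csv


def combine_csv_parts(part_paths, out_path):
    """Concatenate CSV files that share a header; return the number of data rows written."""
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out_f:
        out = csv.writer(out_f)
        for part in part_paths:
            with open(part, newline="") as in_f:
                reader = csv.reader(in_f)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = part_header
                    out.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header")
                for row in reader:
                    out.writerow(row)
                    n_rows += 1
    return n_rows
```

A part file that contains only a header contributes zero rows, so comparing the returned count against what each process reported is a cheap way to spot the headers-only problem described earlier.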

proud wing Jan 6, 2024, 10:56 PM

#

example output from it:

2,15
3,11
1,11
1,11
2,15
1,11
1,11
2,15
3,11
1,14
2,12
3,11
2,15
1,14
3,14

I have it writing the threadnumber to each line

slow vigil Jan 6, 2024, 10:57 PM

#

I'm not using the threading library though

#

If I import it just for write lock will it still work?

proud wing Jan 6, 2024, 10:57 PM

#

It's your script, test it :)

slow vigil Jan 6, 2024, 10:57 PM

#

lol true

#

Another thing I'm thinking is I could use a multiprocessing Value object

#

idk if I can pass a dataframe into that lol

past meteor Jan 6, 2024, 11:01 PM

#

proud wing ```python import os import pandas as pd import threading import random import ti...

Is this generated by AI?

#

I don't immediately see where/how this is failing but all I can say is that you're fighting against the API of polars

#

I would really, really consider using the library as intended, with pl.scan_x (x being your source) and sink_x on the resulting LazyFrame to lazily read and write

slow vigil Jan 6, 2024, 11:03 PM

#

Each process has 10 threads running at a time, so it's probably the case that somehow that is causing an issue

past meteor Jan 6, 2024, 11:04 PM

#

Unless there's a very very specific reason why that is not possible

slow vigil Jan 6, 2024, 11:04 PM

#

I will look into those. I'm new to polars so I'm not familiar with the workflow yet. I'm basically using it philosophically the same as Pandas, which I'm sure is wrong

past meteor Jan 6, 2024, 11:05 PM

#

Yeah they're very different libraries. They only look similar on the surface

#

In the past I've reduced a workflow that took 1+ hour to run to <1 min using polars 🤷

slow vigil Jan 6, 2024, 11:05 PM

#

Sheeesh

past meteor Jan 6, 2024, 11:06 PM

#

This is due to various reasons, not just the libraries being different. (Pandas consumes a lot of memory so I had to batch my results, and there was a lot of overhead from doing DB calls). All I'm saying is: read the documentation first and it'll pay off.

slow vigil Jan 6, 2024, 11:06 PM

#

I caught a co-worker using nested calls to iterrows in Pandas the other day lol

#

I was like, "Nuh uh"

frail arch Jan 7, 2024, 12:06 AM

#

so, I am trying to use Stable Baselines for some RL. Just learning.

env_name = "CartPole-v0"
env = gym.make(env_name)
env = DummyVecEnv([lambda: env])
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
model.learn(total_timesteps=20000)

the last line is giving the following exception:

75 for env_idx in range(self.num_envs):
76 maybe_options = {"options": self._options[env_idx]} if self._options[env_idx] else {}
---> 77 obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx], **maybe_options)
78 self._save_obs(env_idx, obs)
79 # Seeds and options are only used once
ValueError: too many values to unpack (expected 2)
Any idea what I am doing wrong?
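
The failing line unpacks the result of `reset()` into two names, so the error means `reset()` returned something with more than two items. One possible cause (an assumption, since the imports aren't shown) is that the env follows the older Gym API, where `reset()` returns only the observation, while the caller expects an `(obs, info)` pair. The sketch below reproduces just that mechanism with hypothetical stand-in classes, without any RL library.

```python
class OldStyleEnv:
    """Hypothetical stand-in: reset() returns only the observation."""

    def reset(self, seed=None):
        return [0.01, -0.02, 0.03, 0.04]  # CartPole-like 4-element observation


class NewStyleEnv:
    """Hypothetical stand-in: reset() returns an (obs, info) pair."""

    def reset(self, seed=None):
        return [0.01, -0.02, 0.03, 0.04], {}


def reset_like_vec_env(env):
    # Mirrors the unpacking on the failing line: two names on the left-hand side.
    obs, info = env.reset(seed=0)
    return obs, info


def try_reset(env):
    """Return 'ok' or the text of the ValueError raised while unpacking."""
    try:
        reset_like_vec_env(env)
        return "ok"
    except ValueError as exc:
        return str(exc)
```

Here `try_reset(OldStyleEnv())` gives the same "too many values to unpack (expected 2)" message, while `try_reset(NewStyleEnv())` returns "ok". If that is what's going on, creating the env with `gymnasium` instead of `gym` is the commonly suggested fix, but check which package and version is actually installed first.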

desert oar Jan 7, 2024, 12:27 AM

#

slow vigil I caught a co-worker using nested calls to iterrows in Pandas the other day lol

i actually had a legitimate use case for nested itertuples in a project last year

#

never iterrows though. just use range and iloc for that

slow vigil Jan 7, 2024, 1:38 AM

#

It appears that trying to write out from the process pool is causing each process to hang

Using cpu device

| rollout/ | |
| exploration_rate | 0.993 |
| time/ | |
| episodes | 4 |
| fps | 16755 |
| time_elapsed | 0 |
| total_timesteps | 348 |

| rollout/ | |
| exploration_rate | 0.987 |
| time/ | |
| episodes | 8 |
| fps | 16891 |
| time_elapsed | 0 |
| total_timesteps | 696 |

| rollout/ | |
| exploration_rate | 0.98 |
| time/ | |
| episodes | 12 |
| fps | 16301 |
...
| learning_rate | 0.0001 |
| loss | 1.3e+05 |
| n_updates | 112431 |