#data-science-and-ml

sturdy kiln Apr 30, 2024, 6:02 PM

#

thats probably top of the line commercial GPUs

neat bluff Apr 30, 2024, 6:02 PM

#

Wait out for 4090ti ducky_devil

#

It's gonna cost a house and a car

#

And also half of your soul

#

Hm. I would love to help You but I have no clue what might be wrong

sturdy kiln Apr 30, 2024, 6:03 PM

#

does it also require a sacrificial lamb and a drop of a virgin's blood

neat bluff Apr 30, 2024, 6:04 PM

#

sturdy kiln does it also require a sacrificial lamb and a drop of a virgin's blood

That's saved for RTX5000 series

sturdy kiln Apr 30, 2024, 6:04 PM

#

thats more likely for the electric bill rather than the cost of the GPU itself lol

neat bluff Apr 30, 2024, 6:04 PM

#

True dat^

sturdy kiln Apr 30, 2024, 6:05 PM

#

i wonder for any of the 40 series how long it takes until the electricity bill outweighs the cost of the GPU itself lol

#

probably like a month of constant use?

neat bluff Apr 30, 2024, 6:06 PM

#

You would have to use it to the maximum for a year probably

#

That's my guess

#

A home refrigerator typically draws between 300 and 800 watts.

#

That's when cooling of course

tidal bough Apr 30, 2024, 6:07 PM

#

they have a tdp of only like 300-400W, right? 400 W × 13 cents/kWh × 8760 h/year ≈ $455/year (of constant usage), where I googled US electricity prices as 13 cents/kWh

neat bluff Apr 30, 2024, 6:07 PM

#

Nvidia RTX 4090 has an official power draw of 450W

#

So not even a year, almost 3 in fact
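
For reference, the back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch: the 13 cents/kWh rate and the 400 W / 450 W draws come from the messages above, while the 1599 USD price is an assumed launch MSRP for the RTX 4090, and the card is assumed to run at full draw around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def yearly_cost_usd(watts: float, usd_per_kwh: float) -> float:
    """Cost of running a constant load for one year."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * usd_per_kwh


def years_to_match_price(price_usd: float, watts: float, usd_per_kwh: float) -> float:
    """Years of constant use until the electricity bill equals the purchase price."""
    return price_usd / yearly_cost_usd(watts, usd_per_kwh)


RATE = 0.13           # USD per kWh, the figure quoted in the chat
ASSUMED_PRICE = 1599  # USD, assumed launch MSRP (not from the chat)

print(round(yearly_cost_usd(400, RATE), 2))  # 455.52
print(round(yearly_cost_usd(450, RATE), 2))  # 512.46
print(round(years_to_match_price(ASSUMED_PRICE, 450, RATE), 2))
```

Under those assumptions the payback time comes out at a bit over three years, which is in the same ballpark as the estimate above.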

sturdy kiln Apr 30, 2024, 6:21 PM

#

lol thats a very wrong forecast right there if i ever saw one

#

hilarious how ARIMA(1,1,1) also gives me a flat line

#

like it just refuses to do anything

neat bluff Apr 30, 2024, 6:26 PM

#

sturdy kiln lol thats a very wrong forecast right there if i ever saw one

ppm is having a snake tournament

#

Btw what does ur training set look like? Is it divided by years or months/days? Cuz training it from a yearly perspective might give a huge false positive

sturdy kiln Apr 30, 2024, 6:28 PM

#

its on months

#

or so i think

#

because i did `dataDF.index = dataDF.index.to_period('M')` to change the index to months, but i actually dont know if thats what it does lol
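
A small sketch of what `to_period('M')` does, assuming pandas is installed; the frame `daily` and its values are made up for illustration. It only relabels the index with monthly periods and keeps every row, so daily data stays daily. Getting one row per month needs an explicit aggregation, such as the `groupby` below.

```python
import pandas as pd

# Made-up daily observations: two in January, two in February.
daily = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-25"]),
)

# Relabelling: same four rows, but the index now holds monthly periods.
relabelled = daily.copy()
relabelled.index = relabelled.index.to_period("M")
print(relabelled)

# Aggregating: one row per month (mean is an arbitrary choice here).
monthly = daily.groupby(daily.index.to_period("M"))["value"].mean()
print(monthly)
```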

#

grid search ftw

neat bluff Apr 30, 2024, 6:32 PM

#

Now it actually matches in terms of position

sturdy kiln Apr 30, 2024, 6:32 PM

#

im curious because this resource im following limited the grid search of p, d, q to 11

#

can i go higher to get better results

#

or does it cause diminishing results

neat bluff Apr 30, 2024, 6:33 PM

#

I am afraid I don't know what You are talking about

sturdy kiln Apr 30, 2024, 6:34 PM

#

ARIMA takes 3 parameters (p, d, q), different parameter sets give different results, you do grid search by fitting an ARIMA model for each combination, evaluating it, and picking the best one by some metric (e.g. lowest RMSE)

#

the grid search i did limited the value from 0 to 11

#

so it can go from (0,0,0) to (11,11,11)

#

hence this thing

#

actually it wasnt 11

#

it was only p

#

q and d were limited to 4, so technically the max is (11,4,4)

neat bluff Apr 30, 2024, 6:36 PM

#

Hence the repetitive result? This is hella interesting but I see that I've got a TON of things to learn

sturdy kiln Apr 30, 2024, 6:36 PM

#

its not repetitive

#

its taking the model arguments, example (1,2,3), create a model and fit it and evaluate the results

#

do it on the next

#

and when finished compare all models, and determine which one is the best
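
The procedure described above can be sketched roughly like this. It is not the code from the resource being followed: `grid_search` and `arima_rmse` are hypothetical names, the 12-point hold-out is an assumption, and `arima_rmse` needs statsmodels (imported lazily so the search itself stays plain Python). The order is written as (p, d, q), which is the convention statsmodels expects.

```python
import itertools
import math
from typing import Callable, Iterable, Sequence

Order = tuple[int, int, int]


def grid_search(
    series: Sequence[float],
    p_values: Iterable[int],
    d_values: Iterable[int],
    q_values: Iterable[int],
    score: Callable[[Sequence[float], Order], float],
) -> tuple[Order, float]:
    """Try every (p, d, q) combination and return the one with the lowest score."""
    best_order, best_score = None, math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            current = score(series, order)
        except Exception:
            # Some orders are invalid or fail to converge; skip them.
            continue
        if current < best_score:
            best_order, best_score = order, current
    if best_order is None:
        raise ValueError("no candidate order could be scored")
    return best_order, best_score


def arima_rmse(series: Sequence[float], order: Order, holdout: int = 12) -> float:
    """Hold-out RMSE of a single ARIMA fit (requires statsmodels)."""
    from statsmodels.tsa.arima.model import ARIMA

    train, test = list(series[:-holdout]), list(series[-holdout:])
    forecast = ARIMA(train, order=order).fit().forecast(steps=holdout)
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, test)) / holdout)
```

With ranges like the ones mentioned here, a call such as `grid_search(series, range(12), range(5), range(5), arima_rmse)` fits 300 models, so the run time grows quickly as the limits are raised.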

neat bluff Apr 30, 2024, 6:37 PM

#

Oh, that's what You mean. Yeah I think I get it now

sturdy kiln Apr 30, 2024, 6:37 PM

#

for this dataset, it got (9,2,0) since it has the lowest MSE value out of all

#

its a very interesting regressional technique used on time series stuff

#

this is the first time im dealing with ARIMA lol

neat bluff Apr 30, 2024, 6:39 PM

#

This is the first time I am even looking at such a thing

#

Although it seems super cool I think I will stick to my NLP shit

sturdy kiln Apr 30, 2024, 6:40 PM

#

its not even deep learning, its literally just regressional analysis

#

although im curious how i can use DL on time series data lol

neat bluff Apr 30, 2024, 6:43 PM

#

No clue, but good luck 👌🏻

peak ridge Apr 30, 2024, 6:43 PM

#

hey guys

#

ever worked with langchain?

serene scaffold Apr 30, 2024, 6:45 PM

#

peak ridge ever worked with langchain?

please always ask your actual question. don't ask to ask. if you have a question about langchain, assume that someone can help and ask a question that person could start answering.

neat bluff Apr 30, 2024, 6:49 PM

#

peak ridge ever worked with langchain?

Right now, what's up?

peak ridge Apr 30, 2024, 6:50 PM

#

so for my product what im doing rn is using
ChatGPT-AssistantsAPI for responses and handling context/history for a convo
everythings good working great,

so i want to pass user's data like his workspace data from my db via restapi call to gpt via RAG
so it could give biased answer considering the workspace data + it's own llm

My problem:
FileNotFoundError: [Errno 2] No such file or directory
actually i have JSON (REST API) responses, because in LangChain there's no option for SQL RAG,

im trying to use JSONLoader
but it takes a required argument called file_path
i dont rly have a file path

serene scaffold Apr 30, 2024, 6:50 PM

#

peak ridge so for my product what im doing rn is m using ChatGPT-AssistantsAPI for respons...

FileNotFoundError: [Errno 2] No such file or directory
try giving the whole error message, from Traceback all the way to the end of the output.

#

!code

arctic wedgeBOT Apr 30, 2024, 6:50 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold Apr 30, 2024, 6:50 PM

#

Please do not post screenshots of code.

peak ridge Apr 30, 2024, 6:51 PM

#

```py
import requests
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import JSONLoader


API_URL = "http://127.0.0.1:8000/api/workspaces/"

def get_workspace():
    response = requests.get(API_URL, auth=("aryanjainak@gmail.com","Iamreal@123"))
    if response.status_code == 200:
        return response.json()
    else:
        print("Failed to fetch data:", response.status_code)
        return None

def main():
    workspace_data = get_workspace()
    embeddings_model = OpenAIEmbeddings()
    """
    splitter = RecursiveJsonSplitter(max_chunk_size=300)
    json_chunks = splitter.split_json(json_data=workspace_data)
    print(json_chunks,'efef')
    """
    loader = JSONLoader(
        file_path=str(workspace_data),
        jq_schema='.messages[].content',
    )
    data = loader.load()
    embeddings = embeddings_model.embed_documents(data)
    vectorstore = Chroma.from_documents(embeddings, embedding=OpenAIEmbeddings())
    retriever = vectorstore.as_retriever()
    docs = retriever.get_relevant_documents("What is the name of my workspace?")
```

#

basically ik what the issue is
but what even can i do
file_path is a required argument

neat bluff Apr 30, 2024, 6:51 PM

#

Leaking your password to such a channel ain't the best idea

serene scaffold Apr 30, 2024, 6:52 PM

#

yeah, you should change your password for that API, since a bot has probably stolen it.

peak ridge Apr 30, 2024, 6:52 PM

#

neat bluff Leaking your password to such a channel ain't the best idea

localhost

serene scaffold Apr 30, 2024, 6:52 PM

#

once you've changed your password for that API, post the whole error message that you're getting, starting from Traceback.

peak ridge Apr 30, 2024, 6:52 PM

#

it's a localhost

#

and all passwords are of dummy db

neat bluff Apr 30, 2024, 6:53 PM

#

Yeah it's local host, but You posted your email as well. From Your reaction I supposed it's not a valid pass. It was just a friendly reminder to keep it in mind for the next time

serene scaffold Apr 30, 2024, 6:54 PM

#

@peak ridge in addition to the whole error message, can you also post all the import statements in that file?

peak ridge Apr 30, 2024, 6:54 PM

#

serene scaffold once you've changed your password for that API, post the whole error message tha...

FileNotFoundError: [Errno 2] No such file or directory: "PycharmProjects/Kleenestar/src/backend/[{'id': 6, 'root_user': {'id': 1, 'first_name': 'xyz', 'email': 'aryanjainak@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http:/127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}, 'users': [{'id': 1, 'first_name': 'xyz', 'email': '2342@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http:/127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}], 'business_name': 'Xyz', 'website_url': 'https:/www.xyz.com', 'industry': None, 'created_at': '2024-04-23T04:37:55.983893+05:30'}]"

peak ridge Apr 30, 2024, 6:55 PM

#

serene scaffold @peak ridge in addition to the whole error message, can you also post ...

i did ,edited there

serene scaffold Apr 30, 2024, 6:55 PM

#

peak ridge FileNotFoundError: [Errno 2] No such file or directory: "PycharmProjects/Klee...

given this data structure

[{'id': 6, 'root_user': {'id': 1, 'first_name': 'xyz', 'email': 'aryanjainak@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http:/127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}, 'users': [{'id': 1, 'first_name': 'xyz', 'email': '2342@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http:/127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}], 'business_name': 'Xyz', 'website_url': 'https:/www.xyz.com', 'industry': None, 'created_at': '2024-04-23T04:37:55.983893+05:30'}]

is there anything here that should be the completion of "PycharmProjects/Kleenestar/src/backend/"?

neat bluff Apr 30, 2024, 6:56 PM

#

JSON loader is requiring filepath because it's supposed to load the file from the harddrive

#

json.loads() is probably gonna solve your issue

#

And passing the actual data to it directly

peak ridge Apr 30, 2024, 6:56 PM

#

serene scaffold given this data structure [{'id': 6, 'root_user': {'id': 1, 'first_name': ...

nothing as such
JSON loader is the issue

peak ridge Apr 30, 2024, 6:57 PM

#

neat bluff json.loads() is probably gonna solve your issue

ohh,
but as far as ik from the docs everybody is using this lib cuz it's returning Document(page_content= format lists

#

```py
Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 1}),
Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 2}),
```
something like this

neat bluff Apr 30, 2024, 6:58 PM

#

Then just save the data from API as .txt/.json

serene scaffold Apr 30, 2024, 6:58 PM

#

peak ridge nothing as such JSON loader is the issue

it looks to me like workspace_data is that list, and you were passing it as a string.

peak ridge Apr 30, 2024, 6:58 PM

#

neat bluff Then just save the data from API as .txt/.json

i cant i would have alot of users wont make sense ig

peak ridge Apr 30, 2024, 6:58 PM

#

serene scaffold it looks to me like `workspace_data` is that list, and you were passing it as a ...

i actually passed the list but it doesnt accept list

neat bluff Apr 30, 2024, 6:58 PM

#

serene scaffold it looks to me like `workspace_data` is that list, and you were passing it as a ...

Cuz he doesn't have a filepath

peak ridge Apr 30, 2024, 6:58 PM

#

wait i'll show u, just a sec.

serene scaffold Apr 30, 2024, 6:59 PM

#

peak ridge i actually passed the list but it doesnt accept list

sure, but the solution wasn't to turn the list into a string. you can't just pass any string--it has to be a string that actually represents what you need.

peak ridge Apr 30, 2024, 6:59 PM

#

```py
>>> from channels.rag import get_workspace
>>> get_workspace()
[{'id': 6, 'root_user': {'id': 1, 'first_name': 'xyz', 'email': 'aryanjainak@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http://127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}, 'users': [{'id': 1, 'first_name': 'xyz', 'email': '123@gmail.com', 'last_name': 'Jain', 'is_active': True, 'profile': {'id': 1, 'user': 1, 'avatar': 'http://127.0.0.1:8000/media/default.jpeg', 'country': None, 'phone_number': None, 'referral_code': '865083', 'total_referrals': 0}}], 'business_name': 'Xyz', 'website_url': 'https://www.xyz.com', 'industry': None, 'created_at': '2024-04-23T04:37:55.983893+05:30'}]
```

#

cools right?
json response

neat bluff Apr 30, 2024, 7:00 PM

#

It's a dictionary already

peak ridge Apr 30, 2024, 7:00 PM

#

serene scaffold sure, but the solution wasn't to turn the list into a string. you can't just pas...

hm, that's what i missed prolly

serene scaffold Apr 30, 2024, 7:00 PM

#

neat bluff It's a dictionary already

the outermost structure is a list

neat bluff Apr 30, 2024, 7:01 PM

#

serene scaffold the outermost structure is a list

True, my point is that this is not a JSON. JSON doesn't accept single quotes

peak ridge Apr 30, 2024, 7:01 PM

#

TypeError: expected str, bytes or os.PathLike object, not list
if i do this

```py
def main():
    workspace_data = get_workspace()
    embeddings_model = OpenAIEmbeddings()
    """
    splitter = RecursiveJsonSplitter(max_chunk_size=300)
    json_chunks = splitter.split_json(json_data=workspace_data)
    print(json_chunks,'efef')
    """
    loader = JSONLoader(
        file_path=workspace_data,
        jq_schema='.messages[].content',
    )
    data = loader.load()
    print(data)
```

serene scaffold Apr 30, 2024, 7:02 PM

#

neat bluff True, my point is that this is not a JSON. JSON doesn't accept single quotes

sure, but file_path needs to be a pathlib.Path, or a string that is a file path. passing a string that is a valid json will still cause an error.

peak ridge Apr 30, 2024, 7:02 PM

#

is my approach very terrible? @serene scaffold @neat bluff
isnt how u guys do rag

#

im from Django background
web dev databases bg

#

im learning all this for our startup

#

just a early startup

neat bluff Apr 30, 2024, 7:03 PM

#

serene scaffold sure, but `file_path` needs to be a `pathlib.Path`, or a string that is a file p...

I know, I've pointed it out to him earlier. JSONLoader is clearly designed to only read files from the hard drive

peak ridge Apr 30, 2024, 7:03 PM

#

neat bluff I know, I've pointed it out earlier to him earlier. jsonloader is clearly design...

i can just use

#

json.loads() ?

neat bluff Apr 30, 2024, 7:03 PM

#

My best guess would be to try. What's the worst that can happen.

peak ridge Apr 30, 2024, 7:04 PM

#

neat bluff My best guess would be to try. What's the worst that can happen.

💯 smart guy

#

actually i did, but imma do again

#

```py
def main():
    workspace_data = get_workspace()
    embeddings_model = OpenAIEmbeddings()
    """
    splitter = RecursiveJsonSplitter(max_chunk_size=300)
    json_chunks = splitter.split_json(json_data=workspace_data)
    print(json_chunks,'efef')
    """
    loader = json.loads(workspace_data)
    print(loader,'xyz')
    data = loader.load()
    print(data)
```
TypeError: the JSON object must be str, bytes or bytearray, not list

neat bluff Apr 30, 2024, 7:05 PM

#

Turn it to a string now

peak ridge Apr 30, 2024, 7:05 PM

#

didnt even got 1 print
so like it even didnt work

neat bluff Apr 30, 2024, 7:05 PM

#

As you did earlier

peak ridge Apr 30, 2024, 7:05 PM

#

ohh

#

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

neat bluff Apr 30, 2024, 7:06 PM

#

So it worked

peak ridge Apr 30, 2024, 7:06 PM

#

neat bluff So it worked

umm

#

did it?

#

it didnt printed the result tho?

neat bluff Apr 30, 2024, 7:07 PM

#

But as I said already - JSON doesn't accept the ' as value surroundings (i have no clue what they are called)

[{"id": 6, "root_user": {"id": 1, "first_name": "xyz", "email": "aryanjainak@gmail.com", "last_name": "Jain", "is_active": True, "profile": {"id": 1, "user": 1, "avatar": "http://127.0.0.1:8000/media/default.jpeg", "country": None, "phone_number": None, "referral_code": "865083", "total_referrals": 0}}, "users": [{"id": 1, "first_name": "xyz", "email": "123@gmail.com", "last_name": "Jain", "is_active": True, "profile": {"id": 1, "user": 1, "avatar": "http://127.0.0.1:8000/media/default.jpeg", "country": None, "phone_number": None, "referral_code": "865083", "total_referrals": 0}}], "business_name": "Xyz", "website_url": "https://www.xyz.com", "industry": None, "created_at": "2024-04-23T04:37:55.983893+05:30"}]

#

It has to look like this instead

peak ridge Apr 30, 2024, 7:08 PM

#

so what should i do sir

#

```py
def main():
    workspace_data = get_workspace()
    embeddings_model = OpenAIEmbeddings()
    """
    splitter = RecursiveJsonSplitter(max_chunk_size=300)
    json_chunks = splitter.split_json(json_data=workspace_data)
    print(json_chunks,'efef')
    """
    loader = json.loads(str(workspace_data))
    print(loader,'xyz')
    data = loader.load()
    print(data)
    embeddings = embeddings_model.embed_documents(data)
    vectorstore = Chroma.from_documents(embeddings, embedding=OpenAIEmbeddings())
    retriever = vectorstore.as_retriever()
    docs = retriever.get_relevant_documents("What is the name of my workspace?")
```
this is how it looks rn

neat bluff Apr 30, 2024, 7:08 PM

#

I would save it as a raw string in your code

#

Lemme do it for ya

#

```py
def main():
    workspace_data = get_workspace()
    embeddings_model = OpenAIEmbeddings()
    """
    splitter = RecursiveJsonSplitter(max_chunk_size=300)
    json_chunks = splitter.split_json(json_data=workspace_data)
    print(json_chunks,'efef')
    """
    workspace_data = '[{"id": 6, "root_user": {"id": 1, "first_name": "xyz", "email": "aryanjainak@gmail.com", "last_name": "Jain", "is_active": True, "profile": {"id": 1, "user": 1, "avatar": "http://127.0.0.1:8000/media/default.jpeg", "country": None, "phone_number": None, "referral_code": "865083", "total_referrals": 0}}, "users": [{"id": 1, "first_name": "xyz", "email": "123@gmail.com", "last_name": "Jain", "is_active": True, "profile": {"id": 1, "user": 1, "avatar": "http://127.0.0.1:8000/media/default.jpeg", "country": None, "phone_number": None, "referral_code": "865083", "total_referrals": 0}}], "business_name": "Xyz", "website_url": "https://www.xyz.com", "industry": None, "created_at": "2024-04-23T04:37:55.983893+05:30"}]'
    loader = json.loads(str(workspace_data))
    print(loader,'xyz')
    data = loader.load()
    print(data)
    embeddings = embeddings_model.embed_documents(data)
    vectorstore = Chroma.from_documents(embeddings, embedding=OpenAIEmbeddings())
    retriever = vectorstore.as_retriever()
    docs = retriever.get_relevant_documents("What is the name of my workspace?")
```

peak ridge Apr 30, 2024, 7:15 PM

#

neat bluff def main(): workspace_data = get_workspace() embeddings_model ...

same error

neat bluff Apr 30, 2024, 7:15 PM

#

wdym same error

peak ridge Apr 30, 2024, 7:16 PM

#

neat bluff wdym same error

json.decoder.JSONDecodeError: Expecting value: line 1 column 124 (char 123)

neat bluff Apr 30, 2024, 7:17 PM

#

Alright I've forgot one thing

#

Is this API on the other side written by You?

peak ridge Apr 30, 2024, 7:17 PM

#

neat bluff Is this API on the other side written by You?

yes

#

it's a web-application api

neat bluff Apr 30, 2024, 7:18 PM

#

Mind showing me the code where You define and return said "JSON" data

peak ridge Apr 30, 2024, 7:18 PM

#

```py
class WorkSpacesViewSet(viewsets.ModelViewSet):
    #permission_classes = (permissions.WorkSpaceViewSetPermissions,)
    serializer_class = WorkSpaceSerializer

    def get_queryset(self):
        # All the workspaces the request user is a member of
        return self.request.user.workspace_set.all()
```

#

its written in django

#

django-rest framework*

neat bluff Apr 30, 2024, 7:18 PM

#

I can see that

peak ridge Apr 30, 2024, 7:18 PM

#

i can show u the response too
in postman if u want

#

via get req

neat bluff Apr 30, 2024, 7:19 PM

#

Wait let me think

#

Mind editing the code of an API a bit?

```py
def get_queryset(self):
    # All the workspaces the request user is a member of
    userWorkspaces = self.request.user.workspace_set.all()
    return json.dumps(userWorkspaces)
```

#

No clue if this will not crash

peak ridge Apr 30, 2024, 7:21 PM

#

neat bluff Mind editing the code of an API a bit? def get_queryset(self): ...

🧐
dont u think the web-application will have trouble

#

who cares,
i can try

neat bluff Apr 30, 2024, 7:21 PM

#

peak ridge 🧐 dont u think the web-application will have trouble

"No clue if this will not crash"

peak ridge Apr 30, 2024, 7:22 PM

#

np

neat bluff Apr 30, 2024, 7:22 PM

#

I suppose it's not production deployed yet so hence my "no worries" debugging approach

peak ridge Apr 30, 2024, 7:23 PM

#

ya the web-app crashed

peak ridge Apr 30, 2024, 7:23 PM

#

neat bluff I suppose it's not production deployed yet so hence my "no worries" debugging ap...

ya

neat bluff Apr 30, 2024, 7:23 PM

#

peak ridge ya the web-app crashed

Error? I suppose You imported json

peak ridge Apr 30, 2024, 7:23 PM

#

neat bluff Error? I suppose You imported json

TypeError: Object of type QuerySet is not JSON serializable

#

we could turn it into json

#

but response.json()
(already doing it)

neat bluff Apr 30, 2024, 7:24 PM

#

That's what we are actually trying to do.

neat bluff Apr 30, 2024, 7:24 PM

#

peak ridge but response.json() (already doing it)

It's clearly not. Whatever You receive on the other side is not JSON.

peak ridge Apr 30, 2024, 7:24 PM

#

we are calling the api in get_workspace

peak ridge Apr 30, 2024, 7:24 PM

#

neat bluff It's clearly not. Whatever You receive on the other side is not JSON.

hm

neat bluff Apr 30, 2024, 7:25 PM

#

If it would be JSON there wouldn't be a problem loading it using json.loads()
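
A short sketch of why the decoding attempts in this thread fail, using only the standard library. With `requests`, `response.json()` already returns parsed Python objects, so the list shown earlier is Python data rather than JSON text. The `parsed` value below is a trimmed, made-up stand-in for that list. Calling `str()` on it produces a Python repr (single quotes, `True`, `None`), which `json.loads()` rejects, while `json.dumps()` produces real JSON text.

```python
import json

# Stand-in for what response.json() hands back: already-parsed Python objects.
parsed = [{"id": 6, "is_active": True, "industry": None, "business_name": "Xyz"}]

as_repr = str(parsed)         # Python repr: single quotes, True, None
as_json = json.dumps(parsed)  # JSON text: double quotes, true, null

print(as_repr)
print(as_json)

try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

# JSON text produced by json.dumps() loads back into the same structure.
round_tripped = json.loads(as_json)
```

Swapping the single quotes for double quotes by hand is not enough either, because `True` and `None` are still not valid JSON literals.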

peak ridge Apr 30, 2024, 7:26 PM

#

neat bluff If it would be JSON there wouldn't be a problem loading it using json.loads()

actually you are right

#

actually im just trying with this workspace data

#

i wont really use this

#

i am calling users dynamic data,
there marketing data via marketing-channels api's

#

and i am storing it on my db and i wanna pass it

#

im just trying with workspace data

neat bluff Apr 30, 2024, 7:29 PM

#

So if I understand correctly - we are trying to fix a mockup which isn't gonna be the final data managed by this code?

#

pithink

peak ridge Apr 30, 2024, 7:29 PM

#

neat bluff So if I understand correctly - we are trying to fix a mockup which isn't gonna b...

yes

#

💀

#

but that data will also come from API request

#

or via db query (within an API)

#

i can change approach
like rn im calling via api

#

i can call via db directly if it works

#

it just needs to work

neat bluff Apr 30, 2024, 7:30 PM

#

Anyway, the fact that json.dumps() isn't able to serialize a QuerySet (as stated in the error log) doesn't mean it won't be able to handle it when we first treat it with some DICting...

peak ridge Apr 30, 2024, 7:30 PM

#

or my company will die

neat bluff Apr 30, 2024, 7:31 PM

#

Because now that I think about it... it probably didn't even try to do it because of the incompatible data type

peak ridge Apr 30, 2024, 7:31 PM

#

neat bluff Because now that I think about it... it probably didn't even try to do it becaus...

true

#

but can i call it via db queries

#

more impossible

neat bluff Apr 30, 2024, 7:32 PM

#

peak ridge or my company will die

Funny thing is that I am doing similar thing and had similar issues, but in different area of interest

peak ridge Apr 30, 2024, 7:32 PM

#

on the docs i saw these options

peak ridge Apr 30, 2024, 7:32 PM

#

neat bluff Funny thing is that I am doing similar thing and had similar issues, but in diff...

you understand my pain

#

are u guys using python for the backends?

neat bluff Apr 30, 2024, 7:33 PM

#

Well I am one man army beside my frontend design guy.

peak ridge Apr 30, 2024, 7:33 PM

#

im also the lone backend guy
but we have 2 interns on the frontend

#

and my co-founder is designer

#

and we have a pretty decent access to investors,market product

neat bluff Apr 30, 2024, 7:34 PM

#

Is that a SaaS You are trying to build?

peak ridge Apr 30, 2024, 7:34 PM

#

my co-founder has 1 more product 5k users

peak ridge Apr 30, 2024, 7:34 PM

#

neat bluff Is that a SaaS You are trying to build?

yes sir
https://kleenestar.io

KleeneStar - Marketing analytics conversational AI

Marketing data automation platform powered by AI designed to enhance brand visibility by offering strategic insights and recommendations based on in-depth real-time analysis.

#

lol, i hate that too

#

these designers are crazy they love that

neat bluff Apr 30, 2024, 7:35 PM

#

That's the one You are building rn or the one of Your friend?

#

Cuz it looks fucking fancy. That's for sure

peak ridge Apr 30, 2024, 7:36 PM

#

yes, we are.

#

we have crazy funds and access too

#

180 pre-registered users

#

my co-founder is gr8 guy.

#

glad to have him

neat bluff Apr 30, 2024, 7:37 PM

#

Alright. Now I feel dedicated to fix this crap

peak ridge Apr 30, 2024, 7:37 PM

#

yes, we gotta do it sir.

neat bluff Apr 30, 2024, 7:37 PM

#

Cuz maybe we will help each other in this crazy world of building SaaS

peak ridge Apr 30, 2024, 7:37 PM

#

ya, we talk daily about everything

#

and work

lapis sequoia May 1, 2024, 1:15 AM

#

Should people speedrun their data stuff? https://youtu.be/x82Ze21aQ2E?si=PAFuaMkcwUtgqmD6

YouTube

here we go

Sub_1 hour record

Uh, choked hard during the yfinance cluster split, nonetheless good run. You have to speed run your sets. You have to. It means absolutely nothing if you do not. Come on, break my Reekie.


tacit basin May 1, 2024, 2:58 AM

#

https://course.fast.ai/

Practical Deep Learning for Coders

Practical Deep Learning for Coders - Practical Deep Learning

A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.

#

Data Science in Python

Elements of Data Science

An introduction to data science designed for people with no programming experience, this book presents a small, powerful subset of Python that allows you to do real work in data science as quickly as possible. It includes Jupyter notebooks where you can read the text, run the code, and work on exercises to practice what you learn.
https://allendowney.github.io/ElementsOfDataScience/README.html

#

https://huggingface.co/CohereForAI/c4ai-command-r-plus

CohereForAI/c4ai-command-r-plus · Hugging Face

#

Install it with pip and run it then compare it to standard python repl.
pip install ipython

dawn light May 1, 2024, 5:45 AM

#

Can anyone point me to the right direction,

I'm trying to build a model that matches a book's paragraphs in one language with the matching paragraphs in a translation (for example, let's take the little prince's english version and its japanese translated version)
The idea would be to create a version of a book where its original and translation are laid out side by side for language learning

I'm not too sure yet how to approach this kind of problem (what model to use, what kind of problem it is, etc.) so i'd appreciate some guidance

as of now, my idea would be to vectorize/tokenize the words, compute something like a vector sum per paragraphs, then maybe match the resultant vector using a dot product with the vectors in the other language, the thing tho is that since these are two different languages, the way the words would be vectorized would probably result in vectors where the dimensions aren't the same, so not yet sure how to deal with that

TLDR: I'd like to create a model that automates the creation of something like this: http://bilinguis.com/book/alice/jp/en/c1/ where the model aligns the text from an original language to an official human-translated text

Any suggestions would be appreciated!
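
One way to picture the dot-product idea, as a rough sketch only. It assumes both languages are embedded by the same multilingual sentence encoder (models such as LaBSE are built for this), so every paragraph vector has the same dimension and the mismatch worry goes away. The vectors below are made-up toy numbers, not real embeddings, and the greedy best-match step ignores paragraph order, which a real aligner would probably want to enforce.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def best_matches(source_vecs: np.ndarray, target_vecs: np.ndarray) -> np.ndarray:
    """For each source paragraph, the index of the most similar target paragraph."""
    return cosine_similarity_matrix(source_vecs, target_vecs).argmax(axis=1)


# Toy stand-ins for paragraph embeddings in a shared 3-dimensional space.
english = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
japanese = np.array([[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [0.0, 0.2, 0.8]])

matches = best_matches(english, japanese)
print(matches)  # [1 0 2]
```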

peak ridge May 1, 2024, 6:11 AM

#

😐

peak ridge May 1, 2024, 6:12 AM

#

serene scaffold sure, but `file_path` needs to be a `pathlib.Path`, or a string that is a file p...

so what can i do now
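
One possible way around the `file_path` requirement, sketched with the standard library only: write the already-parsed API data to a temporary `.json` file per request and hand that path to whatever path-based loader is in use. `dump_to_temp_json` is a hypothetical helper and `workspace_data` is a trimmed, made-up stand-in for the list returned by `get_workspace()`. Note that the `jq_schema` used earlier, `.messages[].content`, would also have to match the real payload, which has no `messages` key in the output shown above.

```python
import json
import os
import tempfile


def dump_to_temp_json(records) -> str:
    """Write already-parsed API data to a temporary .json file and return its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(records, handle)
    return handle.name


# Hypothetical stand-in for the list returned by get_workspace().
workspace_data = [{"id": 6, "business_name": "Xyz", "industry": None}]

path = dump_to_temp_json(workspace_data)
try:
    # A path-based loader could be pointed at `path` here.
    with open(path, encoding="utf-8") as handle:
        reloaded = json.load(handle)
finally:
    os.remove(path)  # delete=False means cleanup is manual
```

Because the file is created and removed per call, nothing accumulates on disk as the number of users grows.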

buoyant folio May 1, 2024, 8:11 AM

#

How can i speed this loop up a lot? it needs to be very fast, so i can run it like 120 times a second:

```py
def RSI_strategy_numba(data: pd.DataFrame, rsi_values, indicators) -> tuple[list[pd.DatetimeIndex], list[pd.DatetimeIndex]]:
    buy_dates, sell_dates, state = [], [], 0
    for idx, rsi in zip(data.index.values, rsi_values):
        # If we're in the buy state, check for a buy
        if rsi > indicators[0] and state == 0:
            buy_dates.append(idx)
            state = 1
        # Otherwise check for a sell
        elif rsi < indicators[1] and state == 1:
            sell_dates.append(idx)
            state = 0
    return buy_dates, sell_dates
```

wooden sail May 1, 2024, 8:21 AM

#

buoyant folio How can i speed this loop up alot?: how do i speed up this loop, it needs to be ...

appending to dataframes (and numpy arrays) is always slow and generally not recommended. it's better if you use dicts or lists, and if you still need a dataframe at the end, convert the final result to a dataframe

buoyant folio May 1, 2024, 8:22 AM

#

I am not appending to the dataframe. The problem i believe is the size of the dataframe

#

its around 43000 lines

#

I would like to use numpy's faster vectorization, but i cant figure out how

wooden sail May 1, 2024, 8:24 AM

#

ah true, that's what i get for not reading carefully

#

what type is rsi_values?

buoyant folio May 1, 2024, 8:25 AM

#

it's a numpy array of values. the dataframe has a datetime index, and then a column for rsi. The rsi values come from `dataframe['rsi'].values`

wooden sail May 1, 2024, 8:26 AM

#

it does seem like you need the state from the previous result, but that's not a big problem here

#

what's the type of indicators[0]?

buoyant folio May 1, 2024, 8:27 AM

#

thats just a list of floats.

wooden sail May 1, 2024, 8:27 AM

#

ok

buoyant folio May 1, 2024, 8:27 AM

#

i have a genetic algorithm which generates it for me

wooden sail May 1, 2024, 8:27 AM

#

then you can compare the entire rsi_values against indicators[0]

buoyant folio May 1, 2024, 8:28 AM

#

yeah, using numpy.where

wooden sail May 1, 2024, 8:28 AM

#

just rsi_values > indicators[0] yields a vector of booleans with all the results

buoyant folio May 1, 2024, 8:28 AM

#

or that

wooden sail May 1, 2024, 8:29 AM

#

you can similarly compute the state for all indices together, though this is a bit more tricky because each state depends on the previous one. it may be that you cannot avoid doing this in a loop, but you could rewrite it as a convolution at least

#

that means you can do all of these operations without any explicit for loops

buoyant folio May 1, 2024, 8:29 AM

#

Idk how that'd work, im quite new to numpy

#

I guess i could calculate the checks using rsi_values > indicators[0]

#

but then from there, how do i loop that without explicitly using a for loop?

wooden sail May 1, 2024, 8:33 AM

#

hmm if you're not familiar with convolutions then there isn't a much better way than what you're already doing

#

you could get a speedup by avoiding resizing the lists. you can initialize them with 0s
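A rough sketch of the preallocation idea in plain Python. The state rule below (switch on above an upper threshold, switch off below a lower one, otherwise keep the previous state) is a hypothetical stand-in, and `compute_states`, `lower` and `upper` are illustrative names, not the original code.

```python
def compute_states(rsi_values, lower, upper):
    """Carry a 0/1 state forward; the rule itself is a hypothetical stand-in."""
    n = len(rsi_values)
    states = [0] * n  # allocated once, so the list never has to grow
    current = 0
    for i, rsi in enumerate(rsi_values):
        if rsi > upper:
            current = 1
        elif rsi < lower:
            current = 0
        states[i] = current  # write in place instead of calling append()
    return states


states = compute_states([25.0, 48.5, 71.2, 69.9, 28.0], lower=30.0, upper=70.0)
print(states)
```

The loop is still there because each state depends on the previous one, but the output list is sized up front and filled by index.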

buoyant folio May 1, 2024, 8:34 AM

#

I'll try to press chatgpt for answers on convolutions tomorrow. But for now ill go to bed. Thx for the help!

wicked vessel May 1, 2024, 9:29 AM

#

"Hey, I'm about to start my journey into AI and ML! If anyone else is starting from scratch and wants to join a group for group study, let's create a study group together. Together, we can learn the basics and support each other along the way. Excited to start this journey with like-minded individuals!"

trim saddle May 1, 2024, 12:27 PM

#

wicked vessel "Hey, I'm about to start my journey into AI and ML! If anyone else is starting f...

Andrej Karpathy's YT series is a great starter for intuitive neural network basics. There's also a Discord learning community there

analog bolt May 1, 2024, 12:51 PM

#

if I made a machine learning algorithm and gave it info about jokes that I find funny and jokes I don't, would it be able to generate new jokes that I would find funny the majority of the time?

serene scaffold May 1, 2024, 12:57 PM

#

analog bolt if I made a machine learning algorithm and gave it info about jokes that I find ...

depends on the model's ability to learn properties of jokes that discriminate between ones that you do or do not find funny. it would probably take more training data than you would want to produce.

analog bolt May 1, 2024, 1:22 PM

#

serene scaffold depends on the model's ability to learn properties of jokes that discriminate be...

How long do you think it'd take just going through and collecting training data?

serene scaffold May 1, 2024, 1:35 PM

#

analog bolt How long do you think it'd take just going through and collecting training data?

you'd have to find a dataset of "jokes" and go through (perhaps in excel) and label each one as funny or not funny to you. I would expect to spend at least several hours doing that.

wicked vessel May 1, 2024, 3:20 PM

#

trim saddle Andrej Karpathys yt series is a great starter for intuitiv Neural Network Basics...

Thanks for your information 😊

past meteor May 1, 2024, 3:24 PM

#

analog bolt How long do you think it'd take just going through and collecting training data?

If you have no clue of how large your dataset needs to be a priori what you can always do is start with a small sample, check the results, increase the sample etc. until you no longer really improve
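The "grow the sample until it stops helping" loop can be sketched as follows. This is only an outline: `train_and_score` is a hypothetical callable that trains on `n` examples and returns a score, and `toy_score` is a made-up stand-in so the loop can run on its own.

```python
def grow_until_plateau(train_and_score, start=10, step=10, min_gain=0.01, max_size=1000):
    """Increase the sample size until the score improves by less than min_gain."""
    size = start
    best = train_and_score(size)
    while size + step <= max_size:
        score = train_and_score(size + step)
        if score - best < min_gain:
            break  # the extra data no longer pays off
        size, best = size + step, score
    return size, best


# Toy stand-in for "train a model on n examples and return its accuracy".
def toy_score(n):
    return 1.0 - 1.0 / n


size, score = grow_until_plateau(toy_score)
print(size, round(score, 4))
```

In practice the score should come from held-out data, otherwise the loop only measures how well the model memorizes the sample.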

peak ridge May 1, 2024, 4:33 PM

#

hm

neat bluff May 1, 2024, 4:36 PM

#

tacit basin https://huggingface.co/CohereForAI/c4ai-command-r-plus

Thanks a bunch. It might be an overkill for my use-case, but I will check it out regardless.

timid kiln May 1, 2024, 4:37 PM

#

@serene scaffold I probably should have tagged you. Perhaps you can shed some light on how I can analyze the data?

past meteor May 1, 2024, 4:38 PM

#

peak ridge hm

basically, you find 10 jokes you find funny and 10 you don't, and you give those to the algorithm. You let it produce 10 funny and unfunny jokes. If all of them are spot on, you're done. This will likely not be the case, so you'll have to find more funny and unfunny jokes (say 10 more) and repeat with 20. You keep doing this in a loop until you're satisfied

peak ridge May 1, 2024, 5:16 PM

#

past meteor basically, you find 10 jokes you find funny and 10 you don't and you give that t...

genius

neat bluff May 1, 2024, 5:45 PM

#

past meteor basically, you find 10 jokes you find funny and 10 you don't and you give that t...

AI is not able to understand human humour btw

buoyant folio May 1, 2024, 5:45 PM

#

but it can train to simulate his

past meteor May 1, 2024, 5:47 PM

#

neat bluff AI is not able to understand human humour btw

this is basically philosophy because you need to define what you mean by "understand" to have this discussion (and honestly, these are my least favourite ones)

#

Depending on your definition of understand the answer is either yes or no

tropic kettle May 1, 2024, 6:33 PM

#

You know how in unicode different characters have different codes like
I think
A is 01000001
a is 01100001
(Just an example, probably wrong code)
My question is does any character take more storage space than another or do they all take up the same uniform space

agile cobalt May 1, 2024, 6:35 PM

#

some """characters""" do take up more space, but when talking about things at this level of detail, your notion of what a character is starts clashing with formal definitions

#

!e ```py
examples = ['A', 'Á', '猫']
for example in examples:
    print(example, len(example), example.encode('UTF-8'), len(example.encode('UTF-8')))
    print(example, len(example), example.encode('UTF-16'), len(example.encode('UTF-16')))
    print(example, len(example), example.encode('UTF-32'), len(example.encode('UTF-32')))
```

arctic wedgeBOT May 1, 2024, 6:36 PM

#

@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

A 1 b'A' 1
A 1 b'\xff\xfeA\x00' 4
A 1 b'\xff\xfe\x00\x00A\x00\x00\x00' 8
Á 1 b'\xc3\x81' 2
Á 1 b'\xff\xfe\xc1\x00' 4
Á 1 b'\xff\xfe\x00\x00\xc1\x00\x00\x00' 8
猫 1 b'\xe7\x8c\xab' 3
猫 1 b'\xff\xfe+s' 4
猫 1 b'\xff\xfe\x00\x00+s\x00\x00' 8

tropic kettle May 1, 2024, 6:38 PM

#

Wow, I just lied to someone basically, thanks bros

agile cobalt May 1, 2024, 6:38 PM

#

I recommend reading up on how Rust handles Unicode data and strings
it is pretty insightful even if you don't plan to ever use Rust

desert oar May 1, 2024, 6:42 PM

#

tropic kettle You know how in unicode different characters have different codes like I think ...

in utf-8 specifically the answer is "yes" -- some characters are 1 byte, some are 2, etc.

in python the answer is "maybe" because (i think) strings use a fixed width for each code point, auto-resizing their character width as needed. so it acts like utf-32 functionally, but in practice the storage size might be more like ascii if the characters are all 1 codepoint (which can be represented in 1 byte). however don't quote me on this because i don't remember where i read it.
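One way to check this, assuming CPython 3.3 or later (where strings pick a width of 1, 2 or 4 bytes per code point based on the widest character they contain), is to compare the sizes of a long and a short string made of the same character. `bytes_per_code_point` is an illustrative helper, not a standard function.

```python
import sys


def bytes_per_code_point(ch, n=1000):
    """Estimate storage per code point by comparing two string lengths."""
    # The fixed object header cancels out in the subtraction.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


widths = {ch: bytes_per_code_point(ch) for ch in ["A", "\u00c1", "\u732b", "\U0001F600"]}
for ch, width in widths.items():
    print(repr(ch), width)
```

On CPython this reports 1 byte for `'A'` and `'Á'`, 2 for `'猫'` and 4 for the emoji; other Python implementations may store strings differently.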

neat bluff May 1, 2024, 6:42 PM

#

past meteor this is basically philosophy because you need to define what you mean by "unders...

It's a "no" every time actually. LLMs are able to generate jokes only out of existing ones - they're not able to generate new, unique or trend-based jokes. They will be able to operate only within existing context and by merging/changing the jokes they've been trained with.

desert oar May 1, 2024, 6:43 PM

#

neat bluff It's a "no" everytime actually. LLM's are able to generate jokes only out of exi...

you could build a system that does something like search for jokes, which then gets appended to the current context, from which the model can then generate new jokes. "LLMs are zero-shot learners" and all. but that's not the LLM itself, that's a bigger system built on top of the LLM.

past meteor May 1, 2024, 6:45 PM

#

ML (not just LLMs) is capable of taking 2 existing concepts and stringing them together into a 3rd, novel concept

#

this is a nice image

[image attachment]

#

given 2 existing jokes it can produce a 3rd new joke

desert oar May 1, 2024, 6:47 PM

#

@tropic kettle note that python does not equal numpy does not equal apache arrow.

i actually don't know exactly how numpy stores its strings, but they behave like fixed-size UCS-4 fields, i.e. UTF-32, so bigger codepoints shouldn't take up more space but strings will tend to be large (and are un-ergonomic to work with due to the fixed field size)

i don't think arrow has a native string data type, but polars for example uses utf-8 and is backed by arrow. i assume the pandas arrow-backed dtype is also utf-8 but you'd have to dig around in their docs or source code for that info

and of course this is all irrelevant if you're interested in how databases store your text (depends on the database), or file size at rest (you choose the encoding + compression), or data size over the wire (same as file size, your choice)
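The fixed-width behaviour of NumPy's unicode dtype is easy to see from `itemsize`. A small sketch with made-up strings:

```python
import numpy as np

ascii_arr = np.array(["cat", "dog"])
cjk_arr = np.array(["猫", "犬"])
mixed_arr = np.array(["a", "hello"])

# Every element gets 4 bytes per character slot, whatever the code point is,
# and the slot count is set by the longest string in the array.
print(ascii_arr.dtype, ascii_arr.itemsize)
print(cjk_arr.dtype, cjk_arr.itemsize)
print(mixed_arr.dtype, mixed_arr.itemsize)
```

The last array shows the cost of the fixed field size: the one-character string occupies as much space as the five-character one.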

neat bluff May 1, 2024, 7:41 PM

#

past meteor given 2 existing jokes it can produce a 3rd new joke

It's not really new then, is it?

violet gull May 1, 2024, 10:04 PM

#

Running into a weird issue here.
I'm training an AI to make number-based predictions. It fits the data perfectly, as shown by a final loss value of
0.00026427686376862626
Then on a test value it also "perfectly" predicts it.

Expected normalized number: `[-0.9962218830221753]`
with a loss value of `0.000004458501602317615`
Except when I un-normalize the value, it is completely off the target, even though the loss is extremely low.
prediction: `[170.93668721399916]`
expected: `[696.299988]`
So everything did its job correctly. I already verified that the normalization stuff works. I believe the issue is caused by precision and the range of the upper and lower bounds of the normalization being huge.
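A quick way to see how a tiny normalized loss can turn into a large raw error, assuming min-max scaling to [-1, 1] and a squared-error loss. The bounds `lo` and `hi` are hypothetical, since the real ones are not given in the chat.

```python
import math


def denormalize(z, lo, hi):
    """Invert min-max scaling from [-1, 1] back to [lo, hi]."""
    return lo + (z + 1.0) * (hi - lo) / 2.0


# Hypothetical bounds; the real ones are not given in the chat.
lo, hi = 0.0, 500_000.0

z_expected = -0.9962218830221753
loss = 0.000004458501602317615   # squared error in normalized space
norm_error = math.sqrt(loss)     # roughly 0.002 in normalized units

# The same gap measured after un-normalizing both values.
raw_error = denormalize(z_expected + norm_error, lo, hi) - denormalize(z_expected, lo, hi)

print(f"normalized error: {norm_error:.5f}")
print(f"error after un-normalizing: {raw_error:.1f}")
```

The raw error is the normalized error times half the range, so with a range in the hundreds of thousands an error of 0.002 already amounts to several hundred units. Normalizing with bounds closer to the values that actually occur, or evaluating the loss in the original units, makes the mismatch visible during training.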

buoyant vine May 1, 2024, 10:18 PM

#

I'm not sure if you have the model train on the normalized value but then pass it the non-normalized value

#

since that effectively completely changes how the data looks and can change the pattern

fallow coyote May 1, 2024, 10:23 PM

#

Just want to ask, in the pinned section, someone recommended three resources for learning maths for ML/AI. Will they be enough to at least have a good maths base for ML/AI or does anyone recommend any alternate sources?

violet gull May 1, 2024, 11:13 PM

#

buoyant vine I'm not sure if you have the model train on the normalized value but then pass i...

no it is given the normalized everything

desert oar May 1, 2024, 11:16 PM

#

fallow coyote Just want to ask, in the pinned section, someone recommended three resources for...

What is a "good maths base" in your case?

#

In general you're looking for strong numeracy fundamentals, and specifically a good handle on undergrad-level calculus, linear algebra, probability, and statistics.

#

for "AI" specifically you probably don't need much statistics and can skimp on probability a little bit, but for generalist DS you do need both.

fallow coyote May 1, 2024, 11:43 PM

#

desert oar What is a "good maths base" in your case?

As in learning the required 'basic' maths for ML/AI. In my current situation, Im relearning my A level (high school) maths just to understand the general concepts before going onto learning the uni level maths

#

Im decent at maths. Im relearning it pretty quickly but it has been a few years. Only want to get into ML/AI cos its interesting and maybe useful for me in the future so might as well start now

winged yew May 2, 2024, 3:09 AM

#

anyone knows how to install tensorflow-gpu on windows ??

frail heart May 2, 2024, 3:19 AM

#

https://youtu.be/IHZwWFHWa-w?si=ymfibEI1iRHxjHf7&t=290

I am watching 3blue1brown's neural network video ep 2, and I am wondering how he got 13,002 weights/biases for the parameters of his neural network. When I calculate it I get 12,963.

YouTube

3Blue1Brown

Gradient descent, how neural networks learn | Chapter 2, Deep learning

serene scaffold May 2, 2024, 3:34 AM

#

frail heart https://youtu.be/IHZwWFHWa-w?si=ymfibEI1iRHxjHf7&t=290 I am watching 3blue1brow...

your number appears to be correct if the output layer had 9 nodes, but it has 10 (they're numbered on the screen starting from zero)

#

In [10]: (784 * 16) + (16 * 16) + (16 * 10) + (16 + 16 + 10)
Out[10]: 13002

#

the last term here are the biases.
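The same count written as a small function for any fully connected network. `count_parameters` is an illustrative name; the layer sizes are the ones from the calculation above.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per node in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights + biases


total = count_parameters([784, 16, 16, 10])
print(total)  # 13002
```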

frail heart May 2, 2024, 3:36 AM

#

Thanks

serene scaffold May 2, 2024, 3:37 AM

#

yw

craggy coral May 2, 2024, 4:22 AM

#

serene scaffold ```py In [10]: (784 * 16) + (16 * 16) + (16 * 10) + (16 + 16 + 10) Out[10]: 1300...

makes my head spin

#

just started getting into data science and stuff

#

im 17

serene scaffold May 2, 2024, 4:29 AM

#

craggy coral makes my head spin

A weight is a connection between two nodes. And each non-input node has a bias.

#

Why does it make your head spin? Your initial calculation was correct except that you were missing one output node.

hasty grail May 2, 2024, 4:42 AM

#

winged yew anyone knows how to install tensorflow-gpu on windows ??

TensorFlow >2.10 no longer supports GPU on Windows. You'll have to run it inside WSL2

#

Please see: https://www.tensorflow.org/install/pip#windows-native

TensorFlow

Install TensorFlow with pip

winged yew May 2, 2024, 4:50 AM

#

hasty grail Please see: https://www.tensorflow.org/install/pip#windows-native

If I install <2.11 version of tensorflow will it work then???

hasty grail May 2, 2024, 4:51 AM

#

Yes but then you'd be stuck with old content, which will be detrimental in the long run (especially with how fast-moving the field is)

#

better to bite the bullet now and set up a WSL2 environment

past meteor May 2, 2024, 6:15 AM

#

fallow coyote Just want to ask, in the pinned section, someone recommended three resources for...

Yes, the book math for machine learning will give you enough maths to understand the majority of canonical methods

frail heart May 2, 2024, 8:46 AM

#

serene scaffold Why does it make your head spin? Your initial calculation was correct except tha...

That ain't me.

buoyant vine May 2, 2024, 11:15 AM

#

On the topic of TF, why do people still use TF over PyTorch? It seems for the most part Torch just dominates over TF both in speed and available tooling now.

I remember years ago it was the other way around, but since TF3 it seems PyTorch easily beats TF in almost every situation?

agile cobalt May 2, 2024, 11:28 AM

#

TF3? do you mean tf2?

#

the biggest reason is probably just momentum, but iirc it has a few niche advantages like easier/more mature deployment to web and edge devices

buoyant vine May 2, 2024, 11:32 AM

#

yeah sorry

#

for some reason I have it in my head that TF is v3 and v2 was the old version

agile cobalt May 2, 2024, 11:33 AM

#

TIL https://pytorch.org/executorch-overview is a thing though

ExecuTorch alpha release also provides early support for the recently announced Llama 3 8B along with demonstrations on how to run this model on an iPhone 15 Pro and a Samsung Galaxy S24 mobile phone.

PyTorch

PyTorch ExecuTorch

buoyant vine May 2, 2024, 11:33 AM

#

agile cobalt the biggest reason is probably just momentum, but iirc it has a few niche advant...

my experience of that has been the opposite tbh

#

and ease of converting models to onnx, quantizing etc...

agile cobalt May 2, 2024, 11:34 AM

#

buoyant vine my experience of that has been to opposite tbh

specifically

web and edge devices
or did you do it in torch? iirc it doesn't support web directly
(I mean embed, not creating an api)

buoyant vine May 2, 2024, 11:35 AM

#

we go straight to onnx models

#

Now, my experience with just pytorch on edge or embedded devices is bad because PyTorch feels huge and bulky

#

but exporting to onnx and then embedding onnxruntime or whatnot is a better experience than TFLite

agile cobalt May 2, 2024, 11:38 AM

#

yeah idk then

past meteor May 2, 2024, 11:54 AM

#

buoyant vine On the topic of TF, why do people still use TF over PyTorch? It seems for the mo...

TF/Keras is arguably easier to use and gets you results a tad faster

#

And it's also what some of us learnt in uni 👈

#

But I've since switched to Torch, all in all they're quite similar and I'd just recommend the vast majority to use Torch (in conjunction with lightning)

potent sky May 2, 2024, 12:23 PM

#

buoyant vine but exporting to onnx and then embedding onnxruntime or what not experience wise...

This is what we do too for edge
I personally used to use Tflite earlier, and haven't used it recently. But Pytorch -> onnx is pretty straightforward

buoyant vine May 2, 2024, 12:23 PM

#

yeah, and portability wise very nice

potent sky May 2, 2024, 12:24 PM

#

plus who knows when Google will kill something

#

the import mechanism change in tf 2.6, when they changed keras to a separate python package broke a lot of things and made it generally frustrating to use tf.
imo before that things were looking good with the subclassing api and the functional API.
But that was what finally forced me to completely pivot to pytorch as my primary.
Haven't really used tf much since.
It's a shame because I was actively excited for tf, even contributed a few smol things iirc.

past meteor May 2, 2024, 1:56 PM

#

potent sky the import mechanism change in tf 2.6, when they changed keras to a separate pyt...

Yeah that was exactly the reason why I switched. The constant breaking changes

#

And the docs that don't follow them etc.

runic parcel May 2, 2024, 3:38 PM

#

can anyone tell me how can i use isochrone?

serene scaffold May 2, 2024, 3:46 PM

#

frail heart That ain't me.

o shit, u rite

serene scaffold May 2, 2024, 3:47 PM

#

buoyant vine On the topic of TF, why do people still use TF over PyTorch? It seems for the mo...

I never see anyone at work use TF. I only see TF in questions asked on this server. So I suspect that many beginner tutorials were written for TF in the past.

jaunty valve May 2, 2024, 6:19 PM

#

hey all, im building this dev tool to transform scrappy python code to code that follows best practices by using LLMs and AI.
still in beta but would love to get feedback from python practitioners and people in AI
https://gitgud.autonoma.app/
any best practice im missing? is the output good quality for prod?

GitGud

Copilot for real developers

tame blade May 2, 2024, 8:11 PM

#

jaunty valve hey all, im building this dev tool to transform scrappy python code to code that...

like the idea, and your ui is really impressive

#

nvm there was a bug, fixed now

desert oar May 2, 2024, 8:23 PM

#

serene scaffold I never see anyone at work use TF. I only see TF in questions asked on this serv...

Nobody switching back to Keras?

spring field May 2, 2024, 8:27 PM

#

I don't remember asking it to complicate the code beyond recognition (take it lightheartedly, lol)... all I really wanted it to do was go from `for key in dct: value = dct[key]` to `for value in dct.values(): pass` or `for key, value in dct.items(): pass`
anyway, this is most certainly not ready for production, at least I can't imagine trusting it. doing weird stuff like this might make some tests fail, and then you have to rewrite those, or it can change the AST in unexpected ways. oh, and after all that it didn't use `.items()`. I even tried being a bit more explicit about the usage, and even then... though it did produce less clutter that time. the logging is frankly way too much IMO, and where are the two blank lines around function definitions 🙂
also constants appear to get lowercased for some reason
it does seem to work somewhat better with a bit more code than with those tiny samples I provided in the screenshots
also also, it seems to quite arbitrarily get rid of some comments... that's definitely not ideal
dunno, there's certainly room for improvement I guess
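For reference, the rewrite being asked for looks roughly like this. The dictionary contents are made up for illustration.

```python
dct = {"a": 1, "b": 2}

# Original style: iterate over keys and look each value up again.
looked_up = []
for key in dct:
    value = dct[key]
    looked_up.append((key, value))

# Same pairs, unpacked directly from .items().
unpacked = []
for key, value in dct.items():
    unpacked.append((key, value))

# When the keys are not needed at all, .values() is enough.
values = list(dct.values())

print(looked_up == unpacked, values)
```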

jaunty valve May 2, 2024, 8:44 PM

#

spring field I don't remember asking it to complicate the code beyond recognition (take it li...

Thank you very much for taking the time to test it and share observations Matiiss! This helps us a lot to iterate on the solution. Sharing your comments with the team 🙂

craggy coral May 3, 2024, 2:18 AM

#

guys im new to data science so far i understand data mining and getting unstructured data but i dont understand the part where u
use python or ai to structure the data and get key insights can anyone explain that?

serene scaffold May 3, 2024, 2:31 AM

#

craggy coral guys im new to data science so far i understand data mining and getting unstruct...

"data mining" is really just a buzzword. I wouldn't put any stock into it.

If you have a bunch of reddit messages, that's semi-structured data, since you know who wrote each message, and when, and which message was in response to which. But the messages themselves are unstructured data in natural language (English, or what have you). If you were to identify all the locations that are mentioned in each message, then you'd have structured data.

#

What kind of structured data you might want to extract from unstructured or semi-structured data depends on who you are, and what data you have or can obtain, and what your goal is. Retail companies might want to obtain structured data about what people think about their products.
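A toy version of the location example, to make "unstructured in, structured out" concrete. Everything here is made up: the messages, the authors, and the tiny `KNOWN_LOCATIONS` list, which stands in for a real place-name lookup or a named-entity recognition model.

```python
import re

# Hypothetical list of place names to look for.
KNOWN_LOCATIONS = ["Paris", "Tokyo", "New York"]
pattern = re.compile("|".join(re.escape(name) for name in KNOWN_LOCATIONS))

# Semi-structured input: the author is known, the text is free-form.
messages = [
    {"author": "alice", "text": "Just got back from Paris, flying to Tokyo next."},
    {"author": "bob", "text": "Anyone tried the new phone?"},
]

# Structured output: one row per (author, location) mention.
rows = [
    {"author": m["author"], "location": loc}
    for m in messages
    for loc in pattern.findall(m["text"])
]

for row in rows:
    print(row)
```

The resulting rows can go straight into a table or dataframe, which is the point where counting, grouping and other analysis become straightforward.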

craggy coral May 3, 2024, 2:50 AM

#

serene scaffold What kind of structured data you might want to extract from unstructured or semi...

so in this case for the retail company u would write a program that looks into the database or .txt file and extracts information that only mentions what people think about the product?

serene scaffold May 3, 2024, 2:53 AM

#

craggy coral so in this case for the retail company u would write a program that looks into t...

Potentially.

craggy coral May 3, 2024, 3:23 AM

#

serene scaffold Potentially.

is there a good video or channel explaining this stuff

lapis sequoia May 3, 2024, 5:27 AM

#

What are the levels to NLP?

narrow tiger May 3, 2024, 5:27 AM

#

what is difference between data science, machine learning and AI

past meteor May 3, 2024, 6:25 AM

#

narrow tiger what is difference between data science, machine learning and AI

I'd say that AI is about creating algorithms capable of complex decision making. ML is a subset of AI where you make those algorithms by learning from experience (also known as data) but there is AI that isn't ML.

Finally, data science isn't a formal term with definitions. I'd say it's a toolbox of methods ranging from things related to ML to traditional statistics and potentially even optimisation/operations research. It's an applied field where you use data to solve problems. The "science" in there is to distinguish it from let's say business analytics where the goal is "insights" and bar charts. It's a narrower skillset.

narrow tiger May 3, 2024, 6:45 AM

#

"but there is AI that isn't ML."? what do you mean by this exactly

#

also any advice to someone coming from programming background into this field i don't wanna go to too deep into calculus but stay near the programming end
any pathways / job titles i should aim for

jaunty helm May 3, 2024, 6:55 AM

#

narrow tiger "but there is AI that isn't ML."? what do you mean by this exactly

personally I think of AI as the goal and ML as a means

AI that isn't ML
programs that could do complex decision making existed before ML got popular
for example, you could just code a ton of conditional checks manually, and that could act like AI; in fact that's the idea of expert systems
ML is when hardware got better and people thought, "man, manually finding & coding in these rules every single time for every single new problem is a lot of work, what if we just had a generic tool which can do that for us instead?"
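As a tiny illustration of the hand-coded approach, here is a rule-based router. The rules, keywords and the `route_ticket` name are invented for the example; a real expert system would have far more rules, usually gathered from domain experts.

```python
def route_ticket(text):
    """Route a support message using hand-written rules (no learning involved)."""
    text = text.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "crash" in text or "error" in text:
        return "technical"
    return "general"


print(route_ticket("I was charged twice, please refund me"))
print(route_ticket("The app shows an error on startup"))
print(route_ticket("What are your opening hours?"))
```

An ML approach would replace the `if` chain with a classifier fitted on labelled messages, so the rules are inferred from data instead of written by hand.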

iron basalt May 3, 2024, 7:03 AM

#

jaunty helm personally I think of AI as the goal and ML as a means > AI that isn't ML progr...

ML existed long before the hardware improved (1950s). The concept of AI has been around for a very long time, but ML came about around the time that lots of people started getting into AI for real (actual implementations on computers (machines, not people, back then "computer" also still meant a person / job title)).

narrow tiger May 3, 2024, 7:05 AM

#

jaunty helm personally I think of AI as the goal and ML as a means > AI that isn't ML progr...

thanks

narrow tiger May 3, 2024, 7:05 AM

#

narrow tiger also any advice to someone coming from programming background into this field i ...

is there any place for me 😆 ?

iron basalt May 3, 2024, 7:05 AM

#

But for example, an automatic prover / search algorithm was considered AI back then, now it may not be due to not being impressive enough anymore.

jaunty helm May 3, 2024, 7:06 AM

#

iron basalt ML existed long before the hardware improved (1950s). The concept of AI has been...

true, but I meant before ML got popular
unless that's also false thus making me more of a dum dum

iron basalt May 3, 2024, 7:06 AM

#

jaunty helm true, but I meant `before ML got popular` unless that's also false thus making m...

ML was popular back then also. It's been around for a while now.

jaunty helm May 3, 2024, 7:07 AM

#

iron basalt ML was popular back then also. It's been around for a while now.

ah welp
learn something everyday I guess 😅

iron basalt May 3, 2024, 7:07 AM

#

Including some lesser known roles it played in stuff like the space race (optimization algorithms).

past meteor May 3, 2024, 7:08 AM

#

BFS, DFS, A* are all AI depending on the context

#

People don't like this but it's true

iron basalt May 3, 2024, 7:08 AM

#

(About the space race stuff) But in that context it would only be considered a search / optimization algorithm. The term ML was around but had not blown up in usage yet; it was still ML though.

past meteor May 3, 2024, 7:08 AM

#

(and it has nothing to do with ML)

jaunty helm May 3, 2024, 7:09 AM

#

I guess they just "feel less AI" when compared to chatbots

iron basalt May 3, 2024, 7:09 AM

#

Also a lot would fall just under control theory, now parts of it are considered ML, even though it's still (optimal) control theory.

past meteor May 3, 2024, 7:10 AM

#

https://en.m.wikipedia.org/wiki/Knowledge_representation_and_reasoning

Knowledge representation and reasoning

Knowledge representation and reasoning (KRR, KR&R, KR²) is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings ...

wooden sail May 3, 2024, 7:10 AM

#

past meteor People don't like this but it's true

this seems appropriate for the present discussion

iron basalt May 3, 2024, 7:10 AM

#

"AI is when it feels magical enough" is one definition of it.

past meteor May 3, 2024, 7:11 AM

#

wooden sail this seems appropriate for the present discussion

Hahahaha yes this is true

iron basalt May 3, 2024, 7:11 AM

#

This also means that with enough time all AI becomes non-AI.

past meteor May 3, 2024, 7:12 AM

#

Whenever I do a talk I ask people if they think a Google search is AI and the vast majority says no

#

It's the best example of that

iron basalt May 3, 2024, 7:14 AM

#

iron basalt ML existed long before the hardware improved (1950s). The concept of AI has been...

Also I do mean ML as in the term ML was coined and used at this time, not just things that fall under ML now.

#

If you include prior to the term ML, then even earlier, since lots of search and optimization happened automatically (on machines) during WWII.

past meteor May 3, 2024, 7:17 AM

#

wooden sail this seems appropriate for the present discussion

About this, I don't know where we draw the line and say a method is ML or "just statistics". I think most people would say linear regression isn't ML but SVMs somehow are

wooden sail May 3, 2024, 7:18 AM

#

a lot of people introduce linear regression as "the simplest form of ML" too

past meteor May 3, 2024, 7:18 AM

#

But what about linear kernel, least squares SVMs. They reduce to something similar to LDA

wooden sail May 3, 2024, 7:18 AM

#

i don't think the term is well-defined enough to be worthwhile

past meteor May 3, 2024, 7:18 AM

#

wooden sail i don't think the term is well-defined enough to be worthwhile

That I agree about

#

For me the difference is the end goal, statistical inference or simply prediction

#

Yes if you can do inference you can do prediction

#

But stats was always more focused on inference and not necessarily prediction

iron basalt May 3, 2024, 7:19 AM

#

A key part of ML is the M, it happens on a machine, linear regression and such came way before that.

#

But they can also be done on a machine, so idk.

#

And we already had people trying to make AI-like automatic proof machines and such. Although most were never completed, too far ahead of their time (pre-Turing).

iron ruin May 3, 2024, 8:38 AM

#

configuring score ranges is pain

#

especially for a dataset of 4000

#

yert

drifting depot May 3, 2024, 9:50 AM

#

Hi, I am new to python and I need to fit data with x and y errors in matplotlib. How can I do that? (I am trying something different than gnuplot, and I couldn't figure it out)

tidal bough May 3, 2024, 9:52 AM

#

What are you asking, specifically? For plotting that, use `errorbar` with `xerr` and `yerr` arguments.

drifting depot May 3, 2024, 9:52 AM

#

tidal bough What are you asking, specifically? For plotting that, use `errorbar` with `xerr`...

Yes, I need linear fit with xerr and yerr

#

I see, there is an argument sigma, but I don't know how to include xerr and yerr

tidal bough May 3, 2024, 9:55 AM

#

As for fitting - what kind of linear model are you looking for? If you want to take into account having errors on the x-axis, you'd need something like "total least squares", aka https://docs.scipy.org/doc/scipy/reference/odr.html
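A minimal sketch of a straight-line fit with uncertainties on both axes using `scipy.odr`. The data points and error sizes are made up, and `linear` is just an illustrative model function.

```python
import numpy as np
from scipy import odr


def linear(beta, x):
    """Straight line y = slope * x + intercept."""
    return beta[0] * x + beta[1]


# Made-up measurements with per-point uncertainties on both axes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
x_err = np.full_like(x, 0.1)
y_err = np.full_like(y, 0.2)

# RealData takes the standard deviations of x and y as sx and sy.
data = odr.RealData(x, y, sx=x_err, sy=y_err)
fit = odr.ODR(data, odr.Model(linear), beta0=[1.0, 0.0]).run()

slope, intercept = fit.beta
slope_err, intercept_err = fit.sd_beta
print(f"slope = {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept = {intercept:.3f} +/- {intercept_err:.3f}")
```

The fitted line can then be drawn over the points from `plt.errorbar(x, y, xerr=x_err, yerr=y_err, fmt="o")`. `beta0` is only a starting guess, so rough values are usually fine for a linear model.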

drifting depot May 3, 2024, 9:56 AM

#

tidal bough As for fitting - what kind of linear model are you looking for? If you want to t...

Yes, I have to use total least squares for fitting as far as I know. But, this looks promising, thanks

serene scaffold May 3, 2024, 1:35 PM

#

past meteor BFS, DFS, A* are all AI depending on the context

the AI director of my company said "AI is whatever you can't currently do"

#

@vernal thunder your message was removed for not being in English or being on-topic for this channel

vernal thunder May 3, 2024, 1:42 PM

#

Is this correct

#

Hmm, who asked for your opinion?

serene scaffold May 3, 2024, 1:43 PM

#

vernal thunder Hmm, who asked for your opinion?

I'm one of the moderators of this server.

vernal thunder May 3, 2024, 1:44 PM

#

Hahahahaah

#

Yes good

#

I'm not afraid of anyone

#

Keep this in your information

#

because Im Arabic

serene scaffold May 3, 2024, 1:45 PM

#

vernal thunder I'm not afraid of anyone

You don't need to be afraid. You just have to follow the rules. Posting informational content about religions is not on-topic.

I speak Arabic.

vernal thunder May 3, 2024, 1:46 PM

#

Then speak Arabic with me

#

I want to know which religion you follow

#

Hey

#

Hey, moderator

serene scaffold May 3, 2024, 1:48 PM

#

vernal thunder I want to know which religion you follow

We can't actually talk to each other in arabic in this server. But this server also is not an appropriate place for religious inquiry.

vernal thunder May 3, 2024, 1:48 PM

#

It seems you are ....

#

American

#

Good

#

Do you support Palestine or Israel

#

Speak

serene scaffold May 3, 2024, 1:50 PM

#

@vernal thunder I'm muting you if this off-topic discussion continues.

vernal thunder May 3, 2024, 1:51 PM

#

Hahaha, it looks like there are 14 on the PlayStation

#

This is a bad thing

#

By the way, I am the one who is silent

serene scaffold May 3, 2024, 1:53 PM

#

@vernal thunder if you send another message in this channel, make sure it's about data science or AI.

fallen osprey May 3, 2024, 2:12 PM

#

What's the best place to learn maths for ai ml

spring field May 3, 2024, 3:23 PM

#

I suppose university/college would certainly be one of the better places for that

trail monolith May 3, 2024, 3:26 PM

#

fallen osprey What's the best place to learn maths for ai ml

Mathematics for machine learning book

neat bluff May 3, 2024, 4:06 PM

#

serene scaffold @vernal thunder if you send another message in this channel, make sure it...

He listened, I am surprised 😆

limber token May 3, 2024, 5:33 PM

#

Any idea on why this code is throwing RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor?

#

Is it because of DataLoader?

#

The code for train_model is this:

def train_model(model: nn.Module, train_loader: DataLoader, criterion: LossFunction, optimizer: optim.Optimizer, num_epochs: int = 10) -> None:
    """
    Train a PyTorch model.

    Args:
        model (nn.Module): The model to train.
        train_loader (DataLoader): The DataLoader for the training data.
        criterion (nn.modules.loss._Loss): The loss function.
        optimizer (optim.Optimizer): The optimizer.
        num_epochs (int, optional): The number of epochs to train for. Defaults to 10.
    """
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0

        for images, labels in train_loader:
            labels = labels.float()
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

limber token May 3, 2024, 5:59 PM

#

It works fine when not using CUDA
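A likely cause, sketched under assumptions: the error message says the input is a CPU tensor (`torch.FloatTensor`) while the model weights are on the GPU (`torch.cuda.FloatTensor`). `DataLoader` yields CPU tensors by default, so each batch has to be moved to the model's device. The sketch below is a trimmed version of the loop above; the `device` lookup, the returned loss list, and the toy model and data in the usage part are assumptions for illustration, not the original poster's setup.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset


def train_model(model: nn.Module, train_loader: DataLoader, criterion: nn.Module,
                optimizer: optim.Optimizer, num_epochs: int = 10) -> list[float]:
    """Same loop as above, but each batch is moved to the model's device."""
    # Assumption: all parameters of the model live on a single device.
    device = next(model.parameters()).device
    epoch_losses = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            # DataLoader batches start on the CPU; move them next to the weights.
            images = images.to(device)
            labels = labels.float().to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        epoch_losses.append(running_loss / len(train_loader))
    return epoch_losses


# Toy usage (hypothetical data and model, only to show the call pattern).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_data = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
toy_loader = DataLoader(toy_data, batch_size=8)
toy_model = nn.Linear(4, 1).to(device)
losses = train_model(toy_model, toy_loader, nn.BCEWithLogitsLoss(),
                     optim.SGD(toy_model.parameters(), lr=0.1), num_epochs=2)
```

This would also explain why it works without CUDA: on the CPU the model and the batches already share a device.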

leaden narwhal May 3, 2024, 9:49 PM

#

#databases message

#

fellas can anyone give me a hand

gusty flicker May 3, 2024, 11:12 PM

#

hey, does anyone have a good recommendation for a youtube or online guide for doing astronomy stuff with astropy and some machine learning? I found a youtube video online for using fits data and I was able to use ultralytics and roboflow, but i dont know if I got issues because of version mismatches with the pip packages or what, I'd rather ask if anyone is aware of a good guide

narrow tiger May 4, 2024, 4:24 AM

#

vernal thunder because Im Arabic

lmfaooo

leaden narwhal May 4, 2024, 12:41 PM

#



{'Grid_ID': ['1001', '1001', '1001', '1001', '1001'], 'Datetime': [Timestamp('2023-03-01 00:00:00+0000', tz='UTC'), Timestamp('2023-03-01 00:15:00+0000', tz='UTC'), Timestamp('2023-03-01 00:30:00+0000', tz='UTC'), Timestamp('2023-03-01 00:45:00+0000', tz='UTC'), Timestamp('2023-03-01 01:00:00+0000', tz='UTC')], 'C1': ['4.25', '1.909999966621399', '0.0', '0.0', '0.0']}

# Convert "Nº de ..." (integer) columns to int
int_cols = df.columns[df.columns.str.startswith('C')]
df[int_cols] = df[int_cols].apply(np.int64)

df.sample(10)

ValueError: invalid literal for int() with base 10: '4.25'

#

Guys im having this error

#

any help?

serene scaffold May 4, 2024, 1:31 PM

#

leaden narwhal Guys im having this error

You need to convert it to a float first
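A minimal stdlib sketch of why the float step matters, using the sample `C1` values from the question (the variable names are made up for illustration):

```python
# Sample values copied from the 'C1' column in the question above.
raw_values = ['4.25', '1.909999966621399', '0.0', '0.0', '0.0']

# int('4.25') raises ValueError because the string is not an integer literal.
try:
    int(raw_values[0])
    direct_conversion_failed = False
except ValueError:
    direct_conversion_failed = True

# Parsing as float first works; int() then truncates toward zero.
as_ints = [int(float(value)) for value in raw_values]
print(as_ints)
```

In pandas the analogous chain would be `df[int_cols].astype(float).astype(int)`. Note that this truncates 4.25 to 4, so it is worth checking whether these columns are really meant to be integers.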

fleet compass May 4, 2024, 4:13 PM

#

https://stackoverflow.com/questions/78429324/transform-excel-data-into-python-dataframe

Stack Overflow

Transform excel data into Python dataframe

Input
Output
I have a standard excel template that I collect from different entities regarding their sales data by site and product as shown in the first table. I would like to write a python code to

#

hi all. python newbie here. need help with above

serene scaffold May 4, 2024, 4:27 PM

#

fleet compass https://stackoverflow.com/questions/78429324/transform-excel-data-into-python-da...

since this question is hyperspecific and depends on an xlsx file that only you have, it's not very likely that anyone will volunteer to answer it. I recommend doing the kaggle pandas tutorial.

calm pagoda May 4, 2024, 4:56 PM

#

https://discord.com/channels/267624335836053506/1236360770141556757

#

Pls guide me..

daring pier May 4, 2024, 5:02 PM

#

Anyone ever had an error while training a cnn that says input ran out of data while using tensorflow? Found some advice in stack-overflow but the error is still there?
Can anyone help me?

wooden sail May 4, 2024, 5:26 PM

#

i would point out that this is the deterministic interpretation, but you can alternatively derive the same regularizers through statistical criteria (e.g. for the L1 case, using maximum a posteriori when the parameters follow a Laplace distribution centered at 0). also L2 does not restrict the values of the weights
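To spell out the statistical reading for the L1 case, here is a short sketch. The symbols are assumptions not used in the message itself: $b$ is the scale of the Laplace prior and $\mathcal{D}$ is the training data.

```latex
\begin{aligned}
p(w_i) &= \frac{1}{2b}\exp\!\left(-\frac{|w_i|}{b}\right), \\
\hat{w}_{\mathrm{MAP}} &= \arg\max_w \left[\log p(\mathcal{D}\mid w) + \sum_i \log p(w_i)\right] \\
&= \arg\min_w \left[-\log p(\mathcal{D}\mid w) + \frac{1}{b}\lVert w\rVert_1\right].
\end{aligned}
```

The constant $\log\frac{1}{2b}$ drops out of the optimization, leaving the negative log-likelihood plus an L1 penalty with weight $1/b$.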

#

discourages, yes, but not restricts

#

it won't prevent the weights from becoming infinitely large

#

what do you mean?

#

unless you explicitly introduce inequality constraints, the inputs and outputs will be unbounded

#

that won't stop you from quickly exceeding the computer precision and getting infs and nans

#

which is exactly what you see in e.g. exploding gradients

#

L2 reg alone does not prevent the parameters becoming arbitrarily large in any way

#

neither in the math nor in the implementation in the computer

#

yes but it won't "restrict" them. you CAN do that: you can guarantee the values never exceed a certain threshold

#

that's something different altogether and it's where constraints come in

#

the wording and semantics are important to distinguish that

#

i'd advise against making stuff up

#

what you're discussing now already exists and has names, and you'll have an easier time reading about it if you find the proper terms

#

if you say so. at any rate though, L2 does not prevent your parameters from becoming unbounded

#

you can do it via the extended lagrange form of the KKT conditions with inequality constraints

#

that does add some L2-looking regularization terms, but the additional slackness and positivity conditions anyway have to be enforced for the solutions to be in the feasible set. those require inequalities

#

the way it's usually explained is as "promoting smoothness"

#

maximizing the 2 norm of a vector is achieved by dumping all of the values into a single entry and setting everything else to 0

#

the minimum is achieved by making all entries equal

#

the less variation there is among vector entries, the lower the 2 norm

#

it's exactly what it does, though

#

smoothness when paired with an equality constraint does restrict the values, too

#

what L2 will do is make all of the parameters similar to each other

wooden sail May 4, 2024, 5:44 PM

#

wooden sail what L2 will do is make all of the parameters similar to each other

this

#

wdym by that?

#

i don't think degeneracy makes sense here either

#

that's what they mean by smoothness in this context, not differentiability

#

idk who came up with the term nor when, but it's well established

#

because the 2-norm is a contraction for small values, so it ignores them

#

you can try yourself playing with the example i gave above. take a vector, and for simplicity, work only with positive entries. say we work with the condition that the entries of the vector add up to 1

#

now let's maximize and minimize the 2-norm of the vector

#

it's pretty easy to conclude that the maximum value is 1, when one entry is 1 and the others are 0. this is the "least smooth" solution in the sense that it looks spiky

#

the minimum norm solution is the one where, if the vector is of length N, the entries are 1/N

#

then they all get contracted by the 2-norm

#

that solution is "smooth" in that the entries change very little w.r.t. each other
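The experiment described above can be tried in a few lines. This is only a sketch with made-up example vectors (`spiky`, `uniform`, `uneven`), all non-negative and summing to 1:

```python
import math


def two_norm(vector):
    """Euclidean (2-) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in vector))


N = 4
spiky = [1.0] + [0.0] * (N - 1)   # all mass in one entry
uniform = [1.0 / N] * N           # mass spread evenly
uneven = [0.7, 0.1, 0.1, 0.1]     # somewhere in between

# All three vectors are non-negative and sum to 1.
norms = {
    "spiky": two_norm(spiky),      # 1.0, the maximum
    "uneven": two_norm(uneven),
    "uniform": two_norm(uniform),  # 1/sqrt(N) = 0.5 for N = 4, the minimum
}
print(norms)
```

Under the sum-to-1 constraint, the more evenly the mass is spread, the smaller the 2-norm gets.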

#

yes, though almost never used

#

0 leads to combinatorial problems and is fairly common. L1 is its convex relaxation and they are actually equivalent under special conditions

#

everything between 0 and 1 promotes sparsity, but 0 is not a proper norm and 0 < L < 1 is non convex

#

L = 1 is convex and non differentiable, but it does have a nice subgradient

#

how so

toxic mortar May 4, 2024, 6:14 PM

#

is it possible to use different types of activation in the output layer of a neural network?

#

For example, in my output layer 9 of the outputs have softmax, since they should be categorical, and 1 output should have a linear activation since it is a prediction

serene scaffold May 4, 2024, 6:20 PM

#

toxic mortar For example in my output layer 9 of outputs have the softmax, which they should ...

why is one output different from the others?

toxic mortar May 4, 2024, 6:21 PM

#

serene scaffold why is one output different from the others?

I want to classify and predict at the same time

#

Something like this, where 1 is classification problem and 2 is prediction problem

serene scaffold May 4, 2024, 6:23 PM

#

toxic mortar Something like this, where 1 is classification problem and 2 is prediction probl...

did you make this?

toxic mortar May 4, 2024, 6:24 PM

#

serene scaffold did you make this?

With drawio yes

#

But I cant seem to implement it

serene scaffold May 4, 2024, 6:24 PM

#

toxic mortar I want to classify and predict at the same time

I think there's a misunderstanding here. "classify" and "predict" aren't mutually exclusive things. a classifier predicts the classes of the inputs.

toxic mortar May 4, 2024, 6:26 PM

#

Okay. How would you approach this? If there is 9 classes and 1 parameter

serene scaffold May 4, 2024, 6:26 PM

#

I'm not sure

toxic mortar May 4, 2024, 6:27 PM

#

serene scaffold I'm not sure

Do you understand what I am looking for?

#

I mean one obvious solution is to create two separate neural nets, one for classification and one for the stock prediction

#

can I fit it in one?

lapis sequoia May 4, 2024, 8:04 PM

#

guys

#

i need a strong source to learn numpy

past meteor May 4, 2024, 8:07 PM

#

toxic mortar For example in my output layer 9 of outputs have the softmax, which they should ...

Sure, you can make any type of architecture/configuration you want

#

What I'd worry about is how I'd compute the loss of this network

toxic mortar May 4, 2024, 8:08 PM

#

past meteor Sure, you can make any type of architecture/configuration you want

Yes exactly. I managed to implement it, but it didn't work well

#

I split it into a classifier and a linear prediction

past meteor May 4, 2024, 8:09 PM

#

So 1 softmax (with 9 classes) and 1 linear layer?

toxic mortar May 4, 2024, 8:09 PM

#

Yes

#

Two separate models with different architectures

past meteor May 4, 2024, 8:09 PM

#

I wouldn't do that

toxic mortar May 4, 2024, 8:09 PM

#

Why not?

past meteor May 4, 2024, 8:10 PM

#

You can just have 1 network that does 2 outputs, you compute the loss of each and take the mean or so
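A rough stdlib sketch of the "compute the loss of each and take the mean" idea, assuming one forward pass produces 9 class logits plus 1 linear output. The function names and the example numbers are hypothetical, and a real setup would use the framework's own loss functions.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one correct class index."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]


def squared_error(prediction, target):
    """Squared error for the single linear (regression) output."""
    return (prediction - target) ** 2


def multi_task_loss(class_logits, class_target, value_prediction, value_target):
    """Mean of the classification loss and the regression loss."""
    classification = softmax_cross_entropy(class_logits, class_target)
    regression = squared_error(value_prediction, value_target)
    return 0.5 * (classification + regression)


# Hypothetical outputs of one forward pass: 9 class logits and 1 linear output.
logits = [0.0] * 9
loss = multi_task_loss(logits, class_target=3, value_prediction=1.5, value_target=1.0)
print(loss)
```

The two losses can sit on very different scales, so a plain mean is only a starting point. A weighted sum with a tunable weight is a common variation.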

past meteor May 4, 2024, 8:11 PM

#

toxic mortar Why not?

It has been shown that combining them has desirable properties, it has a regularizing effect. If you want to get into the weeds you can read this https://en.wikipedia.org/wiki/Multi-task_learning

#

Obviously, the biggest issue with neural nets is that you can never easily conclude if it doesn't work or there's just a special set of hyperparameters you haven't tried yet that do work

#

I think if I were you I'd train them separately first and hyperparameter tune them separately as well and then benchmark against a multi-task style architecture

wooden sail May 4, 2024, 8:21 PM

#

i don't think rust has a good BLAS/LAPACK implementation yet, does it?

past meteor May 4, 2024, 8:21 PM

#

Python can use multiprocessing without any problems.
The issue is multithreading. Only one Python thread can execute Python bytecode at a time (the GIL).
Most major libs like numpy use multiple threads in C-land which circumvent this issue.

wooden sail May 4, 2024, 8:22 PM

#

which means even though it could be a good idea, no one has done it yet

#

idk how easily rust exposes SIMD

#

google says support is only experimental

#

this arguably has a bigger impact than just parallelization which, as zestar says, is already taken care of in C for numpy

past meteor May 4, 2024, 8:25 PM

#

Not my area of expertise but you can definitely allocate chunks of the matrices/arrays to different threads and combine them afterwards

wooden sail May 4, 2024, 8:25 PM

#

simply by virtue of getting more slots on the OS scheduler, sure. if your task already exceeds the cache size and the number of parallelizable operations in SIMD, you can speed it up by getting favored by the gods of RNG

buoyant vine May 4, 2024, 8:25 PM

#

wooden sail i don't think rust has a good BLAS/LAPACK implementation yet, does it?

it does, pretty good bindings to openblas, or if you're feeling wild it is pretty simple to bind to fortran

wooden sail May 4, 2024, 8:25 PM

#

buoyant vine it does, pretty good bindings to openblas, or if you're feeling wild it is prett...

no but i mean written in rust directly

buoyant vine May 4, 2024, 8:26 PM

#

wooden sail google says support is only experimental

that's portable SIMD, i.e. SIMD operations behind types that make it easier to use; the raw intrinsics are stable outside of AVX-512

wooden sail May 4, 2024, 8:26 PM

#

aha

buoyant vine May 4, 2024, 8:27 PM

#

wooden sail no but i mean written in rust directly

ah, no not really, I'm not sure it is worth ever doing that vs binding to cblas or open blas.

I have made some vector math libraries in Rust, but not to the same extent as blas. Often it becomes pretty annoying to maintain such a large number of specialized ops

#

for the sake of maybe beating blas by a few pct

#

Yeah, idk for AI/ML I probably wouldn't use CPU for heavy ops regardless, and Rust-cuda is a pretty nice experience

wooden sail May 4, 2024, 8:28 PM

#

that's what i would've thought, yeah

buoyant vine May 4, 2024, 8:28 PM

#

Me rn 😅

#

These routines alone are ~20k LOC

wooden sail May 4, 2024, 8:28 PM

#

my respects to you

foggy obsidian May 4, 2024, 8:29 PM

#

Add mine as well

buoyant vine May 4, 2024, 8:37 PM

#

One thing I guess I would weigh in here though, I think Rust can be great for training models in situations where you need multi-gpu or multi-threaded dataset processing or pre-processing.

At work we use PyTorch Lightning and that thing single-handedly takes 20 minutes to start up on a big dataset with 32 cores due to all the multi-processing and extra overhead going on from Python, where Rust can just use threads natively. That and the static type checking can help significantly to reduce the crashes at the ends of runs due to some random error.

That being said, for quickly knocking something out Python still wins, and I think maybe if you have enough time training via onnxruntime might solve the original issue.

#

maybe

iron basalt May 4, 2024, 8:44 PM

#

This is nonsense, every language with performance in mind can do parallelism, multiprocessing is also not what is desired, you don't want a process for each part. If they mean vs Python then that would make sense for CPU heavy tasks, but for matrix multiply we have numpy anyhow. Python is extremely slow. But you actually get more gains (more than the parallelization step) by switching to something like C or Rust ignoring parallelism. Python is just that much slower. And we do do that every time we call a numpy function. And also usually it all happens on the GPU anyhow where Rust does not apply (for large enough matrices / deep learning).

#

Also bonus points if you realize it's even better to use something like OpenCL for the CPU, for which you can also use PyOpenCL (SPMD/ISPC is the superior model for this stuff which is why the GPU also uses it).

past meteor May 4, 2024, 8:48 PM

#

Based you're talking about matrix multiplication in VR Chat tho

iron basalt May 4, 2024, 8:49 PM

#

past meteor Based you're talking about matrix multiplication in VR Chat tho

I have seen ppl teaching calculus on a whiteboard in it.

wooden sail May 4, 2024, 8:50 PM

#

typical vr chat discussion

buoyant vine May 4, 2024, 8:51 PM

#

😅

#

I think it would be more viable if you could more concretely force the compiler to unroll some loops

#

the biggest gain Fortran has, IMO, is its ability to aggressively unroll loops and split the ops into SIMD lanes automatically rather than manually

iron basalt May 4, 2024, 8:54 PM

#

buoyant vine I think it would be more viable if you could more concretely force the compiler ...

Adding all this stuff to Rust does not seem a high priority, they really like their functional style without writing manual loops. But it will probably be added.

buoyant vine May 4, 2024, 8:55 PM

#

Tbh idk if it will ever truly have the ability to force unrolls since it is technically controlled by LLVM and depends on LLVM being able to work out if it should or not

iron basalt May 4, 2024, 8:55 PM

#

Rust is more of a modern C++ alternative than C which gives it a focus on ergonomics over this kind of optimization stuff.

#

And as usual everyone ignores all the cool stuff Fortran did :(

buoyant vine May 4, 2024, 8:56 PM

#

Eh I disagree, at least for optimized compute, you can achieve the same thing in Rust, albeit unsafe Rust, as you would in C, but both still have the same issue that LLVM/gcc largely control the unrolling behaviour automatically

#

but yeah, in terms of writing fast math ops without having to get your hands dirty with manual SIMD, Fortran is awesome

#

especially F95+ where you can expose functions via FFI more easily now

iron basalt May 4, 2024, 9:00 PM

#

Yeah you can, but it's a question of how difficult. After all, I could also do it in Python by manually outputting machine code to a buffer, writing that to an executable memory page and running it. This is an extreme example, but unsafe Rust plus hoping LLVM does the right thing can feel like that. Anyhow, I don't want to make this a Rust complaint channel, so we can go to off-topic.

buoyant vine May 4, 2024, 9:01 PM

#

My point was more unsafe rust gives you same control as you would C in reality, and if you really want the most number crunching performance, in both cases you are always manually writing the intrinsic regardless of if it is C or not, but yeah we're getting a bit off topic lol

simple tapir May 4, 2024, 10:02 PM

#

Do I need a Master's degree to work as a data scientist?

#

I'm currently a sophomore undergrad computer science and engineering student

past meteor May 4, 2024, 10:27 PM

#

simple tapir Do I need a Master's degree to work as a data scientist?

Depends on the country. Are you in the US? Europe? India?

simple tapir May 4, 2024, 10:28 PM

#

past meteor Depends on the country. Are you in the US? Europe? India?

Turkey

past meteor May 4, 2024, 10:28 PM

#

I don't know anything about the Turkish job market to be honest

simple tapir May 4, 2024, 10:28 PM

#

I took linear algebra and some other math classes, and I'm taking statistics, ML, AI and differential equations this semester

past meteor May 4, 2024, 10:28 PM

#

The only people that can answer this are people in your country

simple tapir May 4, 2024, 10:29 PM

#

I'd like to work abroad though

past meteor May 4, 2024, 10:29 PM

#

Then it'll depend on the country you want to work in specifically 🙂

#

I think it's possible with a bachelors in the US and UK for instance

#

Where I'm based (Belgium) not so much

simple tapir May 4, 2024, 10:31 PM

#

Belgium requires MSc / PhD at least?

past meteor May 4, 2024, 10:32 PM

#

Science/theory oriented degrees put you on a track where you get BSc + MSc, no one leaves these before getting an MSc. Practice focused tracks don't lead to an MSc and cover no math, stats, ... (anymore) but deliver better programmers at day 1

#

that's the summary

#

1 or 2 years, so 4 or 5 years total (bs + ms). Just 1/3 finishes it in that time so it's more like 5+ years for the majority

#

yeah, a good move here is to do 2 of 1 year each

#

well yes, each place has their peculiarities

#

hence why, and I don't mean this to be rude, it's better to ask people IRL. Online you'll get US-centric advice that most likely doesn't apply to you (or could even be detrimental)

#

that's the edge case

#

But if you're targeting idk Germany, I think r/germany or whatever is optimal

#

Did you look at the ones I sent? 👀

#

(I can resend)

#

A video

#

the f

#

which one?

quaint crystal May 5, 2024, 1:42 AM

#

Hey I am looking for some hints for something I would like to do with tensorflow. I want to show one of 5 images into the camera and have the program tell me which one it is. I know I should probably use template matching, but all tutorials I can find use more than one image as training data. Does someone know a good starting point for this?

serene scaffold May 5, 2024, 2:06 AM

#

quaint crystal Hey I am looking for some hints for something I would like to do with tensorflow...

For one thing, use pytorch.

Even if you want to implement a model that performs template matching, you would need more than one training instance, would you not?

rotund roost May 5, 2024, 4:31 AM

#

quaint crystal Hey I am looking for some hints for something I would like to do with tensorflow...

try pytorch

hidden ferry May 5, 2024, 4:33 AM

#

quaint crystal Hey I am looking for some hints for something I would like to do with tensorflow...

doesn't TensorFlow have a specific library for this set of problems? i think i've seen it somewhere, KerasCV.

gritty vessel May 5, 2024, 7:30 AM

#

hey are there any resources for this? scraping data using
language model-based tools like OpenAI API, Mistral 7B, Llama2

toxic mortar May 5, 2024, 9:52 AM

#

This means it is overfitting, right? Spikes around epochs 10 and 15

#

Before I hyperparam tune it, I want to make sure I did my best regarding the model architecture

hasty grail May 5, 2024, 10:00 AM

#

toxic mortar This means it is overfitting, right? Spikes around epochs 10 and 15

Yes, looks like overfitting. Btw, the last 3 columns in the confusion matrix are all zeros, you might want to look into that

toxic mortar May 5, 2024, 10:02 AM

#

Yes, cool, thanks. Yes, I know why they are 0s. This is my class distribution

#

I wanted just to test it out, before I remove outliers

#

Imma try over and undersampling and classweights to see how it performs

#

If it sucks imma just chop it off

past meteor May 5, 2024, 10:17 AM

#

toxic mortar This means it is overfitting, right? Spikes around epochs 10 and 15

I wouldn't say they're overfitting

#

I look at overfitting as a disproportionate gap between the validation and training loss

#

Your problem is moreso that your val loss isn't smooth but that's imo pointing towards a learning rate that is too high, lack of dropout, ... things you can tune easily

#

I just train with "enough" early stopping

#

Honestly, I noticed that there's a lot of variance in training. If I run the same hyperparameters on the same data some runs it's good, some runs it's not. Setting early stopping to something "reasonably high" makes you robust to the model quitting after a few bad epochs

#

As in, I think it reduces the variance
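To make the patience argument concrete, here is a small sketch of the usual early-stopping rule applied to a made-up validation curve. `early_stopping_epoch` is a hypothetical helper, not a framework API.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical noisy validation curve: two bad epochs, then it improves again.
val_losses = [1.0, 0.8, 0.9, 0.85, 0.6, 0.5, 0.55, 0.56, 0.57, 0.58]

impatient = early_stopping_epoch(val_losses, patience=2)
patient = early_stopping_epoch(val_losses, patience=4)
```

With `patience=2` the run quits at epoch 3 and keeps the weights from epoch 1 (loss 0.8). With `patience=4` it survives the bad stretch and ends up with the epoch 5 weights (loss 0.5).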

toxic mortar May 5, 2024, 10:25 AM

#

Thanks guys

past meteor May 5, 2024, 10:28 AM

#

Hmmmm

#

I wouldn't tune the seed but if I had enough time and patience I'd run the same thing N times and make a boxplot or something yes

#

Just don't have the time to do that with neural nets, each run takes way too long

#

You know what I should consider? Doing hyperparameter tuning as a multi-objective optimization problem. Instead of just tuning for the loss you also consider the time it takes to do a single run.

#

That way I could keep my search space larger, but have the hyper param optimizer "punish" the algo for selecting a very low learning rate or very large architecture.
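One way to picture the multi-objective idea is a Pareto front over (loss, run time). The sketch below uses invented trial results and a hypothetical `pareto_front` helper; it is not how any particular tuner implements it.

```python
def pareto_front(trials):
    """Keep trials that no other trial beats on both loss and run time."""
    front = []
    for candidate in trials:
        dominated = any(
            other["loss"] <= candidate["loss"]
            and other["minutes"] <= candidate["minutes"]
            and (other["loss"] < candidate["loss"] or other["minutes"] < candidate["minutes"])
            for other in trials
        )
        if not dominated:
            front.append(candidate)
    return front


# Hypothetical finished trials: validation loss and wall-clock minutes per run.
trials = [
    {"name": "small_net_high_lr", "loss": 0.40, "minutes": 10},
    {"name": "big_net_low_lr", "loss": 0.30, "minutes": 240},
    {"name": "medium_net", "loss": 0.32, "minutes": 45},
    {"name": "big_net_high_lr", "loss": 0.45, "minutes": 120},
]

front_names = [trial["name"] for trial in pareto_front(trials)]
print(front_names)
```

Here `big_net_high_lr` drops out because `small_net_high_lr` is both faster and better; the other three are genuine trade-offs. Optuna supports this style of search through `optuna.create_study(directions=[...])`.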

#

Yeah, we have 2 big #enterprise GPUs

#

I have optuna, tensorboard and mlflow set up nicely. All I need to implement is 1 function to have my pipelines run. I code an architecture, run it for a couple of days and then read papers to find/code up the next one.

#

Contract research, $pharma pays us to do the research and then they may or may not put the ideas in prod

#

pretty much

#

that's really nice 😮

#

My issue is, I'm very wary of "tools"

#

I've been burnt so many times trying to adopt a shiny thing in my codebase only to notice it just doesn't do what I want it to do

#

If it's Python I'm also just relatively fast at churning out code, so it's always a trade-off between "will I write it myself or figure out how it works from the docs"

#

My setup is ... nonstandard. I already had existing sklearn based preprocessing and metrics. I didn't want to port all of it to work for Torch so I wrap my Torch models in a sklearn interface 🥴

#

Instead of going with https://github.com/skorch-dev/skorch and figuring out the pros and cons it was simply faster (<1-2h work) to write that interface myself

wintry grail May 5, 2024, 11:13 AM

#

I want some good resources to learn text mining in python, can anybody suggest some lec series or book?

short isle May 5, 2024, 1:57 PM

#

did i need to learn machine learning to make ai?

past meteor May 5, 2024, 2:03 PM

#

short isle did i need to learn machine learning to make ai?

You don't need to learn machine learning to use off-the-shelf algorithms or call APIs (like OpenAI and the stuff cloud providers offer) but if you want to train your own and go off the beaten path yes you do

toxic mortar May 5, 2024, 4:21 PM

#

Has anybody used tensorflow Profiler tool?

#

I want to see where my pipeline is bottlenecking. I use CPU training only and read data from SSD.

# Assumed import path; the original snippet did not show its imports.
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

stop_early = EarlyStopping(monitor='val_accuracy', patience=50, restore_best_weights=True)  # 100
tb_callback = TensorBoard(log_dir="logs", profile_batch='1,10')
history = model.fit(
    x_train_norm, y_train,
    epochs=100,  # 400
    batch_size=64,
    validation_data=(x_test_norm, y_test),
    callbacks=[stop_early, tb_callback],
    verbose=1
)

#

But there isn't any profiler data

#

I can see other things, which means, it works

unkempt jay May 5, 2024, 4:23 PM

#

help my install isnt working

#

in vs code

#

i used the pip install and it installed perfect

#

then i rebooted and it still says module not recognised or something

#

when i try again it says its already satisfied

serene scaffold May 5, 2024, 4:29 PM

#

unkempt jay then i rebooted and it still says module not recognised or something

whenever you need help with an error message, always show the whole error message in the chat as text.

unkempt jay May 5, 2024, 4:29 PM

#

serene scaffold whenever you need help with an error message, always show the whole error messag...

serene scaffold May 5, 2024, 4:30 PM

#

unkempt jay

Screenshots are not text.

unkempt jay May 5, 2024, 4:30 PM

#

Import "torch" could not be resolved

serene scaffold May 5, 2024, 4:30 PM

#

Anyway, you probably pip installed pytorch to a different environment than the one vscode is using. try running the program anyway and show the whole error message, if there is one, starting from Traceback.
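A quick way to check the "different environment" theory is to run a few lines with the same interpreter VS Code uses. This is a sketch using only the standard library:

```python
import importlib.util
import sys

# The interpreter that is actually running this script. If it differs from the
# one behind the `pip` command you used, the package went somewhere else.
print("Running under:", sys.executable)

# find_spec returns None when the module is not importable from this interpreter.
torch_spec = importlib.util.find_spec("torch")
torch_is_visible = torch_spec is not None
print("torch importable here:", torch_is_visible)
```

If it prints `False`, installing with that exact interpreter (`<path printed above> -m pip install torch`) puts the package where the script can see it.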

unkempt jay May 5, 2024, 4:31 PM

#

serene scaffold Anyway, you probably pip installed pytorch to a different environment than the o...

serene scaffold May 5, 2024, 4:32 PM

#

unkempt jay

Please stop posting screenshots of text. I will not answer any questions you ask in the future if you keep doing this.

#

it looks like you tried running a pip install command. I'm asking you to run the python script that you're trying to write.

#

@unkempt jay I'm still available to help. what do you do to run the python program?

silk yarrow May 5, 2024, 6:03 PM

#

I have my minor project submission tomorrow and I have one problem. My project is a density-based traffic light management system which detects objects using a YOLOv3 model with the COCO names file, which includes the names of 80 objects, but my aim is to detect an ambulance in traffic along with all the other vehicles. My project is not detecting the ambulance, please help me.

serene scaffold May 5, 2024, 6:16 PM

#

silk yarrow i have my minor project submission tomorrow so i have one problem, my project is...

hello, you'll need to be more specific in order to get help.

thorn cairn May 5, 2024, 6:19 PM

#

hey Stele if you can help me on #1035199133436354600 it'd be great!

serene scaffold May 5, 2024, 6:20 PM

#

thorn cairn hey Stele if you can help me on <#1035199133436354600> it'd be great!

if you want to cross-post your question, please link to the thread itself and give a brief explanation of what it's about.

thorn cairn May 5, 2024, 6:24 PM

#

sorry but how do i link my posts?

serene scaffold May 5, 2024, 6:25 PM

#

thorn cairn sorry but how do i link my posts?

right-clicking a message gives you the option to copy the message link

thorn cairn May 5, 2024, 6:25 PM

#

https://discord.com/channels/267624335836053506/1236742726045798430

#

ayy it does

#

there is a brief explanation there!

buoyant shoal May 5, 2024, 6:34 PM

#

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# sklearn comes with some example data sets
from sklearn import datasets

# Import train_test_split function
from sklearn.model_selection import train_test_split
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
#Import scikit-learn MLP classifier
from sklearn.neural_network import MLPClassifier 


df = pd.read_csv("dest")
x1 = np.array(df["x1"]).reshape(-1,1)
x2 = np.array(df["x2"]).reshape(-1,1)
x3 = np.array(df["x3"]).reshape(-1,1)
Y = np.array(df["Class"])

X = np.concatenate((x1,x2,x3), axis=1)

accuracy_train = np.zeros(100)
accuracy_test = np.zeros(100)

for i in range(100):

    # Split dataset into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5) # 50% training and 50% test


    # Create MLP classifier object
    mlp = MLPClassifier(solver='adam', hidden_layer_sizes=(20, 20), max_iter=50000)


    # Train MLP classifier
    model = mlp.fit(X_train, y_train)

    # Predict the response for training dataset
    y_pred = model.predict(X_train)

    acc_train = metrics.accuracy_score(y_train, y_pred)

    accuracy_train[i] = acc_train
    
    # Predict the response for test dataset
    y_pred = model.predict(X_test)

    acc_test = metrics.accuracy_score(y_test, y_pred)
    
    accuracy_test[i] = acc_test
    


print("Average accuracy for training data:", np.mean(accuracy_train))
print("Average accuracy for test data:", np.mean(accuracy_test))

#

Hi, dumb question but for this piece of code I'm curious why there's a variance in accuracy results

#

Is mlp.fit() and train_test_split() the two reasons why?

serene scaffold May 5, 2024, 6:36 PM

#

buoyant shoal ```python import numpy as np import matplotlib.pyplot as plt import pandas as pd...

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5) # 5% training and 50% test

This is wrong. if you set the test size as .5, the training set size will be 1 - .5

buoyant shoal May 5, 2024, 6:36 PM

#

serene scaffold ```py X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.5) #...

yes the comment is a typo

#

i meant 50% test and 50% training

#

forgot the 0

#

(fixed)

serene scaffold May 5, 2024, 6:37 PM

#

y_pred = model.predict(X_train)
you also predicted on the training data, and you can't use that to evaluate the model's performance.

buoyant shoal May 5, 2024, 6:38 PM

#

serene scaffold `y_pred = model.predict(X_train)` you also predicted on the training data, and y...

no that's intentional, my assignment asks me to predict on both the training and test samples

#

and then compare

#

by changing hidden layer sizes and max iteration

#

which i've sorted but like it also asks what causes the "variance" in the accuracy results

#

i'm suspecting it's the mlp.fit() and train_test_split(), am i right?
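Both are plausible sources, as a hedged answer: `train_test_split` shuffles the rows before splitting, and `MLPClassifier` starts from random initial weights, so every pass through the loop sees different data and a different starting point. Both accept a `random_state` argument to pin this down. The stdlib sketch below only mimics the splitting part; `split_indices` is a hypothetical helper, not scikit-learn's implementation.

```python
import random


def split_indices(n_samples, test_size, seed=None):
    """Shuffle indices 0..n_samples-1 and split them into train and test parts."""
    rng = random.Random(seed)  # seed=None means a different shuffle on every call
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


# With a fixed seed the split is identical on every call ...
train_a, test_a = split_indices(20, test_size=0.5, seed=0)
train_b, test_b = split_indices(20, test_size=0.5, seed=0)
same_split = (train_a == train_b) and (test_a == test_b)

# ... and train and test never overlap, whatever the seed is.
train_c, test_c = split_indices(20, test_size=0.5)
no_overlap = set(train_c).isdisjoint(test_c)
```

Fixing the seed for the split but not for the model (or the other way round) is one way to see how much of the spread each source contributes.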

silk yarrow May 5, 2024, 6:39 PM

#

serene scaffold hello, you'll need to be more specific in order to get help.

My project is based on the YOLOv3 model and uses the COCO names file, which I took from GitHub; it has only 80 objects. My mentor asked me to detect an ambulance if it is in traffic, but my project detects it as a truck, because ambulance is not included in the COCO names file. So the task is to train on a dataset so that it can detect an ambulance and mark it as an ambulance in the bounding box

thorn cairn May 5, 2024, 6:51 PM

#

how do i label these semesters into first year and not first year?

odd meteor May 5, 2024, 9:47 PM

#

thorn cairn how do i label these semesters into first year and not first year?

I suppose it depends on the information on the data you have and the location/school it was collected from.

Is 1, 2, and 3 the only unique values in that column?

A 2 years masters program for example, usually has 4 semesters. Year 1 would correspond to semester 1 & 2, and Year 2, semester 3 & 4.

In your case it appears the program only has 3 semesters (presumably it's a program with 1.5 years completion time.)

If that's the case, semester 1 & 2 should correspond to year 1. The rest, semester 3, becomes 6 months (0.5 years).

You just have to investigate further to figure how it works over there.
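As a minimal sketch of the labelling itself, assuming two semesters per academic year and integer semester numbers (both assumptions; the actual column was only shown in a screenshot):

```python
def label_year(semester):
    """Return 'first year' for semesters 1 and 2, 'not first year' otherwise."""
    return "first year" if semester in (1, 2) else "not first year"

# Hypothetical semester values standing in for the real column.
semesters = [1, 2, 3, 4, 2, 1]
labels = [label_year(s) for s in semesters]
print(labels)
```

In pandas the same test could be written with `Series.isin([1, 2])` on the semester column, whatever that column is called in the real data.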

odd meteor May 5, 2024, 9:48 PM

#

thorn cairn how do i label these semesters into first year and not first year?

Oh there's semester 4 😀. I didn't catch that at first.

vestal spruce May 6, 2024, 3:16 AM

#

Hi, is anyone familiar with Hugging Face's Transformers Library? I'm trying to fine-tune an ASR/speech to text model with the library but idk how to feed my dataset. Each example in the dataset is a 30 second audio file as the "feature" and a text saved in a notepad file as the target. If I want to feed this data, can I just use a simple list/numpy array, or do I need to turn it into a tensor first? Any form of help is appreciated, thanks in advance 🙏

dawn light May 6, 2024, 3:43 AM

#

is there a type of ML recommender system where in addition to the usual collaborative/content filtering, I can add/specify/enhance other dimensions (not sure if i phrased that correctly)?

For example, I'd like to look for a movie that's similar to Matrix but isn't scifi, or a movie/series that's similar to star wars but anime (this would probably be legend of the galactic heroes), or death note but romance (kaguya sama)

Or maybe even something like "Something like movie X but not Y or Z" (e.g. something like stranger things but not like Dark)

Can anyone point me to resources on how to build something like this? Thanks!

neon island May 6, 2024, 5:56 AM

#

dawn light is there a type of ML recommender system where in addition to the usual collabor...

Collaborative/content filtering doesn't need to be confined to 1-dimension (a scalar). Examples may only show scalar ratings to keep things simple. You can find the k-nearest neighbors based on vectors of any dimensionality.

To identify series "not like" Dark, point their vectors in the opposite direction (multiplying by -1). Make them "far away" for your distance function, so they are dissimilar for recommendation purposes.

Another RecSys mechanism is a Graph Neural Network (GNN). Different edge labels correspond to what you're thinking of as dimensions. A graph visualization may be easier to imagine, and GNNs can learn an optimal recommendation algorithm.
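The nearest-neighbour idea with a negated "not like" vector can be sketched as follows. This is only an illustration: the four feature axes, every score, and the candidate names `show_a` and `show_b` are made up, and summing the liked vector with the negated disliked vector is one simple way to build the query, not the only one.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical features: [sci-fi, mystery, dark tone, teen cast]
liked = [0.8, 0.7, 0.5, 0.9]     # "something like Stranger Things"
disliked = [0.7, 0.9, 1.0, 0.3]  # "but not like Dark"

# Multiply the disliked vector by -1 and add it to the liked one.
query = [l + (-1.0) * d for l, d in zip(liked, disliked)]

candidates = {
    "show_a": [0.7, 0.6, 0.3, 0.9],  # light tone, teen cast
    "show_b": [0.6, 0.9, 0.9, 0.2],  # very Dark-like
}

# Rank candidates from most to least similar to the query.
ranked = sorted(candidates, key=lambda name: cosine(candidates[name], query), reverse=True)
print(ranked)
```

With real data the vectors would come from item features or learned embeddings rather than hand-typed scores.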

dawn light May 6, 2024, 7:37 AM

#

neon island Collaborative/content filtering doesn't need to be confined to 1-dimension (a sc...

can you elaborate on how to use KNN for this?

I'm only a bit familiar with the algorithm but how exactly would i implement something like " star wars but anime"?

(I just realized that what i'm trying to build is something very similar to the attached pic, but for movies instead of words)

toxic mortar May 6, 2024, 8:06 AM

#

I left this randomsearch to run overnight. Does this mean that this model architecture has capped performance at 85% and I should change something in either the data or the architecture?

deep veldt May 6, 2024, 1:27 PM

#

Can someone give me an example labelled dataset and unlabeled? I'm new I've searched it up but all I got was a long explanation with no example

agile cobalt May 6, 2024, 1:43 PM

#

deep veldt Can someone give me an example labelled dataset and unlabeled? I'm new I've sear...

labelled: You have Images of cats called "cat_1.png", "cat_2.png" and images of dogs called "dog_1.png", "dog_2.png"

unlabelled: You have no idea about what is in "image_1.png", "image_2.png", "image_3.png", "image_4.png"

deep veldt May 6, 2024, 1:44 PM

#

thanks

deep veldt May 6, 2024, 2:36 PM

#

How do I train images?

serene scaffold May 6, 2024, 2:47 PM

#

deep veldt How do I train images?

you don't train images. you train a model. you might train that model to do things with images.

deep veldt May 6, 2024, 2:48 PM

#

serene scaffold you don't train images. you train a model. you might train that model to do thin...

how do I do that?

serene scaffold May 6, 2024, 2:49 PM

#

deep veldt how do I do that?

there are lots of ways you could do it. you first need to decide specifically what the model needs to do.

deep veldt May 6, 2024, 2:51 PM

#

serene scaffold there are lots of ways you could do it. you first need to decide specifically wh...

I'm trying to make a model that predicts whether a given image is a dog or a cat, for fun

agile cobalt May 6, 2024, 2:52 PM

#

technically there are some things you could call "training an image", but these are almost definitely not what you are looking for - in particular Style Transfer

99.99999% of the time you are training models, not images/texts/prompts/etc.

serene scaffold May 6, 2024, 2:53 PM

#

deep veldt I'm trying to make a model that predicts whether a given image is a dog or a c...

okay, so first you need to procure a dataset with lots of images of dogs and cats. see if you can find one where the dimensions of every image are the same.

#

and there needs to be some way to know which image is which. like a text file structured like

image,animal
1.jpg,cat
2.jpg,cat
3.jpg,dog
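A file like that can be read with the standard library and turned into integer class targets. This is a minimal sketch: the CSV text is inlined here for self-containment, and the variable names are illustrative.

```python
import csv
import io

# Inlined copy of the example label file; in practice this would be opened from disk.
labels_csv = "image,animal\n1.jpg,cat\n2.jpg,cat\n3.jpg,dog\n"

reader = csv.DictReader(io.StringIO(labels_csv))
labels = {row["image"]: row["animal"] for row in reader}

# Map each class name to an integer index, which is what most training code expects.
classes = sorted(set(labels.values()))
class_to_index = {name: i for i, name in enumerate(classes)}
targets = [class_to_index[labels[image]] for image in labels]

print(classes, targets)
```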

deep veldt May 6, 2024, 2:55 PM

#

serene scaffold okay, so first you need to procure a dataset with lots of images of dogs and cat...

i got the dataset part but i don't know what application or the thing to train

serene scaffold May 6, 2024, 2:58 PM

#

deep veldt i got the dataset part but i don't know what application or the thing to train

look into convolutional neural networks for image classification with pytorch

past meteor May 6, 2024, 3:14 PM

#

toxic mortar I left this randomsearch to run overnight. Does this mean that this model archit...

Do you have any baselines you can compare to?

deep veldt May 6, 2024, 3:18 PM

#

Are there any good courses, resources that I can learn on?

serene scaffold May 6, 2024, 3:20 PM

#

deep veldt Are there any good courses, resources that I can learn on?

!resources data science

arctic wedgeBOT May 6, 2024, 3:20 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

deep veldt May 6, 2024, 3:21 PM

#

thanks

twilit flower May 6, 2024, 3:42 PM

#

Creating a group for ml looking for friends

stoic gorge May 6, 2024, 4:56 PM

#

My partner has an interesting problem involving Jaccard index...
Comparing >300k unique subsets A (2^A), where |A| > 300 - and for each such set finding sets with which it has minimum Jaccard index...

We were thinking about representing the sets as 300 bits- that gives us fast calculation of the index itself (because bitwise operations), so only the number of calculations makes it costly -
bruteforce of everything-to-everything is (300k)² operations.

Does anyone have any ideas how to get it lower? We were thinking about clustering it somehow but |2^A| is so big it's hard to think of something that makes sense (there's a lot of pairs that don't intersect at all).
Or what to use to optimise the speed of calculations - I know basically nothing of numpy but there might be some methods to make such repetitive calculations fast?
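The bitset representation described above can be sketched with plain Python integers, which act as arbitrary-width bit vectors. The helper names are illustrative, and treating two empty sets as identical is a convention chosen here, not something from the question.

```python
def to_bits(items):
    """Encode a set of small non-negative integers as one integer bitmask."""
    bits = 0
    for i in items:
        bits |= 1 << i
    return bits

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two bitmask-encoded sets."""
    union = a | b
    if union == 0:
        return 1.0  # convention: two empty sets count as identical
    return bin(a & b).count("1") / bin(union).count("1")

a = to_bits({1, 2, 3})
b = to_bits({2, 3, 4})
c = to_bits({10, 11})

print(jaccard(a, b))  # 2 shared elements out of 4 in the union
print(jaccard(a, c))  # disjoint sets
```

Brute force over 300k sets is about 9×10^10 ordered pairs, so the per-pair cost matters less than the pair count. Since the index cannot go below 0, any set that is disjoint from a given set already attains that set's minimum, which may allow stopping early for each set.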

plush bobcat May 6, 2024, 5:09 PM

#

Hey there, I don't know if it's the right channel to ask but:

Guys I want to learn a second language after python, I've just started learning ML and want a lang to help me out in that field

Which one do you recommend and why,
Rust or C++?

And yes, I want it for ML/AI primarily, and maybe I could gain some insights into how things work: the compilers, interpreters, all of this stuff

plush bobcat May 6, 2024, 5:11 PM

#

plush bobcat Hey there, I don't know if it's the right channel to ask but: Guys I want to le...

If cpp's the one then I'll shoot myself in the foot, in a way that it blows my whole leg off

serene scaffold May 6, 2024, 5:11 PM

#

plush bobcat Hey there, I don't know if it's the right channel to ask but: Guys I want to le...

There isn't another language that would help you all that much for ML. I guess you could learn C++, since a lot of python libraries are implemented in it, but there are better uses of your time if you want to get better at ML.

plush bobcat May 6, 2024, 5:13 PM

#

serene scaffold There isn't another language that would help you all that much for ML. I guess y...

Thanks for the suggestion, could you tell what are those better uses in case my mindset is wrong?

serene scaffold May 6, 2024, 5:14 PM

#

plush bobcat Thanks for the suggestion, could you tell what are those better uses in case my ...

study the math for ML, read about the different use cases, implement different ones and analyze their performance.

ML is more about math and applying research methods than it is about systems engineering or programming.

plush bobcat May 6, 2024, 5:14 PM

#

Yea

#

I underestimated math for some reason, I'll be on it

#

Thanks

serene scaffold May 6, 2024, 5:15 PM

#

plush bobcat I underestimated math for some reason, I'll be on it

everyone does.

serene scaffold May 6, 2024, 5:17 PM

#

plush bobcat I underestimated math for some reason, I'll be on it

related meme

#

also replace "computer science" with "ml"

plush bobcat May 6, 2024, 5:22 PM

#

True af xd

#

Maths kinda intimidating but I've heard it's like a language, the moment you get fluent you'll be obsessed with it

#

And yea every bit of computer related stuff were made by math

wooden sail May 6, 2024, 5:27 PM

#

the "computer" in "computer science" deals with "computability" in mathematics: can you perform a certain action/do a computation in a finite number of well-described steps

#

traditionally CS is a branch of mathematics

#

(not anymore, now it largely depends on the university cuz it can also mean other stuff)

plush bobcat May 6, 2024, 5:28 PM

#

University just gets you into details, as you said, CS is a branch of math

#

And your explanation of "computer" and math was brilliant

iron basalt May 6, 2024, 6:55 PM

#

wooden sail (not anymore, now it largely depends on the university cuz it can also mean othe...

(Need to separate the two so you can't claim a math degree)

iron basalt May 6, 2024, 6:57 PM

#

serene scaffold related meme

Add geometry, trigonometry, linear algebra, calculus (yes, you need to learn it if you want your physics to not be buggy garbage (on the other hand, it gives speedrunners more to work with)), and more depending on the specific game.

#

(If you are responsible for the graphics, you got a whole lot more to learn)

worthy shoal May 6, 2024, 6:59 PM

#

opengl looks fun but i have 0 reason to learn it

iron basalt May 6, 2024, 7:00 PM

#

It's technically legacy now, since Vulkan is OpenGL 5.x (it was originally supposed to be the next version of OpenGL). But for a while it will still be around, because not everything has good Vulkan drivers yet (or ever will).

worthy shoal May 6, 2024, 7:00 PM

#

it's like 30 times harder though

iron basalt May 6, 2024, 7:01 PM

#

Apple is putting the nail in the coffin though.

iron basalt May 6, 2024, 7:01 PM

#

worthy shoal it's like 30 times harder though

Yeah, I don't recommend using it directly unless you have to.

worthy shoal May 6, 2024, 7:02 PM

#

yeah, i can't find graphics programming applicable other than in game development, maybe it could be a fun experience applying math to it and whatnot

iron basalt May 6, 2024, 7:03 PM

#

worthy shoal yeah, i can't find graphics programming applicable other than in game developmen...

It's crucial for all kind of things, including the main topic of this channel.

#

GPUs used to be just for graphics, now they are pretty general.

worthy shoal May 6, 2024, 7:04 PM

#

how can vulkan be possibly used in ml?

iron basalt May 6, 2024, 7:04 PM

#

worthy shoal how can vulkan be possibly used in ml?

You can use it to run models on the GPU.

worthy shoal May 6, 2024, 7:04 PM

#

aren't there better alternatives? it seems foreign to me that you'd use a graphics library like vulkan to do that

iron basalt May 6, 2024, 7:06 PM

#

It's pretty normal to use a graphics library like Vulkan for this. There are some alternatives, they are all very similar. Ones like CUDA are Nvidia only, Vulkan can even run on mobile.

#

Also CUDA can't render graphics on its own to a window, Vulkan has all the normal graphics stuff.

#

(Without extensions)

worthy shoal May 6, 2024, 7:07 PM

#

i'm gonna guess that something like this is easier than making a game with it

iron basalt May 6, 2024, 7:07 PM

#

Vulkan is an open standard, like OpenGL. There is also stuff like OpenCL, which is more like CUDA, but not just Nvidia.

#

Then there are some others.

iron basalt May 6, 2024, 7:08 PM

#

worthy shoal i'm gonna guess that something like this is easier than making a game with it

Yes, it actually tends to have less setup work.

worthy shoal May 6, 2024, 7:08 PM

#

openCL would be fun, if there was more resources with C++ 😄

iron basalt May 6, 2024, 7:08 PM

#

IMO OpenCL has the least boilerplate, and is the overall best API.

#

OpenCL technically also runs on more than just GPUs, it can do CPU, FPGA, etc.

craggy agate May 6, 2024, 7:29 PM

#

I need to talk to someone who is good at computer vision... Could you please DM me?

iron basalt May 6, 2024, 7:41 PM

#

plush bobcat Hey there, I don't know if it's the right channel to ask but: Guys I want to le...

C, then C++. C because it's the lingua franca of the programming world, lets you make fast things (like C++/Rust), is relatively simple, and C++ is directly based on it. C++ because like C, it's everywhere. Rust could be learned instead of C++ after C, but since most of the existing stuff uses C++, go with C++ (so you can read all that existing code).

#

Beyond this, GPU shader languages matter more, which includes stuff like CUDA, OpenCL C, HLSL, GLSL. For this CUDA or OpenCL C. CUDA if you already have an Nvidia GPU.

#

You could add other high level languages, but Python has kind of won that battle.

#

You don't have to be really good at C or C++ or Rust, you more importantly just have to be able to read it.

#

If you can read it, you can read other's code in open source projects to learn how to use them well / mimic them.

hollow escarp May 6, 2024, 9:03 PM

#

Any solutions for installing torch in alpine dockers?

neon island May 6, 2024, 9:06 PM

#

dawn light can you elaborate on how to use KNN for this? I'm only a bit familiar with the ...

'Star Wars' can be represented as a vector in n-dimensions having n scores from [0,1] in features like 'genre: sci-fi', 'producer: george lucas', 'best-picture: 1977', etc.

Reduce them down to a vector i on an arbitrary i-axis representing some "ideal" measure of "Star Wars"-ness. Allowing for movies that out-Star Wars the original Star Wars, let's assume Star Wars has a score of 0.9977 from all of its n components projected onto i.

Add 2 basis vectors j and k along the j-, k-axes representing Japanese-ness and cartoon-ness. These aren't necessarily orthogonal to the i-axis (Star Wars borrowed from the 7 Samurai, so it may already have a 0.25j component embedded within itself).

j x k is a 2-D plane where (1, 1) represents 'anime'. i x j x k is a 3-space where the movies most similar to Star Wars are nearest to (0.9977, 1, 1) when their n-dimensional vectors are projected down onto this space.
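Once every movie has been projected onto the (i, j, k) axes, "Star Wars but anime" becomes a nearest-neighbour lookup around the point (0.9977, 1, 1). In the sketch below the coordinates are invented purely for illustration, and plain Euclidean distance is one reasonable choice of metric among several.

```python
import math

# (Star Wars-ness, Japanese-ness, cartoon-ness); all values are made up.
projected = {
    "Space Cruiser Yamato": (0.80, 1.0, 1.0),
    "Spaceballs": (0.90, 0.0, 0.0),
    "Titanic": (0.05, 0.0, 0.0),
}

# Target point: as Star Wars-like as Star Wars itself, fully 'anime'.
target = (0.9977, 1.0, 1.0)

# Sort titles by Euclidean distance to the target, nearest first.
ranked = sorted(projected, key=lambda title: math.dist(projected[title], target))
print(ranked)
```

Taking the first k entries of `ranked` gives the k nearest neighbours.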

serene scaffold May 6, 2024, 9:39 PM

#

hollow escarp Any solutions for installing torch in alpine dockers?

what problem did you run into?

hollow escarp May 6, 2024, 9:47 PM

#

serene scaffold what problem did you run into?

How to install torch/ultralytics in my docker for production usage

#

I'm using mender to deploy my code to my devices, and for application deployments I need to create a Dockerfile which will contain all necessary libs for running the script, and one of them is ultralytics

#

My app runs on python 3.11.0

serene scaffold May 6, 2024, 10:06 PM

#

hollow escarp How to install torch/ultralytics in my docker for production usage

I would see if you can find a base image that already has python3.11 and pytorch installed, and then extend the Dockerfile from there.

hollow escarp May 6, 2024, 10:07 PM

#

serene scaffold I would see if you can find a base image that already has python3.11 and pytorch...

Do you think thats "stable" solution?

serene scaffold May 6, 2024, 10:08 PM

#

hollow escarp Do you think thats "stable" solution?

yes? why wouldn't it be?

hollow escarp May 6, 2024, 10:09 PM

#

serene scaffold yes? why wouldn't it be?

Idk, always when i see some other img than systems with languages which i use im wondering how to not use them and install it by commands on alpine dockers instead

#

https://hub.docker.com/r/ultralytics/ultralytics/tags found this but that's like 6 GB

serene scaffold May 6, 2024, 10:21 PM

#

hollow escarp Idk, always when i see some other img than systems with languages which i use im...

I guess you could do `docker run -it alpine /bin/sh` (the stock alpine image doesn't ship bash) and figure out all the steps you'd need to do to get pytorch running, and then reconstruct those steps in the dockerfile.

hollow escarp May 6, 2024, 10:41 PM

#

serene scaffold I guess you could do `docker run -it alpine /bin/sh` and figure out all the st...

For sure i will try, but i'm wondering if there is anybody here who has had the very same problem

serene scaffold May 6, 2024, 10:44 PM

#

hollow escarp For sure i will try, but i'm wondering if there is anybody here who has had the...

looks like the other thing you'll want to consider is having the nvidia docker runtime installed

craggy agate May 6, 2024, 11:09 PM

#

Is anyone here familiar with computer vision? If yes, could you please DM me, I need advice for a project. Thank you.

serene scaffold May 6, 2024, 11:18 PM

#

craggy agate Is anyone here familiar with **computer vision?** If yes, could you please DM me...

please don't ask to ask. instead, ask a complete question about computer vision, so that people who know about computer vision can read it and start answering it without extra steps.

craggy agate May 6, 2024, 11:20 PM

#

serene scaffold please don't ask to ask. instead, ask a complete question about computer vision,...

Its a long question which would be more suitable in a conversation form. I just needed some advice

serene scaffold May 6, 2024, 11:20 PM

#

craggy agate Its a long question which would be more suitable in a conversation form. I just ...

at least give enough information in this chat to start the discussion. if someone said "I know about computer vision", what would be your next message?

craggy agate May 6, 2024, 11:22 PM

#

serene scaffold at least give enough information in this chat to start the discussion. if someon...

To give them details about what I want to do and ask for their opinion?

serene scaffold May 6, 2024, 11:22 PM

#

craggy agate To give them details about what I want to do and ask for their opinion?

yes. people want as much information as you're willing to give them before they make a decision about helping or not.

craggy agate May 6, 2024, 11:27 PM

#

Can anyone give me some advice on an object/target tracking project? It would consist of my drone using its front camera to detect me, start tracking me, and make appropriate decisions to keep me centred in its video feed as I move further away or out of its frame.

serene scaffold May 7, 2024, 1:17 AM

#

@hollow escarp I got curious and tried to install pytorch in an alpine container. and once I finally got it installed, I couldn't import it because of some missing OS dependency.

but just installing pytorch makes the container more than two GB, so you might as well start with a more substantive base image.

past hearth May 7, 2024, 1:24 AM

#

hi

serene scaffold May 7, 2024, 1:24 AM

#

past hearth hi

hello and welcome to our wonderful data science and ai chat.

past hearth May 7, 2024, 1:50 AM

#

Strange question that includes other domains. I'm getting a ValueError when using pandas.apply

df[col] = df[col].apply(
  lambda x: (
    x.strftime(...) # <- vscode raises exception here
    if ((not pd.isnull(x)) and (x != ""))
    else x
  )
)

This one in particular:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#using-if-truth-statements-with-pandas

I'm running this python code with the VScode debugger, and the error only occurs when I have "raised exceptions" ticked. When it's not ticked, the program runs smoothly and it SEEMS like there are no issues - as in the apply function is doing what it was intended to do.

At the point of the raised exception, the x value is the whole column in an array, so I understand why the error can show up.

type(x) # <pandas.core.indexes.datetimes.DatetimeIndex>

I have "raised exceptions" ticked and then press Step Over, and execution continues successfully - with each step transforming each value to readable timestamp.

The deployed version of this code runs without exceptions. Am I going insane? or is there an issue with vscode or the python debugger... or python interpreter? (using python3.10 in a venv)
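One possible explanation, offered as an assumption rather than a confirmed diagnosis: some pandas code paths for datetime-like data first try calling the function on the whole index and, if that raises, fall back to calling it element by element. VS Code's "Raised Exceptions" breakpoint stops on every raised exception, including ones that are caught a few frames up, which would match the symptoms. The plain-Python sketch below imitates that pattern with a hypothetical `Column` class and `apply` helper; it is not pandas' actual implementation.

```python
class Column:
    """Stand-in for an array-like whose truth value is ambiguous."""

    def __init__(self, values):
        self.values = list(values)

    def __ne__(self, other):
        # Element-wise comparison returns another Column, not a bool.
        return Column(v != other for v in self.values)

    def __bool__(self):
        raise ValueError("The truth value of an array is ambiguous")

def apply(column, func):
    """Try func on the whole column first, then fall back to element-wise."""
    try:
        return func(column)
    except Exception:
        # The error is swallowed here; a debugger set to break on *raised*
        # exceptions still stops at the raise site inside func.
        return [func(v) for v in column.values]

def fmt(x):
    return x.upper() if x != "" else x

result = apply(Column(["a", "", "b"]), fmt)
print(result)
```

If this is what is happening, the result is still correct and the deployed code is unaffected; only the debugger setting makes the internal exception visible.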

serene scaffold May 7, 2024, 2:05 AM

#

past hearth Strange question that includes other domains. I'm getting a ValueError when usin...

can you show print(df.head().to_dict('list')) as text (no screenshot)?

dawn light May 7, 2024, 2:54 AM

#

neon island 'Star Wars' can be represented as a vector in *n*-dimensions having *n* scores f...

thanks again!

what would you suggest I do to obtain the features? I'm guessing this would be some embedding of some sort done on the synopsis/summary which extracts those features?

past hearth May 7, 2024, 3:37 AM

#

serene scaffold can you show `print(df.head().to_dict('list'))` as text (no screenshot)?

Thank you, sorry for the late reply give me a moment.

past hearth May 7, 2024, 4:04 AM

#

serene scaffold can you show `print(df.head().to_dict('list'))` as text (no screenshot)?

I did only head(2), because 5 is too large. Also changed all the values

# df.head(2).to_dict('list')
{'aa': ['2024-04-30 00:05:00+1234', '2024-04-30 00:10:00+1234'], 'bb': ['123123123', '123123123'], 'cc': ['123123123', '13123123'], 'dd': ['2309rjei230', '2309rjei230'], 'ee': ['', ''], 'ff': ['', ''], 'gg': ['', ''], 'hh': ['', ''], 'ii': ['', ''], 'jj': [0.0, 0.0], 'kk': ['U', 'U'], 'll': ['filename.json', 'filename2.json'], 'll': ['123123123', '123123'],
     'mm-timestamp': [Timestamp('2024-04-30 00:13:46+1234', tz='timezone/timezone'), Timestamp('2024-04-30 00:13:46+1234', tz='timezone/timezone')], 'nn': [0.0, 0.0], 'oo': ['edwed', 'edwed'], 'pp': ['wed', 'wde'], 'qq': [None, None], 'rr-timestamp': [Timestamp('2024-04-30 00:18:50.400544+1234', tz='timezone/timezone'), Timestamp('2024-04-30 00:18:50.400544+1234', tz='timezone/timezone')], 'ss': [True, True], 'tt': ['vee', 'vee'], 'uu': ['123', '123'], 'vv': [0.0, 0.0], 'ww': ['A', 'A'], 'xx': ['', ''], 'yy': [Timestamp('2024-05-01 00:01:08+1234', tz='timezone/timezone'), Timestamp('2024-05-01 00:01:08+1234', tz='timezone/timezone')], 'zz': ['qwerqwer', 'qwerqwer'], 'az': ['weqrt', 'weqrt']}

swift fulcrum May 7, 2024, 5:02 AM

#

hi, i was considering learning prompt engineering what course would yall suggest?

quaint loom May 7, 2024, 6:11 AM

#

Is there anyone here who have performed redundancy analysis (RDA)?

hollow escarp May 7, 2024, 8:13 AM

#

serene scaffold @hollow escarp I got curious and tried to install pytorch in an alpine co...

Do you want to share your progress

past meteor May 7, 2024, 8:24 AM

#

hollow escarp Do you want to share your progress

Any reason why you can't use this? https://hub.docker.com/r/pytorch/pytorch

#

Maybe a better question, any reason why you can't export your torch model to ONNX and then make/use an ONNX image without needing the torch dependency?

hollow escarp May 7, 2024, 8:25 AM

#

past meteor Maybe a better question, any reason why you can't export your torch model to ONN...

I created script which uses YOLOv8 to detect some stuff from my camera

#

More like i created license plate recognition

#

Which uses YOLOv8 to get Location of license Plate of img

past meteor May 7, 2024, 8:27 AM

#

Through what are you using Yolo? Darknet? Torch? Tensorflow? Opencv?

hollow escarp May 7, 2024, 8:56 AM

#

past meteor Through what are you using Yolo? Darknet? Torch? Tensorflow? Opencv?

Torch, i followed this installation process: https://medium.com/@pat.x.guillen/a-step-by-step-guide-to-running-yolov8-on-windows-122cb586b567

Medium

A Step-by-Step Guide to Running YOLOv8 on Windows

Part 1

neon island May 7, 2024, 9:03 AM

#

dawn light thanks again! what would you suggest I do to obtain the features? I'm guessing ...

See the fine suggestion of the sklearn.neighbors.KNeighborsClassifier in scikit-learn earlier by @data.exs

It's easy to pick out bespoke features in a specific example, but harder when answering a more general query where your data set has 1000s of features, like the ones you can find on IMDb ($), TMDB, or MovieLens. To start with, try a small hand-crafted pd.DataFrame with just Star Wars, Space Cruiser Yamato, 7 Samurai, Spaceballs, Family Guy's Star Wars Parody, Titanic and Rocky.

Feature engineering is trying different sets of features when training your classifier to see which features produce the best recommendations.

GroupLens

MovieLens

GroupLens Research has collected and made available rating data sets from the MovieLens web site. The data sets were collected over various periods of time, depending on the size of the set. …

hollow escarp May 7, 2024, 9:14 AM

#

hollow escarp Torch, i followed this installation process: https://medium.com/@pat.x.guillen/a...

So im looking for a way to use my model on my raspberry PI for detecting stuff from My already trained model

past meteor May 7, 2024, 9:25 AM

#

hollow escarp So im looking for a way to use my model on my raspberry PI for detecting stuff f...

So, as mentioned I think you should convert it to ONNX and deploy that on your raspi

#

https://pytorch.org/docs/stable/onnx.html

dawn light May 7, 2024, 9:26 AM

#

neon island See the fine suggestion of the `sklearn.neighbors.KNeighborsClassifier` in sciki...

thanks! i'll check this out

past meteor May 7, 2024, 9:26 AM

#

And then you can use this on your raspi https://onnxruntime.ai/

ONNX Runtime | Home

Cross-platform accelerated machine learning. Built-in optimizations speed up training and inferencing with your existing technology stack.

hollow escarp May 7, 2024, 9:34 AM

#

past meteor So, as mentioned I think you should convert it to ONNX and deploy that on your r...

Okay, also i should add that i need to build it into a docker image and deploy that to mender (that's an OTA updates provider), which then deploys it on devices

past meteor May 7, 2024, 9:38 AM

#

hollow escarp Okay, also i should add that i need to build it to docker to deploy that docker ...

Sure. You basically have 2 steps now. You need to compile the torch model to ONNX and then load that into a Docker image that has the ONNX runtime.

Personally I'd do this with a multi-stage build. In the first stage I'd basically use the Pytorch base image I linked initially and install all packages I need on top of it to build the ONNX file.

The second stage is one that has the ONNX runtime, you copy the file you made in stage 1 and you're ready 🚀 .

(I simplified it, in practice I'd have at least 3 steps but that's ok. You'll at least use 2 images, that's the important point).

I'd start out testing this workflow in a notebook first to see if it works and so on. A massive advantage is that you don't need to ship 2 gigs of torch to your platform that is just doing inference (the raspi)

#

You can even make it simpler and just compile and version control the binary you get from compiling the torch model. You can do that if you're certain it won't change. Your Dockerfile becomes easier then.

hollow escarp May 7, 2024, 9:39 AM

#

Okay, and also i have model in .pt format ( it's like 20k img model ) which was trained by someone else

#

Is that a problem for converting it to ONNX format?

past meteor May 7, 2024, 9:39 AM

#

What do you mean with 20k img model?

hollow escarp May 7, 2024, 9:40 AM

#

past meteor What do you mean with 20k img model?

I mean that this model used more than 20k images in the training process

past meteor May 7, 2024, 9:41 AM

#

Just fyi, the model doesn't contain the images. Example: If you have 1 single neuron and run 10000k images on it it'll still be small.

hollow escarp May 7, 2024, 9:41 AM

#

Ye ye i know that

past meteor May 7, 2024, 9:41 AM

#

So why did you mention it? Maybe I'm missing something.

hollow escarp May 7, 2024, 9:41 AM

#

but I'm just asking how to convert that .pt model to an ONNX supported format?

past meteor May 7, 2024, 9:42 AM

#

It's in the link I sent you

hollow escarp May 7, 2024, 9:42 AM

#

oh ye, thx

past meteor May 7, 2024, 9:43 AM

#

If you don't mind, I'll stop answering. I gave you a lot of information to digest and I think you should read the links, some other docs, let it sink in etc. and if you have more questions afterwards just tag me 👊

hollow escarp May 7, 2024, 9:43 AM

#

Okay, im really glad for your support

wide wolf May 7, 2024, 9:46 AM

#

So I started working on a python ML project few weeks ago and I don't know Python, ML or Pandas previously. I've got some questions regarding my dataframe structure. Is this a channel I could ask this stuff in?

#

So I'm doing ML regarding stock companies and their quarterly results. So I got 4 rows of quarterly results for a company X which I put into a dataframe, and then I use multiindex to store all these rows together. My reasoning was that if I flatten the dataframe then the ML model won't be able to 'identify' the 4 rows belonging to a specific company.

#

So '181', '356' and '59' are company IDs here

#

Will this work, or am I messing it all up?

#

Basically i'm attempting to make a pandas panel, I think (but that's been deprecated)

#

yeah

#

yeah, but I mean more specifically that I'm using multiindex and 'grouping' them (0-4) as you see here

#

but if above looks fine/normal to you then I guess my above approach is fine

#

what missing indices?

past meteor May 7, 2024, 9:54 AM

#

Haven't been following the conversation but ordinal encoding is really bad

#

I'd say always do one hot unless you're doing a decision tree and even when you are it's still dangerous

#

Maybe in NLP, but in tabular datasets it's not good

#

Imagine you have small, medium and large and you do an ordinal encoding: you're saying large is 3x small

#

That's typically the danger
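The small/medium/large point can be made concrete with a few lines of plain Python. The integer mapping below is an arbitrary choice made for illustration.

```python
sizes = ["small", "medium", "large", "medium"]

# Ordinal encoding: one integer per category. A linear model reading this
# column treats 'large' (3) as three times 'small' (1).
ordinal_map = {"small": 1, "medium": 2, "large": 3}
ordinal = [ordinal_map[s] for s in sizes]

# One-hot encoding: one 0/1 column per category, so no magnitude is implied.
categories = sorted(set(sizes))  # ['large', 'medium', 'small']
one_hot = [[1 if s == c else 0 for c in categories] for s in sizes]

print(ordinal)
print(one_hot)
```

The cost of one-hot is one column per category, which is why it stops being practical for high-cardinality features.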

#

I used to do it for high cardinality data (like postal codes) but it's no longer necessary as we now have target encoding in sklearn https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html

scikit-learn

sklearn.preprocessing.TargetEncoder

Examples using sklearn.preprocessing.TargetEncoder: Release Highlights for scikit-learn 1.3 Comparing Target Encoder with Other Encoders Target Encoder’s Internal Cross fitting

wide wolf May 7, 2024, 9:58 AM

#

I can't use one-hot encoding here since the number of dimensions would scale beyond what's reasonable, since every 4 rows is a unique company

#

but since its always 4 rows for each company, I figured there wouldn't be an issue with hierarchy

#

since it's all 'balanced'

past meteor May 7, 2024, 9:58 AM

#

Target encode them then
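
For reference, the core idea of target encoding can be sketched with plain pandas: replace each category with the mean target value seen for it. This is a deliberately naive version, not what `sklearn.preprocessing.TargetEncoder` does internally (that one adds smoothing and cross-fitting to limit leakage), and the column names and values are made up.

```python
import pandas as pd

# Hypothetical high-cardinality column plus a numeric target.
df = pd.DataFrame({
    "postal_code": ["1000", "1000", "2000", "2000", "3000"],
    "target": [1.0, 0.0, 1.0, 1.0, 0.0],
})

# Mean target per category, computed on the training data only.
category_means = df.groupby("postal_code")["target"].mean()

# One numeric column replaces the category, however many values it has.
df["postal_code_encoded"] = df["postal_code"].map(category_means)

print(df)
```

In a real pipeline the means have to come from the training split only, otherwise the encoded column leaks the target into validation data.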

late spear May 7, 2024, 9:59 AM

#

Is it possible to create a conversational AI that diagnoses a patient's mental illness? The input would be the patient's speech converted to text, plus their facial expressions, recognized for emotions?

past meteor May 7, 2024, 10:01 AM

#

Yeah, that's the reason. Given enough data it should sort itself out but it's a last resort approach imo

#

I'm being pedantic though 😂

wide wolf May 7, 2024, 10:08 AM

#

Sorry for spamming, but I'm trying to work out if I'm messing up when I'm merging my different dataframes. So I store like 5 companies in 'dfs' (4 rows each, so 20 rows total), enumerate over them and put them in a hash table (dict) with 'i' being the key. And then concat().

Since you were talking about 'missing indices', I dunno if my mistake here is that I need to for-loop over each of the 4 rows as well and assign the company ID ('i') to them, or if the behaviour in the screenshots is all correct
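
A small sketch of the approach described here, assuming pandas and two made-up company frames standing in for `dfs`. When `pd.concat` is given a dict, the keys become the outer index level, so every row gets tagged with its company key without a per-row loop.

```python
import pandas as pd

# Hypothetical stand-ins for the per-company query results (4 quarters each).
dfs = [
    pd.DataFrame({"revenue": [10, 11, 12, 13], "profit": [1, 2, 2, 3]}),
    pd.DataFrame({"revenue": [50, 48, 47, 52], "profit": [5, 4, 4, 6]}),
]

# Dict keys become the outer level of the resulting MultiIndex.
by_company = {i: df for i, df in enumerate(dfs)}
combined = pd.concat(by_company, names=["company", "row"])

# All four rows of company 1, selected through the outer level.
company_1 = combined.loc[1]

print(combined)
```

The level names `company` and `row` are illustrative; `names=` only labels the index levels and doesn't change the data.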

past meteor May 7, 2024, 10:08 AM

#

I am as well, but I'd say benchmark it for tabular data. It can easily be a hyperparameter.

#

Just empirical evidence showed me it's usually bad 🤷

wide wolf May 7, 2024, 10:13 AM

#

The whole reason I'm doing all of above is because I have data over time (quarters) and I dunno how else to group it together for my ML code. I can't do avg or mean and just 1 row since that doesn't capture change over time. I could skip the multiindex part (just flatten the big dataframe) but then I'm worried the ML code won't be able to identify that each 4 rows are 'tied together'

#

I'm new to all of this so dunno best practices
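
One common alternative, sketched under assumptions: most tabular models treat each row as an independent sample, so instead of relying on the index to tie four rows together, the quarters can be pivoted into one wide row per company. The IDs come from the earlier message; the `revenue` numbers and the derived `revenue_change` feature are made up.

```python
import pandas as pd

# Hypothetical long-format data: one row per company per quarter.
long_df = pd.DataFrame({
    "company_id": [181, 181, 181, 181, 356, 356, 356, 356],
    "quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "revenue": [10.0, 11.0, 12.5, 13.0, 5.0, 4.5, 4.0, 6.0],
})

# One row per company, one column per quarter.
wide = long_df.pivot(index="company_id", columns="quarter", values="revenue")
wide.columns = [f"revenue_q{q}" for q in wide.columns]

# Change over time becomes an explicit feature instead of being implied
# by row order.
wide["revenue_change"] = wide["revenue_q4"] - wide["revenue_q1"]

print(wide)
```

Whether this beats a sequence model depends on the data; it is simply the layout most scikit-learn style estimators expect.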

#

whats NLP?

past meteor May 7, 2024, 10:16 AM

#

Yeah I feel like these are a dying art

narrow tiger May 7, 2024, 10:17 AM

#

so what does being data scientist mean?
just someone who can run some data through an algo to generate meaningful charts?
what do i need to learn so i can call myself data scientist

#

to get good at machine learning algos i think i'll need a lot of maths

#

https://www.youtube.com/watch?v=b7NnMZPNIXA has anyone watched this

past meteor May 7, 2024, 10:19 AM

#

No, tabular data is a lot more effort and the results are very variable so it seems like it's very much in the trough of disillusionment

wide wolf May 7, 2024, 10:20 AM

#

Above is pre encoding/split. I'm doing SQL queries to get 4 rows for each company and then merging it together into a multiindex

#

its multiindex

#

pandas panel thing

#

I think

past meteor May 7, 2024, 10:20 AM

#

I avoid multi index stuff etc. like the plague

wide wolf May 7, 2024, 10:21 AM

#

Yeah I could flatten it all, but then row 5 (currently '0 VTVT') will be '5 VTVT'

#

so I lose the 0-4 grouping, and I dunno the impact of that for my ML learning code
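
A hedged sketch of what flattening does in pandas: `reset_index()` moves the index levels into ordinary columns, so the company label and the position within each company are kept, just as data rather than as index. `VTVT` is taken from the message above; `ABCD`, the numbers and the level names are made up.

```python
import pandas as pd

# Hypothetical data: two companies with four quarterly rows each.
frames = {
    "VTVT": pd.DataFrame({"revenue": [1.0, 1.2, 1.1, 1.4]}),
    "ABCD": pd.DataFrame({"revenue": [7.0, 6.5, 6.8, 7.2]}),
}
multi = pd.concat(frames, names=["company", "quarter_idx"])

# Flatten: both index levels become regular columns.
flat = multi.reset_index()

# The rows can still be regrouped per company whenever needed.
rows_per_company = flat.groupby("company").size()

print(flat)
```

Note that a model won't use the `company` column as a grouping hint on its own; it only helps if the code groups, splits or builds features by it.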

past meteor May 7, 2024, 10:22 AM

#

As much as I dislike Pandas, you got to learn it

#

I know you do

#

I'm mostly referring to the other user

wide wolf May 7, 2024, 10:23 AM

#

I'll run a few tests without multiindex, see what happens

past meteor May 7, 2024, 10:24 AM

#

I'm a big Polars fan ofc but what bugs me are breaking changes

#

But pandas has a fair share of those as well tbh

#

Very true

wide wolf May 7, 2024, 11:53 AM

#

Btw, coding with ChatGPT as a helper is amazing

#

made my python learning 10x easier

lapis sequoia May 7, 2024, 1:01 PM

#

Heya
when we have a matrix
M=np.arange(1,11).reshape(5,2)
what does M[2] mean here?
what would M[1][1] be like and why?

deep veldt May 7, 2024, 1:05 PM

#

differences between attribute and future?

craggy agate May 7, 2024, 1:06 PM

#

lapis sequoia Heya when we have a matrix M=np.arange(1,11).reshape(5,2) what does M[2] mean ...

M[2] would return [5,6]

#

M[2] is the third row of the matrix

craggy haven May 7, 2024, 1:07 PM

#

Guys do you think it is going to be useful to learn cuda for machine learning jobs?

craggy agate May 7, 2024, 1:07 PM

#

lapis sequoia Heya when we have a matrix M=np.arange(1,11).reshape(5,2) what does M[2] mean ...

M[1][1] means you're first selecting the second row and then from that row, you're selecting the second element. So M[1][1] would be 4.
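
The same thing as a runnable snippet, using the array from the question. `M[1, 1]` is added as the more idiomatic NumPy spelling of `M[1][1]`.

```python
import numpy as np

M = np.arange(1, 11).reshape(5, 2)
# M is:
# [[ 1  2]
#  [ 3  4]
#  [ 5  6]
#  [ 7  8]
#  [ 9 10]]

third_row = M[2]        # indexing the first axis selects a whole row
element = M[1][1]       # second row, then second element of that row
same_element = M[1, 1]  # same lookup in a single indexing step

print(third_row, element, same_element)
```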

past meteor May 7, 2024, 1:07 PM

#

deep veldt differences between attribute and future?

You mean a feature? They're synonyms.

deep veldt May 7, 2024, 1:07 PM

#

past meteor You mean a feature? They're synonyms.

oh

serene scaffold May 7, 2024, 1:27 PM

#

narrow tiger so what does being data scientist mean? just someone who can run some data throu...

there is no consistency in what a "data scientist" is or does. it varies between companies, and even within companies.

serene scaffold May 7, 2024, 1:28 PM

#

narrow tiger to get good at machines learning algos i think i'll need alot of maths

insofar as one can be "good at an algorithm", yes.

craggy agate May 7, 2024, 1:39 PM

#

craggy haven Guys do you think it is going to be useful to learn cuda for machine learning jo...

Learning how to use CUDA GPUs with your models would be more useful, learning how CUDA works, not so much.

desert oar May 7, 2024, 1:58 PM

#

narrow tiger so what does being data scientist mean? just someone who can run some data throu...

usually "data scientist" is some combination of "machine learning" and "applied statistics". senior DS tends to be involved with long-term product strategy and work directly with senior-level business stakeholders. usually there is an expectation that you can operate with a reasonable level of independence, cleaning and sometimes even gathering your own data

#

the "data scientist" job title itself has come to indicate a kind of generalist jack-of-all-trades role, analogous to "full-stack developer" in software development. bigger organizations tend to have more specialized titles.

#

no one person can be good at all of it. so typically industry data scientists end up being really good at a few things and less good at other things, and self-select into jobs that make sense for their skills, and also tend to work on upskilling throughout their careers

#

understated but important job characteristics include writing/communication skills and project planning

narrow tiger May 7, 2024, 2:57 PM

#

desert oar usually "data scientist" is some combination of "machine learning" and "applied ...

The applied statistics part is a big no-no, I don't think I'll enjoy that for long.
Thanks for the extensive explanation, it did resolve a lot of queries for me.
Is there any data science related field which has more focus on the programming/coding part than others?

agile cobalt May 7, 2024, 2:57 PM

#

data science is literally applied statistics

#

machine learning is just statistics wearing a hood

#

if you want to focus on programming, then do normal programming | backend development
if anything maybe model deployment/devops/mlops might be closer to what you are thinking, but that still depends on the place

narrow tiger May 7, 2024, 3:01 PM

#

agile cobalt if you want to focus on programming, then do normal programming | backend develo...

I've actually spent a lot of time on this, but there are no jobs/scope for frontend, only machine learning jobs

#

What do people do in machine learning engineering jobs and with LLMs?

#

Ig stuff will make sense soon enough thanks everyone for answering

#

Halfway through this
https://youtu.be/b7NnMZPNIXA?si=QaEgK6Mr7vzGSHW-

YouTube

Computer Science with Dr. RCB

Mathematics of neural network

In this video, I will guide you through the entire process of deriving a mathematical representation of an artificial neural network. You can use the following timestamps to browse through the content.

Timecodes
0:00 Introduction
2:20 What does a neuron do?
10:17 Labeling the weights and biases for the math.
29:40 How to represent weights and ...

#

Doesn't seem very complicated so far hopefully this is mid lvl 😂

#

Any thoughts

desert oar May 7, 2024, 3:16 PM

#

narrow tiger Applied statistics part is big no-no don't think I'll enjoy that for long. Thank...

Any data science related field which has more focus on programming/ coding part than others

Data engineering.

ML engineering if you like math and numerical computing.

narrow tiger May 7, 2024, 3:16 PM

#

Thanks

hollow escarp May 7, 2024, 3:49 PM

#

@past meteor so I converted my .pt model to an ONNX model (using the following command: `yolo export model=<my_model> format=onnx imgsz=640,640`) and now I'm having trouble reading the correct values. Before, my script for getting the correct box positions was:

```python
import numpy as np
from ultralytics import YOLO

# ClosestPlate is the poster's own class (its definition is not shown here).
def detect_closest_license_plate(image, model: YOLO) -> ClosestPlate:
  """Return the detected plate closest to the image center, or None if nothing was detected."""
  prediction = model(image)[0]

  camera_center_x, camera_center_y = image.shape[1] // 2, image.shape[0] // 2
  closest_plate: ClosestPlate = None
  closest_distance = float('inf')

  # Each row of prediction.boxes.data is [x1, y1, x2, y2, confidence, class]
  for license_plate in prediction.boxes.data.tolist():
    x1, y1, x2, y2, conf, cls = license_plate
    plate_center_x, plate_center_y = (x1 + x2) // 2, (y1 + y2) // 2
    # Euclidean distance from the plate center to the image center
    distance = np.sqrt((plate_center_x - camera_center_x)**2 + (plate_center_y - camera_center_y)**2)
    if distance < closest_distance:
      closest_plate = ClosestPlate.from_dict({
        'bbox': (x1, y1, x2, y2),
        'confidence': conf,
        'class_': cls,
        'plate_center': (plate_center_x, plate_center_y),
        'distance_to_camera': distance
      })

      closest_distance = distance

  return closest_plate
```

Now I'm getting my predictions this way: `pred = session.run(None, {"images": to_numpy(proccess_image("./test_photos/test.jpg"))})`

And I can't find the correct corresponding values

That's the output:

```
[array([[[     22.407,       27.52,      37.087, ...,      560.26,      582.73,      612.35],
        [     6.8977,      6.7349,      5.8882, ...,      628.67,       628.4,      626.59],
        [     13.573,      12.881,      11.815, ...,      275.83,      318.33,      388.62],
        [ 6.5565e-06,  3.8147e-06,  4.2707e-05, ...,  9.4175e-06,   5.126e-06,  1.8477e-06]]], dtype=float32)]
```
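
A possible way to read that array, sketched under assumptions that were not confirmed in the thread: Ultralytics YOLOv8 ONNX exports usually return one array of shape `(1, 4 + num_classes, N)`, where the first four rows are box center x, center y, width and height in pixels of the preprocessed input (640x640 here) and the remaining rows are per-class scores, with no non-maximum suppression applied. The pasted output is truncated, so this layout is an inference. The function name `decode_yolo_output` and the 0.25 threshold are hypothetical.

```python
import numpy as np

def decode_yolo_output(output, conf_threshold=0.25):
    """Turn a (1, 4 + num_classes, N) array into rows of [x1, y1, x2, y2, conf, cls].

    Assumes center-format boxes (cx, cy, w, h) followed by class scores.
    No non-maximum suppression is done here.
    """
    preds = output[0].T                 # (N, 4 + num_classes): one row per candidate
    boxes_cxcywh = preds[:, :4]
    class_scores = preds[:, 4:]

    cls = class_scores.argmax(axis=1)   # best class per candidate
    conf = class_scores.max(axis=1)     # score of that class
    keep = conf >= conf_threshold

    cx, cy, w, h = boxes_cxcywh[keep].T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return np.column_stack([xyxy, conf[keep], cls[keep]])

# Tiny synthetic example: 3 candidates, 1 class.
fake_output = np.array([[
    [100.0, 200.0, 300.0],   # cx
    [100.0, 200.0, 300.0],   # cy
    [20.0, 40.0, 60.0],      # w
    [10.0, 20.0, 30.0],      # h
    [0.9, 0.1, 0.5],         # class-0 score
]], dtype=np.float32)

detections = decode_yolo_output(fake_output)
print(detections)
```

With the real model this would be called as `decode_yolo_output(pred[0])`. The rows then have the same layout as `prediction.boxes.data`, but overlapping candidates still need non-maximum suppression, and the coordinates refer to the resized input rather than the original photo.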

limber token May 7, 2024, 4:11 PM

#

How would you guys go about publishing ML code that needs a very large dataset for it to work? Compress it using parquet and unpack it via code? The code is on GitHub, I'm more worried about the dataset

past meteor May 7, 2024, 4:13 PM

#

hollow escarp @past meteor so i converted my pt model to onnx model ( using following...

I wouldn't immediately know what's up, it's hard to debug if I'm not running the code

past meteor May 7, 2024, 4:13 PM

#

limber token How would you guys go about publishing ML code that needs a very large dataset f...

Does your data need to be public too?

#

Can't you just train, persist the model and then use it?

buoyant vine May 7, 2024, 4:14 PM

#

limber token How would you guys go about publishing ML code that needs a very large dataset f...

Depends what you are doing.

At work we store data via safetensors https://huggingface.co/docs/safetensors/index and use DVC https://dvc.org/ for managing the large files with Git and S3.

If you're not going to be pulling often, you can use Git LFS with GitHub as well, but it gets expensive quickly on bigger datasets (i.e. objects > 100MB)

Safetensors

Data Version Control · DVC

Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.

plush bobcat May 7, 2024, 4:15 PM

#

iron basalt C, then C++. C because it's the lingua franca of the programming world, lets you...

So in conclusion:

• C/C++ because they're the mother languages, and by learning them (not mastering, just well enough to be able to read and mimic source code) I can make speed/performance-critical applications and understand code better in general

• C/C++, because they're so old that almost everywhere you look is C/C++ and not Rust; not that you can't write the same program in Rust, C/C++ just have more source code written in them

• I just kinda need to be able to read C/C++ for CUDA

Am i right?

past meteor May 7, 2024, 4:18 PM

#

buoyant vine Depends what you are doing. At work we store data via `safetensors` https://hug...

is DVC worth looking into? Currently I do stupid things like manually versioning my data and keeping several versions of critical parts of my code I change because I want/need absolute reproducibility 🥴

#

I'm using MLflow + optuna + tensorboard + dagster

#

I could add more tooling, if it's worth it

buoyant vine May 7, 2024, 4:20 PM

#

So yes and no, I both love and hate it.

If you have a Git LFS setup, then use Git LFS; it is just so much smoother in terms of configuration and pulling changes.
Otherwise if you have big objects and need to store on S3, then DVC is great for that, but it is more manual than git LFS and has a pretty awful caching system that often requires you to delete the entire local cache in order to pull new files from the remote.

#

But it is the only modern tool that effectively supports big objects tracked by git... on S3 or other storage with minimal setup

#

😅 So I guess it is a tradeoff

past meteor May 7, 2024, 4:20 PM

#

Would you use it for tabular data?

#

That I'm just storing in postgres tbf

#

All object related stuff is in minio DB, we could use DVC there

buoyant vine May 7, 2024, 4:21 PM

#

I would use it just for managing dataset files or big binary files with git only

#

The repro stuff, param tracking, etc... is completely useless IMO and you'll spend more of your time debugging why DVC isn't working than actually doing the runs or the code

past meteor May 7, 2024, 4:22 PM

#

good to know, those are all new features as well

#

I looked into DVC a couple of years ago and that wasn't there afaik

buoyant vine May 7, 2024, 4:22 PM

#

That being said, My experience of PyTorch Lightning + Neptune has been excellent for tracking artefacts, metrics, etc...

desert oar May 7, 2024, 4:22 PM

#

buoyant vine Depends what you are doing. At work we store data via `safetensors` https://hug...

safetensors is new to me 👀

plush bobcat May 7, 2024, 4:22 PM

#

btw guys i got a question,

TensorFlow or Pytorch

i'm just about to start learning ML and i cant decide which one's better

past meteor May 7, 2024, 4:23 PM

#

plush bobcat btw guys i got a question, TensorFlow or Pytorch i'm just about to start lear...

Pytorch

plush bobcat May 7, 2024, 4:23 PM

#

past meteor Pytorch

why?

past meteor May 7, 2024, 4:23 PM

#

buoyant vine That being said, My experience of PyTorch Lightning + Neptune has been excellent...

Yup, same idea with Mlflow

past meteor May 7, 2024, 4:23 PM

#

plush bobcat why?

Simply because it's more common nowadays. using the most popular tool has a lot of merit

desert oar May 7, 2024, 4:23 PM

#

buoyant vine The repro stuff, param tracking, etc... Is completely useless IMO and you'll spe...

i actually think DVC is great for managing "raw" data within your project. much better than a makefile

buoyant vine May 7, 2024, 4:23 PM

#

desert oar safetensors is new to me 👀

Safetensors is excellent, for us at least 😅 Since we train models via Python but deploy via Rust, so it is often very useful to be able to have that simple to use API which both langs can use

#

and store data efficiently and load quickly

plush bobcat May 7, 2024, 4:23 PM

#

past meteor Simply because it's more common nowadays. using the most popular tool has a lot ...

Yea

desert oar May 7, 2024, 4:24 PM

#

buoyant vine Safetensors is excellent, for us at least 😅 Since we train models via Python bu...

that's not our use case, but having anything standardized is better than DIYing something in json or npz or arrow

past meteor May 7, 2024, 4:24 PM

#

Pretty satisfied with my MLflow + dagster + lightning + optuna setup, but repro on the data side is pretty bad for me

#

I think I'll just start logging the git commit with my experiments

desert oar May 7, 2024, 4:24 PM

#

also i think DVC is really useful for sharing data within a team, dvc push/pull specifically

buoyant vine May 7, 2024, 4:24 PM

#

desert oar that's not our use case, but having anything standardized is better than DIYing ...

Yeah that is fair, the biggest gain IMO is the setup is super simple in all the langs, unlike arrow/parquet or other which often has a bunch of extra work around it

past meteor May 7, 2024, 4:25 PM

#

Then I can freely change things but I can get repro easily by checking out at that commit 👀
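
A minimal sketch of that idea using only the standard library, assuming `git` is on the PATH and the experiment runs from inside the repository. The helper name `current_git_commit` and the `run_metadata` dict are hypothetical; with MLflow the hash could be attached to the run as a tag instead.

```python
import subprocess

def current_git_commit(repo_dir="."):
    """Return the HEAD commit hash, or None if it can't be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository.
        return None
    return result.stdout.strip()

# Store the hash next to whatever else is logged for the run.
run_metadata = {"git_commit": current_git_commit()}
print(run_metadata)
```

The hash only guarantees reproducibility if the working tree was clean when the run started, so uncommitted changes are worth checking for (or refusing to run with).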

desert oar May 7, 2024, 4:25 PM

#

what's the value of neptune vs. mlflow or any of the 1e12 other options out there right now?

buoyant vine May 7, 2024, 4:25 PM

#

desert oar also i think DVC is really useful for sharing data within a team, dvc push/pull ...

yeah that is what I mean by tracking via Git, but it has a bad habit of needing the local cache cleared if a file changes and you need to pull

desert oar May 7, 2024, 4:25 PM

#

past meteor Then I can freely change things but I can get repro easily by checking out at th...

yeah that's the whole point of DVC, it's great

desert oar May 7, 2024, 4:25 PM

#

buoyant vine yeah that is what a I mean by tracking via Git, but it has a bad habbit of needi...

ah, the cache... yeah.

past meteor May 7, 2024, 4:25 PM

#

Then I'll have to look closely at DVC tomorrow

#

I always MacGyver until it gets bad and it's getting bad right now 😂

buoyant vine May 7, 2024, 4:26 PM

#

desert oar what's the value of neptune vs. mlflow or any of the 1e12 other options out ther...

I like MLFlow, but equally Neptune is so simple to setup, and the free tier is actually super awesome, 200GB of storage for free is a lot for most people

#

UI is great, integrations are excellent, etc...

past meteor May 7, 2024, 4:26 PM

#

Oh yeah, I think a big caveat is I'm doing things on-prem

#

that's why I went with MLflow I think

buoyant vine May 7, 2024, 4:26 PM

#

Makes sense

desert oar May 7, 2024, 4:27 PM

#

past meteor Then I'll have to look closesly at DVC tomorrow

there's also Dud, which is meant to be a kind of stripped-down DVC, but start with DVC first IMO

#

for completeness: https://kevin-hanselman.github.io/dud/

| Dud

Dud # Website | Install | Getting Started | Source Code
Dud is a lightweight tool for versioning data alongside source code and building data pipelines. In practice, Dud extends many of the benefits of source control to large binary data.
With Dud, you can commit, checkout, fetch, and push large files and directories with a simple command line i...

desert oar May 7, 2024, 4:28 PM

#

buoyant vine I like MLFlow, but equally Neptune is so simple to setup, and the free tier is a...

that's nice! but it's proprietary saas right? just to be clear

#

(compared to mlflow for example)

#

snowflake also recently rolled out a model registry thing, we might start using that to deploy models directly in-warehouse

#

not sure if it has any useful tracking/versioning features though

past meteor May 7, 2024, 4:28 PM

#

proprietary saas
[...]
snowflake

buoyant vine May 7, 2024, 4:28 PM

#

desert oar that's nice! but it's proprietary saas right? just to be clear

Yes, very much just SAAS

desert oar May 7, 2024, 4:29 PM

#

past meteor > proprietary saas > [...] > snowflake

yeah but we already pay for snowflake 😂

#

buying a new product would be harder with our current finances

past meteor May 7, 2024, 4:29 PM

#

snowflake is the one cloud tool I'm really not familiar with

#

is it basically just like big query

#

but not big query?

desert oar May 7, 2024, 4:29 PM

#

we've already significantly reduced our snowflake usage, to the point where it's an issue for our contract renewal

past meteor May 7, 2024, 4:30 PM

#

typical separation of storage and compute, data in buckets, snowflake compute puts a view over it and you can query it with SQL and pay through your nose?

desert oar May 7, 2024, 4:30 PM

#

past meteor is it basically just like big query

"yes" in that it's a cloud data warehouse built around SQL and column-oriented analytics workloads

past meteor May 7, 2024, 4:30 PM

#

Meant SQL there*

desert oar May 7, 2024, 4:30 PM

#

past meteor typical separation of storage and compute, data in buckets, snowflake compute pu...

i think that's how it works internally, but the pricing is a lot more opaque than that. i think at most you can choose which of the big 3 clouds to run on, and that's it

desert oar May 7, 2024, 4:31 PM

#

past meteor Meant SQL there*

they also now have a spark-like python interface called "snowpark"

past meteor May 7, 2024, 4:31 PM

#

Where I'm from everyone drank the databricks koolaid

#

Everyone is on lakehouse

desert oar May 7, 2024, 4:32 PM

#

they're the other way around, they want to be a warehouse where you can do everything in-warehouse. some "lake" features too though

#

you can even now deploy arbitrary code in containers. so you can run arbitrary code directly in-warehouse and pay for it with a uniform compute credit (instead of slinging data back and forth between the warehouse and e.g. ECS)

past meteor May 7, 2024, 4:32 PM

#

I feel like snowflake is pretty lakehouse-ish as well right? Don't you land data into snowflake and not into S3/Azure blob first?

#

And then you use snowflake's compute to transform it right inside the "warehouse"/lake/... whatever it is

desert oar May 7, 2024, 4:32 PM

#

snowflake supports both

past meteor May 7, 2024, 4:33 PM

#

Ok, then my intuition of what it was was correct

desert oar May 7, 2024, 4:33 PM

#

it has stages, which are just blob stores like S3. but you can mount an S3 bucket transparently as a stage

#

so it's blob storage + OLAP distributed-ish + external integrations + a spark-like interface if you want it

past meteor May 7, 2024, 4:34 PM

#

basically like databricks' lakehouse yeah

desert oar May 7, 2024, 4:34 PM

#

is "databricks lakehouse" a product? or are you just talking about the pattern of building a lakehouse around databricks and dbfs?

#

i haven't used databricks since 2020

buoyant vine May 7, 2024, 4:35 PM

#

Problem I have with Snowflake is the vendor lock in feels worse than AWS tbh

past meteor May 7, 2024, 4:35 PM

#

not a product, it's just delta + databricks + marketing

desert oar May 7, 2024, 4:35 PM

#

buoyant vine Problem I have with Snowflake is the vendor lock in feels worse than AWS tbh

is it any worse than any other data warehouse though?

past meteor May 7, 2024, 4:36 PM

#

Is snowflake "serverless"?

desert oar May 7, 2024, 4:36 PM

#

yeah it's pure saas

#

not self-hostable and completely opaque compute (priced in "credits")

#

maybe it's because we're using airflow + dbt + containers but i really don't feel that badly locked-in. we will be much more locked-in once we are deploying compute directly in-warehouse, but at least we already have a non-locked-in solution that we won't ever fully get rid of.

buoyant vine May 7, 2024, 4:38 PM

#

desert oar is it any worse than any other data warehouse though?

I think it is about in line with BigQuery lock-in-wise, but without the other GCP service support and no bandwidth costs.

Athena I think is pretty easy to replace, since it is literally just re-skinned Trino, which tbh if we re-did our datalake now, I'd probably go with Trino as our main layer, so at least changing backend didn't change the queries

past meteor May 7, 2024, 4:38 PM

#

SaaS != serverless

#

Can you pay for a standard amount of compute that stays on during business hours with absolutely transparent pricing you then scale down during the night? Give or take backpressure

#

Or is the only model pay-as-you-use?

desert oar May 7, 2024, 4:39 PM

#

ah

#

https://docs.snowflake.com/en/user-guide/cost-understanding-compute#virtual-warehouse-credit-usage

buoyant vine May 7, 2024, 4:39 PM

#

IIRC it is a price per storage GB used, price per data scanned, etc...

#

Which I think all the big warehouses use really

past meteor May 7, 2024, 4:40 PM

#

that's pretty bad

#

it's really bad actually imho

buoyant vine May 7, 2024, 4:41 PM

#

Shout out to Sneller BTW for the absolutely insane engine which unfortunately seems to have died https://github.com/SnellerInc/sneller

GitHub

GitHub - SnellerInc/sneller: World's fastest log analysis: λ + SQL ...

World's fastest log analysis: λ + SQL + JSON + S3. Contribute to SnellerInc/sneller development by creating an account on GitHub.

desert oar May 7, 2024, 4:41 PM

#

our warehouse is basically always-on anyway so it's easy for us to estimate pricing

past meteor May 7, 2024, 4:41 PM

#

Maybe I'm paranoid but I wouldn't feel comfy buying into a service that doesn't allow me to pick a non serverless model

past meteor May 7, 2024, 4:42 PM

#

desert oar our warehouse is basically always-on anyway so it's easy for us to estimate pric...

That's the issue, I feel like pay-as-you-use is nearly always more expensive than always-on

#

So if an always-on version exists and your thing is ... always on, you can switch and save money
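
The argument can be put as back-of-the-envelope arithmetic. Both prices below are made-up placeholders, not real Snowflake, Azure or AWS rates; the sketch only shows how to find the usage level above which a flat always-on price wins.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12

def on_demand_cost(hours_used, price_per_hour):
    """Monthly bill when paying only for the hours actually used."""
    return hours_used * price_per_hour

def break_even_hours(flat_monthly_price, price_per_hour):
    """Hours per month above which the flat price is cheaper."""
    return flat_monthly_price / price_per_hour

# Hypothetical prices, for illustration only.
price_per_hour = 4.0
flat_monthly_price = 2000.0

always_on_bill = on_demand_cost(HOURS_PER_MONTH, price_per_hour)
threshold = break_even_hours(flat_monthly_price, price_per_hour)

print(always_on_bill, threshold)
```

With these placeholder numbers a workload that runs all month pays more on demand than on the flat plan, and anything below the break-even hours pays less.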

buoyant vine May 7, 2024, 4:42 PM

#

I think it is a trade off, I think a lot of people prefer having the serverless setup where it is cheaper initially, but then get bitten later on as their scale grows

#

But people like the convenience

desert oar May 7, 2024, 4:42 PM

#

past meteor That's the issue, I feel like pay-as-you-use is nearly always more expensive tha...

yeah but then you negotiate a contract for credits + overage pricing if you know you're going to be always-on

past meteor May 7, 2024, 4:43 PM

#

sure, I'd 100 % start serverless

desert oar May 7, 2024, 4:43 PM

#

it's all very enterprise-ey

past meteor May 7, 2024, 4:43 PM

#

but on say Azure, many services let you switch transparently

#

Like, you have a serverless SQL server (absolutely ridiculous naming) with a managed counterpart

desert oar May 7, 2024, 4:43 PM

#

the difference is that you can't blow your cost into interstellar orbit as easily as you can on google or AWS

past meteor May 7, 2024, 4:44 PM

#

There's serverless databricks versus de-facto managed, where you switch your cluster on and off "manually" etc.

desert oar May 7, 2024, 4:44 PM

#

larger mean, lower variance -- that kind of thing

past meteor May 7, 2024, 4:46 PM

#

desert oar the difference is that you can't blow your cost into interstellar orbit as easil...

I think the point I'm trying to make is that gcp, aws and azure definitely have versions with transparent pricing in the "managed" section of their offering

#

And they're still easier than using EC2/Azure VM

limber token May 7, 2024, 4:46 PM

#

past meteor Does your data need to be public too?

Yes, it's going to be part of a scientific article so I need to make the datasets used to train the models public, and they're really just cherrypicks of multiple datasets

desert oar May 7, 2024, 4:46 PM

#

yeah, i think i get it. my response is "snowflake doesn't have that but i also am not aware of anyone having issues with it beyond it just being expensive overall"

iron basalt May 7, 2024, 4:46 PM

#

plush bobcat So in conclusion: • C/C++ because they're the mother language and by learning (...

Yes, an important detail here is that C is more than a language at this point. It also acts as the interface between languages and the operating system. So for any language to be able to do anything it needs to pass through C's stuff at some point. This means that if you know both Python and C, you can get access to almost all libraries / utilities, and also make fast things that you can use in Python (bind some C library that does not have a Python module yet, or your own private stuff). C++ is not as necessary as C, but it does make programming things a lot easier, is the foundation for CUDA, and so it's used everywhere (even more access to more libraries (but even just knowing C will let you read most of it)). C is also not going anywhere any time soon, and changes too slowly for you to need to keep up with its features like with other languages.
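
As a small illustration of the "bind some C library" point, Python's standard `ctypes` module can call a C function directly. This sketch is POSIX-only by assumption: `ctypes.CDLL(None)` exposes symbols already loaded into the Python process, which on Linux and macOS includes the C standard library. On Windows a specific DLL would have to be loaded instead.

```python
import ctypes

# POSIX-only assumption: None means "the symbols of the current process",
# which includes libc.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the C function with a Python bytes object.
length = libc.strlen(b"hello")
print(length)
```

Declaring `argtypes` and `restype` is the part that needs C knowledge: ctypes cannot check them against the real function, so a wrong declaration can crash the interpreter instead of raising an exception.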

past meteor May 7, 2024, 4:47 PM

#

last addition to this is that I believe cloud marketing has won in convincing us pay-as-you-use is the only viable option so people don't complain/have issues with it 😄

limber token May 7, 2024, 4:47 PM

#

buoyant vine Depends what you are doing. At work we store data via `safetensors` https://hug...

Great, will check it out! I'm not going to be pulling frequently, just pushing

sturdy kiln May 7, 2024, 4:47 PM

#

tryna do Univariate time series on a JDIA dataset, how can i determine which variable ill be using univariate on? do i just wing it and use any one

desert oar May 7, 2024, 4:50 PM

#

past meteor last addition to this is that I believe cloud marketing has won in convincing us...

i think in the case of snowflake people use it in spite of its pricing

#

for example we also use aiven's managed timescaledb and that's just a flat price per month

desert oar May 7, 2024, 4:50 PM

#

sturdy kiln tryna do Univariate time series on a JDIA dataset, how can i determine which var...

what is JDIA?

#

normally when doing a data project you have an actual real-world objective in mind, so you do whatever accomplishes that goal

plush bobcat May 7, 2024, 4:53 PM

#

iron basalt Yes, an important detail here is that C is more than a language at this point. I...

True, so since almost everything needs to be passed by C and knowing it will surely deepen my knowledge, its a good idea to learn this but as you said, C is the main course but C++ is more like a DLC that is optional to learn

#

Thanks for the insight, mate

past meteor May 7, 2024, 5:00 PM

#

desert oar i think in the case of snowflake people use it _in spite_ of its pricing

Aside from the pricing, is it that good?

desert oar May 7, 2024, 5:00 PM

#

past meteor Aside from the pricing, is it that good?

It does everything we need it to do and they keep adding more features that we find useful. So it's certainly good enough

#

Sometimes I find myself fighting the query planner, wishing for proper database indexes. That's my only practical complaint as an individual user (as opposed to an administrator)

sturdy kiln May 7, 2024, 5:07 PM

#

desert oar what is JDIA?

dow jones ind average, its just stock prices, just like NASDAQ

#

honestly i dont have a real-world objective in mind, im just trying to demonstrate utilizing univariate with different models

desert oar May 7, 2024, 5:17 PM

#

sturdy kiln dow jones ind average, its just stock prices, just like NASDAQ

oh, DJIA

#

you're asking about whether to use open, close, or something else?

#

I think normally people use closing prices but it probably doesn't matter much for a simple univariate analysis

sturdy kiln May 7, 2024, 5:24 PM

#

oh lol sorry i misspelt the acronym, but yeah from what ive seen a lot of people use Close so i went with Close anyways

craggy agate May 7, 2024, 5:33 PM

#

I agree with @hollow escarp

hollow escarp May 7, 2024, 5:33 PM

#

past meteor I wouldn't immediately know what's up, it's hard to debug if I'm not running th...

How should i debug it to get those values?

hollow escarp May 7, 2024, 5:33 PM

#

craggy agate I agree with @hollow escarp

?

craggy agate May 7, 2024, 5:33 PM

#

Lmao

sturdy kiln May 7, 2024, 5:39 PM

#

whats the best type of LSTM model? apparently theres 5: Vanilla, Stacked, Bidirectional, CNN-LSTM, ConvLSTM

#

or is it all situational

craggy agate May 7, 2024, 5:41 PM

#

sturdy kiln whats the best type of LSTM model? apparently theres 5, Vanilla, Stacked, Bidire...

Situational

craggy agate May 7, 2024, 5:44 PM

#

sturdy kiln whats the best type of LSTM model? apparently theres 5, Vanilla, Stacked, Bidire...

That's like saying which one is the best, apple, banana or orange. when I want to keep the doctor away, it's apple, when I want something sour, it's orange and when I need a quick snack, it's banana.

sturdy kiln May 7, 2024, 5:46 PM

#

weird how vanilla LSTM performs way shittier than MLPs

This one in particular: https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#using-if-truth-statements-with-pandas
