#[tech]: help building a .txt parser
.
doable 👍
python is best for this usecase
Perfect! Pls drop some ideas if you got some time :))
No worries, ty!!
relevant keywords such as 'failure' etc
how many such keywords?
i think this could be as simple as a 100-200 line python script
2-3?
At Max 10 tbh to have a cleaner result
on a server right
On shared network
cron job -> trigger the script to process the new log -> this gets triggered when there is a new log file
The error log is inside a folder and that folder has a lot of other test related files too
if you're going the script way, the err logs have to be in the same folder as script
Ahh
if they share a common file naming, then it'd be easy
Then I need to manually pick them from the folders, no?
They do
so
a shell script will always be up and running to check if a new err log file (with a name matching a predefined regex pattern) appears in the shared network folder
It doesn't even have to be real time. But if it is, that is added benefit
or you can trigger this shell script on defined times like once a day with a cron job
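A rough sketch of the "check for new err logs" idea in Python, using only the standard library so it also runs on Windows. The `err_*.txt` naming pattern and the function name `find_new_logs` are assumptions for illustration; the thread never states the real naming convention.

```python
import re
from pathlib import Path

# Hypothetical naming convention; replace with the real err log pattern.
LOG_NAME = re.compile(r"^err_.*\.txt$")


def find_new_logs(folder, seen):
    """Return matching log files in `folder` that are not yet in `seen`.

    `seen` is a set of file names that is updated in place, so a second
    call only reports files that appeared since the first call.
    """
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and LOG_NAME.match(path.name) and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

For the always-running variant, call this in a loop with a `time.sleep(...)` between passes. For the once-a-day variant, run the script from a scheduler (Windows Task Scheduler plays the role of cron there) and persist `seen` to a file between runs, since an in-memory set is lost when the script exits.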
i get it
so
- the error log will be stored on either a remote server/shared network folder
- a parsing microservice that will be checking for new logs -> all the unique errors will go to a database
- these will be fetched by an error monitoring dashboard like sentry etc
the parsing microservice will be fetching logs from the shared network periodically, a cron job will be very helpful
the logs will be stored with severity levels in the DB along with timestamps
the monitoring service (either a third-party software or an in-house solution) will be querying the DB to fetch data
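A minimal sketch of the database side of the design above, assuming SQLite as a stand-in for whatever DB is actually chosen. The table layout, the names `open_db` and `store_errors`, and the choice to record the insertion time (rather than a timestamp parsed from the log) are all assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) the error store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS errors (
               message    TEXT PRIMARY KEY,  -- one row per unique error
               severity   TEXT NOT NULL,
               first_seen TEXT NOT NULL      -- UTC time of first insert
           )"""
    )
    return conn


def store_errors(conn, errors):
    """Insert (message, severity) pairs; messages already stored are skipped."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO errors (message, severity, first_seen) "
            "VALUES (?, ?, ?)",
            [(message, severity, now) for message, severity in errors],
        )
```

A dashboard or monitoring service could then read the data with a plain query such as `SELECT message, severity, first_seen FROM errors`.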
I'm sorry what's a cron job?😢
are you familiar with linux ?
Noo, I've been working on windows only
oh okay
I can't do this on Linux either 😔 has to be done on my work laptop
cron is basically a job scheduler in linux
Okayy
really
Yes i can't use my personal pc for demo
😭😭😭
this is really smol company behavior
Okay i should have mentioned that before
Unfortunately they ain't small. This team is mismanaged as hell
yeah sure
you should never tbh
I dont use windows, dont know how to do this
No problem. Thanks for the ideas. I'll see if i find sth on git. Boson shared a link
you need a demo or want a full up service running in production?
Demo and then full up service
windows in work laptop right?
do you have admin perms in it?
you can use wsl (windows subsystem for linux) to get a linux like command line
Yes
Yes i believe
Ahhh alr
this is fairly easy if you believe
😭
I've run stuff in powershell in admin mode
Ah okay, I'll take a look once I get home
No worries 😂I'll figure that out
ill help nw
since the err logs do share a common file naming convention, a regular expression will help in identifying the desired files in folder containing other file types
python has built-in functions to open files for parsing (combined with the regexp)
now regexp can also be used to flag the keywords "failure" etc since the file is just plain text
sets are helpful for storing unique items
this will be the payload to put in the DB
and yeah, that's it
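The steps above can be sketched roughly like this. The `err_*.txt` file pattern is a made-up placeholder, and only 'failure' comes from this thread; the other keywords are assumptions to show how the list would be extended.

```python
import re
from pathlib import Path

# Hypothetical file naming convention for the err logs.
LOG_NAME = re.compile(r"^err_.*\.txt$")
# 'failure' is from the thread; 'error' and 'timeout' are assumed examples.
KEYWORDS = re.compile(r"\b(failure|error|timeout)\b", re.IGNORECASE)


def unique_flagged_lines(folder):
    """Collect unique lines containing a keyword from matching log files."""
    flagged = set()  # a set drops repeated lines automatically
    for path in Path(folder).iterdir():
        if not (path.is_file() and LOG_NAME.match(path.name)):
            continue  # skip the other test-related files in the folder
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                line = line.strip()
                if KEYWORDS.search(line):
                    flagged.add(line)
    return flagged
```

The returned set is the payload that would go to the DB. Note that lines differing only in a timestamp count as distinct here; stripping the timestamp before adding to the set would be one way to tighten "unique".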
Can I pick a parser from git? Will that be an issue?
Yes, GitHub
Yes, that's not a problem, I already asked my lead
He said just implement it
yeah, then go ahead and do it
Ohh yesss
but you will have to make changes to it, though
as you have to put flagged logs in DB as well
coding part is easy lol, the system design is debatable
😂
Lemme get the code first, then I'll see about the next steps
Bt ty so much for your help ya

please discuss the above system design with any other coworker or lead smth
youll be appreciated

throw the mumble jumble as well
🥹🥹
realtime data pipeline 🔥
Amen. Tyyy so much honn
If it flies i might participate with this idea in the hackathon
containerisation and shit 🔥
Omg ur more corporate than me xD
throw up some kubernetes and ansible 🔥
Don't include me in nerd shit, I logged off for today
@young pond wow you nerd squad too
Ayo bullying me?
Hi hi yes needed some help 
I think a real time data pipeline to the parsing microservice will be great, as the logging seems like high throughput
no lol
I'm still in a meeting brain is going zzz
Yeah it's crazy. The logging
Laugh at this loser lmao (I have a meeting at 11)
Bro these US guys
damn rainforest rizz
It's an AHM

I can't join a meeting and not pay attention 
it's not for employees lol
Sigh
I need to know if I wanna stay here next year or not 😼
ooof
And market stats tell us what the bonus situation will be
please internship referral
Are you new hire or a senior engg?
Senior engg
Ah then you'd need to lol
Ohhh
Hbu
Okay company designation might be senior engg but you mid level I assume
Not exactly a new hire but not a lead yet
Absolutely yes
I meant making design docs, lead level whatever
Architect ish
Next year tho 😈
Oh we do that . But that's cus I'm in sys arch role
They have new hires do it too, but they don't own it completely
I'd appreciate a sde internship referral
For SDE roles it's not too great here tbh
If you're into embedded, it's the best
still a big name in resume
I'm already getting discriminated due to degree
That's true
will prove as an unfair advantage
How's the placement situation?
Oh understand. With BCA it's tough too cos of the bias. I'd say keep doing dapper projects
Efforts never go unrecognised 💯💯
I'll put in a referral if we do an off campus internship hiring next year
Which city if you don't mind?
I'll be working on a real-time collaborative code editor with sandboxed environments
Keep in touch with ppl in blr
Startup internships you can Target and convert in full-time
ya ya trying
Get exp for 2 years, by then the uni tag doesn't carry any value
PERFECT
as big corps generally don't hire non btech folks
Yea I was gona say. I think btech is the min qual for tech roles :((

sad
Oh, you work on embedded stuff, that's awesome
I was looking for embedded roles during my switch, gave into money tho lol
At least I am working on C++, not embedded stuff tho but still. Loving it lol
Yaya
Haha, that's cool. Tech pay is deffo better.. this is decent too tho
I'm sure there are cronjob libraries for python. You could also just use a watchdog instead.
I also don't get why they're storing the log first and then parsing it. Corpo stuff ig :>
@young pond if this is not a pet project and something that's gonna be used in production time and time again, you can use ELK stack
Elasticsearch will search and index your logs
Logstash will push the logs to Elasticsearch
and Kibana is used for visualization and queries
this
dunno, do you work in corporate btw?
no
oh okay
same
gaara's approach reflects a seasoned engineer who has worked with production environments
Yeah I have worked to set up these things before
hey if you dont mind, what is your yoe?
5 and half
damn, senior 🫡
Everything is duct taped in the end
Very few products are mature
do you have experience with microservices?
Yeah
I have worked as a Data Scientist, Software Dev and SRE
😭
quite wide expertise
hmm
yeah i can sense it
hey doog are you a student?
yay
sent one to you too 🥂
you're also my friend now
It might be used in production but for now we're just looking for POC i guess. How does this work gaara 😭
Look it up. Elasticsearch is basically a NoSQL database, and you can create a cluster so that it stays up even when one or two nodes go down. It is especially useful for text-based lookups. Logstash is like a client: it can run on multiple hosts, parsing logs and pushing them to the ES cluster according to what you have configured. Kibana can be used to make graphs, like how many errors occurred in the past 3 hours
But you know the problem best so it might not suit your situation
You have to decide the tradeoffs
Oww ok ok. I'll read up on this. Ty gaara 🥹
You don't need to read up, I can help you
To decide whether it's the right fit
I'll go through this thread later to understand the scope of problem
wow, you're hot
Oml i can help summarise, you'll go through everything? 😭 why are u so nice
Error monitoring is usually done in the big companies using the ELK stack but there’s a tool called fail2ban in linux, I run a private server and I use fail2ban to monitor “errors” in the log files by defining what is an error and what action to take. If your use case is small and restricted to one system then you could use fail2ban else you’ll probably need a cluster and ELK stack
You hottest 🥹
Sure I needed an excuse to talk to you bbg
You shall never need an excuse to talk to me bbg 
Thanks Ani. Sadly I can't use linux :((
That's reassuring bbg thanks for the love and support 
yum
Awwwh stahpp you 
Ayyy, you probably weren't in the server, plus serene wanted a thread to discuss so 
Feel free to put in ur inputs 
could you send an excerpt from the logs. they should be structured ig
I sadly can't. It's work data
oops
why you giving offis work here
I needed help 
Ah okay.
any updates on this one?
Was de prioritised. Starting work again on it today. Need to write a rough flow/approach 
New challenge is that the logs could be on various networks
You can deploy agents on each network to monitor new logs and push them to a common location
configure something to process the incoming logs and extract the relevant info and push it to elasticsearch
and kibana for visualisation
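A very small sketch of what one such agent could do, assuming the "common location" is simply a folder every network can write to. The function name `push_new_logs`, the `err_*.txt` pattern, and the `agent__filename` naming scheme are all made up for illustration; a real setup would more likely use Logstash or a similar shipper.

```python
import re
import shutil
from pathlib import Path

# Hypothetical file naming convention for the err logs.
LOG_NAME = re.compile(r"^err_.*\.txt$")


def push_new_logs(source_dir, central_dir, agent_name):
    """Copy matching logs from `source_dir` to `central_dir`.

    Files are prefixed with `agent_name` so logs from different networks
    do not overwrite each other. Files already present are skipped.
    Returns the names of the files copied in this pass.
    """
    central = Path(central_dir)
    central.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if not (path.is_file() and LOG_NAME.match(path.name)):
            continue
        target = central / f"{agent_name}__{path.name}"
        if not target.exists():
            shutil.copy2(path, target)  # copy2 also keeps file timestamps
            copied.append(target.name)
    return copied
```

Each network would run its own copy with a different `agent_name`, and the processing step then only has to watch the one central folder.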
gaara's approach
you might as well use encryption for log data while pushing to the common service
this setup seems highly scalable and fault tolerant
the microservice approach makes the setup easy and highly customisable to specific business needs, but it'll struggle with scaling
while the ELK stack is highly scalable and fault tolerant
regex
