#How to handle concurrency problems?
1 messages ยท Page 1 of 1 (latest)
Here is an AI assisted attempt to answer your question ๐ค. Maybe it helps! In any case, a human is on the way ๐. To continue talking to the AI, you can use </chatgpt:1108714622413963314>.
To handle concurrency problems in a Java application where different users need to access the database and retrieve unique rows, you can implement optimistic locking mechanisms. One common approach is using version numbers or timestamps to track changes to the database rows.
Here's how you can achieve this:
- Add a version number or timestamp column to your database table.
- When a user fetches a row, include the version number or timestamp in the query.
- When the user updates the row, increment the version number or update the timestamp.
- Before updating the row, check if the version number or timestamp has changed since it was fetched.
- If there is a mismatch, handle it as a concurrency issue by either rejecting the update or retrying the operation.
This approach ensures that each user gets a unique row by detecting conflicts and preventing multiple users from accessing and modifying the same row simultaneously.
By implementing optimistic locking strategies like version numbers or timestamps, you can effectively manage concurrency problems in your Java application and ensure data consistency for your users accessing the database.
Row can have the same timestamp as well. Please tag me, also locking mechanism I'm using redis lock but then I want to fetch different rows for each user.
I think I can use select for update and then update but then it might lock the rows in the database and if something goes wrong the db will be affected
And suppose the database has 1million uses and 32 users taking actions of the latest requests at the same time
Uhh it's postgres
good
Actually the rows represent user data
Where we need to like approve or reject whatever the users have entered
And then we have like approvers or rejecters who take actionables on the rows based on the latest rows
So that no two approvers or rejecters can take action on the same case at the same time i use it and throw an error saying it's already being done by someone else
What I want to achieve is I don't want two different approver or rejecters to get the same case at the same time
okay so you misunderstand what redis is for, but thats fine
what is your db schema for this?
and at what point is someone assigned?
wdym db schema? I don't understand
Just the normal stuff yea
what does that table look like
create_time data1 data1_status Data1_lut data2 data2_status data_2_lut
what in this wonderful green planet of ours do any of this mean data1 data1_status Data1_lut data2 data2_status data_2_lut
hmm well
Your title and body don't align.
One min can u come dms I'll explain what exactly it is
So you use some kind of locking mechanism, not sure how redis comes into play.
no dms, share it here
Ok
So I'm storing the queue and user id in redis as key and the approver as value
That's not what redis is for.
If the key already exists in redis and values don't match with the approver/rejector I don't allow anyone else to touch it
okay so redis, in that context, can be used for "distributed locks"
i.e. locks that work with multiple servers
Or you could just use Hibernate's @Version or something equivalent.
Yea it's for distributed locks
in this case you do not need a distributed lock
I'm using spring jdbc
You just want optimistic concurrency control.
I require a lock
you require a logical lock - in that you want only one user acting on a resource
I actually want 2 different people to get two different rows at the same time which have the same timestamp
you do not neccesarily require a database lock
But I don't want them to get the same rows
ok how do I do that
describe more of the exact use case
But my program is
yep
ok so it's like a kyc system of sorts
This works with jdbc?
we made one of those at work
no it won't
Well I want one for jdbc
wdym
that is correct
and a group of admins need to approve or deny each person
yes exactly
but each admin needs to have sole control of each review
now those group of admins are many
yep
yep
so instead of an implicit lock
and the problem is each admin gets assigned the same user at times
which I'm handling with a redis lock
i.e. your database's locking functionality
but Instead of giving them an error i want to show them a different row
or a redis lock (which is insane)
Well I'm scared of the database locking functionality so I don't touch those at all
make an explicit row in a table marking that an admin has "claimed" a review
But if they get selected ta the same time u get a deadlock
to prevent that im using this
right, but we've lowered the scope of the db's involvement
what?
if we only need to have a lock during a "claim" process
thats a much shorter scope of time
after a row has been "claimed" you can then enforce that claim with some backend logic like "cannot submit if unclaimed"
and you can manually manage who does and does not have access
I don't understand
imagine the deli counter
i understand your problem
okay so imagine the deli counter
yes?
correct
what we need to do is give them each their own number
you don't want two people coming up when one number is called
yes that's correct
but once you've given out a number you no longer need them to coordinate
you just need to check "is this a number I can serve right now"
uhm yea
okay
so instead of locking a row in the database, we give them a number
i.e. if you have
kyc_document
id | created_at | updated_at | data...
then all we need to do is atomically insert a
kyc_document_review
id | kyc_document_id | reviewer_id | assigned_at | completed_at
insert?
yeah, insert
no actually we are just updating it right now
no actually we should be inserting
and we don't want to insert anything
and we want to insert something
why insert though?
how long does it take to review one of these documents?
thats super slow
point is, the longer a lock is open the more chance for shenanigans
And we don't need the same case to be audited twice
and databases do not do well with locks that are open for a long time
yea
so instead of giving someone a row and only accepting updates from that one person
you explicitly assign them to a case
ok but the point is the system already works like that so how to fix it
and thats what inserting a row does
how?
you insert data saying "this reviewer is in charge of reviewing this case"
so long as you
yea I get ur point but I don't want a particular case to be stuck to a particular reviewer
you can have an unassigning process
but this whole conversation is about how you only want one person to do a thing
it's about how I want multiple people to do different cases
i'm saying that process will be much easier to manage if you do it explicitly as opposed to praying to distributed system gods
instead of multiple people getting the same case if there's only one user at that time
but I don't want it to be explicitly assigned to someone
I want first come first serve
thats when it will be assigned
like the deli counter
they come up, first come first serve, and get their ticket
i am saying that having a database row representing the ticket is all you need
yes, we need to figure out assigning a ticket
but thats a much easier task than "keep this exclusive for a minute"
if you want to have a process where if they ignore it it gets unassigned, now you can do that
I guess I'll stick to the redis locks then
you are using a distributed lock - famously one of the hardest things in computer science to get right - for a non-distributed system
I want when someone asks for a document to review that you insert a row signifying that they are the one to review that document
man i really need to get my fundamentals right ๐ญ im screwed I need to start everything from beginning, first job and making me forget everything I've learned
we use the presence of that row to avoid assigning the same document to a different person
and in this way lower the time that you need to maintain a database lock
But we are back to the original problem
What if two people do it at the same time
now we get into the FOR UPDATE SKIP LOCKED
I was gonna use update skip locked but the db is fricking huge
Ok waittt
U want me to do this on the new table where we insert our rows?
Sorry I'm unable to follow
SELECT job_id, queue, job_type, payload, attempts, enqueued_at, process_at FROM %s
WHERE
queue = ?
AND process_at <= ?
ORDER BY process_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
here is some SQL from a library that handles background jobs
it makes sure that only one worker handles a job at a time
Actually I was thinking of using skip locked but idk if it will screw the db
Because using that on 1Mil records = idk
What I was thinking is me removing the redis lock use update skip locked and then use commit after updating
But what if the dataset grows huve
And the locks is what I'm afraid of
locking one row at a time is fine
so long as you do it for a short period of time
do you currently have 1 million rows?
I guess what's a mess shall be let that way then I'll only use it for what I'm creating now
For one part yes
For the one in creating it'll become that over time
What happens if that person does it and then exits mid way
well thats a question you need to answer with your redis locks too
what do you want to happen?
Honestly I have no idea of what's actually happening with it, I'm reusing existing code and i doubt they delete it ๐
Well ideally it should expire within 24h and be deleted
I'll have to check
okay so what you can do is
kyc_document_review
id | kyc_document_id | reviewer_id | assigned_at | unassigned_at
have a background job that looks for document reviewers that were assigned 24 hours ago but did not finish
and explicitly mark them as unassigned
Well there's no need of using a redis lock right on this new table
I think select for update should do the job
there''s no need of a redis lock in your entire system
But yea what happens if something goes wrong in between?
What happens with that select for update?
in the query i showed, if all the rows are locked it returns no rows
no no
That i know
But what if suppose I'm person A
I get that case and something goes wrong
Idk say the server shuts down or something
Is that low locked forever?
Or do the locks expire?
If im person b and I wanna do something will I be able to see that case?
thats why we only take out the lock for a very brief period
and use them to insert an explicit record that you can build functionality on top of
because these questions are insanely hard to answer
generally no, the db isn't locked forever - but they are tied to a single connection
we do sometimes use db locks when doing prod fire fighting
no I mean for the row
what happens to it
in the scheme i am suggesting
person A gets the case
this means that a row was inserted in a table that relates people to cases
the server shuts down
the server starts up
that row is still there and thus person B will not get the case
with database locks
person A gets the case, server restarts, the locks are lost, person B can claim it
thats why we put a column on the assignment record for unassignment
we can even do something like this
Wait for unassigned?
How u do that
kyc_document_review
id | kyc_document_id | reviewer_id | assigned_at | unassigned_at
so if unassigned_at is null, you are still assigned
if unassigned_at is not null, it is no longer a valid assignment
and you can have any number of processes fill that in
and you know what else you can do here? make a conditional index
UNIQUE (kyc_document_id) WHERE unassigned_at IS NULL
I want anyone to access any random case ๐ฆ
I don't want to assign a case to someone
this means that the database will automatically enforce that only one person is able to update a case at a time
yes you do, you just want it to be a short term assignment
and you want to unassign it from someone if they take too long and open it up for someone else to grab
What if that person is not there
wait ur saying I assign it to someone
correct?
well only assign cases to them when they ask for a new case
yes
no but it's not solving my problem
2 different people ask for a case at the same time
yes
how to give them different cases
and lets walk through what will happen
person 1 will get case A first
then person 2 will try to get case A - lets ignore skip locked as an option for now
so when person 2 tries to get case A, that means that this will be executed:
Sorry to interrupt but wait ur talking about the no insert part right? Just update the status directly
INSERT INTO kyc_document_review(kyc_document_id, reviewer_id)
VALUES ('A', '2')
no, i don't consider that a valid option
because it leads to insane issues
Oh ok please go ahead
so now, this sql
it will hit this uniqueness constraint
UNIQUE (kyc_document_id) WHERE unassigned_at IS NULL
and fail
I missed a bit, but one thing I was wondering about in the initial solution. Why do you always take a lock, you could have a read only connection as well.
and you can - worst case - just try again
Ahhh ok, so u want me to insert with a constraint
So there's no locks involved smart
It'll lead to extra table and storage but ok
it might end up being better to have a tiny lock at the start to not have to deal with running in to that constraint
This will work
but yeah - thats the point of the explicit ticket system
you can manage the assignment yourself
instead of having the assignment live implicitly in something batshit like a redis lock
I didnt understand the select for update part like how exactly do I unassign the case for approver A
under what conditions will you unassign a case?
Suppose person A leaves the job
Or is on leave
How do I allow someone else to use it without a service restart
UPDATE kyc_document_review
WHERE reviewer_id = (SELECT id from reviewer WHERE email = '[email protected]') AND unassigned_at IS NULL
SET unassigned_at = NOW()
Oh ok
UPDATE kyc_document_review
WHERE ((NOW() - assigned_at) > '10'::minutes) AND unassigned_at IS NULL
SET unassigned_at = NOW()
(I forget the exact postgres time query syntax)
but yeah, you just set unassigned to now
Will the locks disappear?
like I don't understand how to remove the locked rows
nvm the query syntax
the rows will cease being locked when the connection that makes that query goes down. Hence why its useful for some stuff but not others
Even if we update the unassigned at isn't it still locked?
there is no lock
the row itself is a logical lock
what
you update your code to check "is this person assigned to the case"
and that is all
the database lock no longer exists after the row is inserted
or at all, depending
u mean updated?
oh wait
if we don't put skip locked it'll update the locked row and the lock disappears?
so if person A requests to review a document it kinda doesn't matter which one so long as it hasn't been reviewed yet?
I mean it can be random
It can be partially reviewed as well
But suppose u have to review of a user and there 2 admins
I want to only allow one admin to fetch the case
Or suppose there's 2 rows and one user comes at 20:00 and other is at 19:00 we are fetching the latest user first, I don't want both the admins to get the same user who came at 20:00 I want one user to get the 20:00 used and other the 19:00 user
INSERT INTO kyc_document_review(kyc_document_id, reviewer_id)
VALUES (
SELECT kyc_document.id, ? as reviewer_id
FROM kyc_document
WHERE NOT EXISTS (
SELECT id
FROM kyc_document_review
WHERE unassigned_at IS NULL AND completed_at IS NULL
)
)
RETURNING id;
might need some tweaking but
this requires no locks
you just assign atomically by putting the assignment conditions in the insert query
Won't it still return an error?
nope
how could it, its only inserting a row if one doesn't already exist
Even if it's at the same time?
Even for a database with a million rows?
yes
It's better than a redis lock?
way better
Ok thanks you very much sir
but you need to design your system better
Where do I learn system design?
Man i need to start my cs fundaments from scratch
Same with java coding
๐คทโโ๏ธ
there are books and stuff
There's no constraint on this table right
but i learned what i know the hard way - by making systems that were so fucked up they caused immesurable pain for everyone
there should be
So if someone else wants to approve the same user they can right
I mean there'll be a PK for sure
you don't need it, but its a nice touch to put one
especially if you want to be sure no two people are assigned at the same time
but like someone else can approve the same doc if needed right
if you want to make the system work like that, sure
oh
if you have a single column that says "status" and it is like "approved"/"unapproved"/"rejected"
yes
i implore you to more to a status log
and use a trigger to update the root status for efficiency
well, like this
here's a smidge of code
basically you have a log of what the status is
and you can even attach who asserted that status

like in my work we have this pattern a bunch of places
one is with payables
you want to send $100 to someone, but we have an approval process that people will want to go through
so person A makes the payable
person B approves it as being valid
person C says they have the money, send it
Btw the documents consist of multiple stages and 3 queues
so instead of a status on payable, we have a table record the status in a log
Approval rejection and approved
yep very similar to our system
do not just have a status column
you will want to know who did what and when
a status column we've found is sometimes needed - the query for "latest status" can get slow with a big enough table
but you want the full log
well im maintaining everything in a single table
why the fuck are you maintaining everything in a single table
atleast that's how it was created a mess
because our manager doesn't seem to care and it's my first job and everything that I learned is going down the drain at this job 
the notion that a million people are giving their personal information to this system is actually scary
because nobody knows anything
ik that's what even i thought at first
and what are you thinking at second
Switching my job
I'm not learning anything new, everything is a mess. I'm thinking of starting from scratch
@potent bramble
Your question has been closed due to inactivity.
If it was not resolved yet, feel free to just post a message below
to reopen it, or create a new thread.
Note that usually the reason for nobody calling back is that your
question may have been not well asked and hence no one felt confident
enough answering.
When you reopen the thread, try to use your time to improve the quality
of the question by elaborating, providing details, context, all relevant code
snippets, any errors you are getting, concrete examples and perhaps also some
screenshots. Share your attempt, explain the expected results and compare
them to the current results.
Also try to make the information easily accessible by sharing code
or assignment descriptions directly on Discord, not behind a link or
PDF-file; provide some guidance for long code snippets and ensure
the code is well formatted and has syntax highlighting. Kindly read through
https://stackoverflow.com/help/how-to-ask for more.
With enough info, someone knows the answer for sure ๐