#Pulling only most recent entry per unique ID

17 messages · Page 1 of 1 (latest)

void gazelle
#

We're tracking research data for different facilities in our DB (just SQLite at the moment while we develop the site on 5 different machines) the data enterers will enter information about an individual they're studying and periodically update the information while keeping the old info for making graphs and analyzing data later. At the moment, we do this by keeping a time stamp of when the data was entered, but we're struggling to figure out a query that will select only the individuals that correspond to the desired facility (foreign key), and that have the most recent timestamp as a dateTime field.

Right now we are getting all of the individuals for a facility from their table, sorting by the time, and then pulling all id's from the list generated, the first time we see them. But I know this will get slow and messy as entries increase. So, I was wondering if you guys had an idea for how to do this.

For the sake of this question you can assume:
Facility: pk, name, location
Individual: pk, ID, entryTime, FacilityFK

Also if you think this is a poor way of storing the data lmk that too, we have been trying to find ways to do it, and so far this is the only one we think will work, although they'll definitely need to have an IT team after we graduate because they will probably want to warehouse away a season's worth of data once it is over.

still pendant
#

SELECT m.*
FROM Individual as m
INNER JOIN (
-- Find Latest Entry Dt by Perosn ID
SELECT ID,max(entryTime) as latest_entry_dt
FROM Individual
GROUP BY ID
) as t
on m.ID = t.ID and m.entryTime = t.latest_entry_dt
where m.FacilityFK = DesiredFacilityFK#

void gazelle
#

I guess to clarify I was mostly looking for a way to do it through the django query system, so that I know it will work reguardless of the DB platform, since we haven't chosen one yet. I was not aware you could run raw queries on django either, but that is why I thought I would ask, becuase I know some big long SQL query could do it, but I was hoping for a Individual.objects.filter(foo=bar) that would give the same results

lofty dove
void gazelle
lofty dove
#
{
  facility: {
    pk: 1,
    latest_updated_individual: 2
  },
  facility:
    pk: 2,
    latest_updated_individual: 4
  }

something like that?

void gazelle
# lofty dove you mean latest individual per facility in a single queryset?

No, so we'll search per facility, and I want the latest update for each individual with a FK to a given facility. So, for example:

Butterfly1: @5:20 in_facility:A
Butterfly2: @5:22 in_facility:A
Butterfly3: @5:13 in_facility:A
Butterfly1: @5:25 in_facility:A
Butterfly2: @5:00 in_facility:A

And then the query on facility A would return {Butterfly1: @5:25, Butterfly2: @5:22, Butterfly3: @5:13}

(Sorry for the delayed responses, I am still going to classes right now lol)

#

Then also like if there was anything in facility B, it just wouldn't be there either

lofty dove
# void gazelle No, so we'll search per facility, and I want the latest update for each individu...

I think you are overcomplicating stuff, from what you are telling you just want to sort the facility.invidivduals by last updated timestamp

queryset = Facility.objects.prefetch_related(
  Prefetch(
    "individuals", # related_name of related model 'individual_set' by default
    queryset=Individual.objects.order_by("-entryTime")
  )
)

facility = queryset.filter(facility_name="abc")
print(facility.values("individuals")) # or individual_set depends on related_name

you will see pks this way, if you would like to see nested data take a look at JSONObject or just iterate

void gazelle
void gazelle
#

I just realized a really easy thing I could do instead which is to add an active bool which flips the old instance to be not active and the new one gets that as true. That will also work well when larvae pupate and need to be disabled because they're no longer a larvae.

#

Then I can get all in a facility that are active

lofty dove
#

I think you want some kind of record table

class Facility:
  ...

class Invidual:
  facility = FK(Facility)

class IndividualData:
  individual = FK(Product)
  ... fields of individual

You will need to customize the forms and stuff

You will create Individual instance once, then each time user wants to edit an Individual instance, you will want to create IndividualData it will be like history, so you never actually update but create a new record

#

if you are using Postgres, you could use ArrayField and push pre-update record to that array field to keep a history

void gazelle
#

Ah ok, this does make sense actually, I will look into this, and maybe into using postgres as our final DB, since we totally can (although the professor strongly encourages MariaDB)

#

Thanks for the help man, this will help a lot, since I am not a big DB guy, but I know the most for my team so I had to take on that role 😭