#Complex Data Storage Solution


cedar pivot
#

I'm currently in the process of designing a complex YAML data storage system for my hugs plugin and would appreciate some thoughts and feedback on this system.

Some Prerequisite Information
- I'm aware that other database solutions would work far better, and I plan on adding support for them at a later time. That being said, I want to create a system that works out of the box with YAML and can be converted back and forth to whatever storage solution the end user wants.
- I'm willing to write this from scratch.

Current Setup
Currently, I have a per-player data system. Each player has their own file with the format <UUID>.yml, and this is how the file is structured.

uuid: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
name: Nothixal
settings:
  language: "en"

  huggable: true
  # There are two types of particles.
  # HEART & DAMAGE_INDICATOR

  # There will be several different animations.
  # The animations will only show up when extreme quality is enabled.
  particles: 
    type: "HEART"
    quality: "high"
    animation: "default"

  sounds: 
    menu: true
    commands: true

  indicators:
    chat: true
    actionbar: true
    bossbar: true
    titles: true
    toast: true

data:
  self_hugs: 2147483647

  normal_hugs:
    given: 2147483647
    received: 2147483647

  mass_hugs:
    given: 2147483647
    received: 2147483647

  last_hug:
    given:
      to: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

    received:
      from: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

  first_hug:
    given:
      to: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

    received:
      from: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

These files are only loaded into memory when the player is online.
By itself, the file is 1.1kB in size. (1.0kB without spacing)

#

Currently, all is fine. The system works as intended. Now I want to do more. I want to be able to generate all sorts of data.

  • Unique Hugs
  • Leaderboard data
  • First Hug Given & Received
  • Last Hug Given & Received

This is where my problems begin.
In order to generate leaderboard data, I need every file to be loaded into memory temporarily to get the data. Otherwise I would be working with incomplete data. The same thing needs to happen in order to calculate unique hugs per player.
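
A minimal sketch of the leaderboard step, assuming the totals from every player file have already been read into a map. The `Leaderboard` class and its `top` method are made-up names for illustration, not part of the plugin.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

final class Leaderboard {
    /** Returns the UUIDs of the top `limit` players by hugs given, highest first. */
    static List<UUID> top(Map<UUID, Integer> hugsGivenByPlayer, int limit) {
        return hugsGivenByPlayer.entrySet().stream()
                // Sort entries by their value in descending order.
                .sorted(Map.Entry.<UUID, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The map itself is the expensive part: with flat files it can only be built by reading every player file once.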

My initial thought was to create a section in the player file to house a history of hugs.
Something like this at the bottom.

history:
  # Sender, Receiver, Hug Type, Timestamp
  - "<UUID>,<UUID>,<TYPE>,<DATE>"
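
A rough sketch of how one of these history entries could be parsed and used to count unique hugs. It assumes `<DATE>` is stored as epoch seconds, like the timestamps above; `HugEntry` and `HugHistory` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** One parsed history line: sender, receiver, hug type, timestamp. */
record HugEntry(UUID sender, UUID receiver, String type, long timestamp) {

    /** Parses "<UUID>,<UUID>,<TYPE>,<DATE>", treating <DATE> as epoch seconds (an assumption). */
    static HugEntry parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + parts.length);
        }
        return new HugEntry(
                UUID.fromString(parts[0].trim()),
                UUID.fromString(parts[1].trim()),
                parts[2].trim(),
                Long.parseLong(parts[3].trim()));
    }
}

final class HugHistory {
    /** Counts how many distinct players the given sender has hugged. */
    static int uniqueHugsGiven(UUID sender, List<String> historyLines) {
        Set<UUID> receivers = new HashSet<>();
        for (String line : historyLines) {
            HugEntry entry = HugEntry.parse(line);
            if (entry.sender().equals(sender)) {
                receivers.add(entry.receiver());
            }
        }
        return receivers.size();
    }
}
```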

The problem with this is that when players start giving massive amounts of hugs, this file will grow in size pretty quickly. If a player were to give 1,000 hugs, the file size increases to 121kB. Multiply that by ~150 users and you now have ~18MB of user data in total.

I know that file size is a balancing act with using file storage, but that amount of data seems ridiculous. Maybe there's a different way I can store the data?

Perhaps a single file where everything is written to every 5 minutes or so. (For redundancy) This way, all the data is consolidated into one area making it easier to access and every so often, the data from that file will be distributed to each individual player file. Not as a whole, but the necessary information like unique hugs.
E.G. The main file acts like a ledger with every hug ever given. It's written to disk every 5 minutes for redundancy and another task that runs every 5 minutes will analyze the data and write a shorter version of a hug "transaction" to the individual player files. To prevent massive file buildup, the file would recreate itself every day and do the same task as the old one. This could create a "hug transaction" history.
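
To make the ledger idea concrete, here is a small sketch under a few assumptions: hugs are buffered in memory, a repeating task calls `flush()`, and a second task reads the ledger back to derive per-player data. `HugLedger` and its methods are hypothetical names, and daily file rotation is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical append-only ledger: buffers hugs in memory and flushes them to one file. */
final class HugLedger {
    private final Path file;
    private final List<String> pending = new ArrayList<>();

    HugLedger(Path file) {
        this.file = file;
    }

    /** Buffers one hug as "sender,receiver,type,timestamp". */
    synchronized void record(UUID sender, UUID receiver, String type, long epochSeconds) {
        pending.add(sender + "," + receiver + "," + type + "," + epochSeconds);
    }

    /** Appends buffered lines to the ledger file; meant to be called by a repeating task. */
    synchronized void flush() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        Files.write(file, pending, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        pending.clear();
    }

    /** Reads the ledger back and returns, per sender, the set of distinct receivers. */
    static Map<UUID, Set<UUID>> uniqueReceiversBySender(Path file) throws IOException {
        Map<UUID, Set<UUID>> result = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split(",");
            UUID sender = UUID.fromString(parts[0]);
            UUID receiver = UUID.fromString(parts[1]);
            result.computeIfAbsent(sender, key -> new HashSet<>()).add(receiver);
        }
        return result;
    }
}
```

Only the size of each set would need to be written to the individual player files, which keeps those files small while the ledger holds the full history.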

#

I'm thinking out loud at this point, but I'd appreciate some feedback or alternative ways to access and store this data.

weary radish
#

There isn’t a great way to do this with YAML. YAML just wasn’t designed for this kind of complex system. There’s a reason databases exist and not everything just uses flat files.

cedar pivot
#

Yeah, the only way to keep the file sizes down would be to use smaller data types, but that would break the whole user-friendly thing I have going on.

One idea I had: once a certain number of flat files has been created, maybe about 500, show a message to OPs encouraging them to switch to a proper database.
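
Something like this could back that check. It is only a sketch: `FlatFileCheck` is a made-up name, the threshold of 500 comes from the idea above, and actually messaging OPs is left to the plugin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class FlatFileCheck {
    /** Number of player files at which a switch to a database is suggested. */
    static final int WARN_THRESHOLD = 500;

    /** Counts the .yml files directly inside the player data folder. */
    static long countPlayerFiles(Path dataFolder) throws IOException {
        // Files.list returns a stream that must be closed, hence try-with-resources.
        try (Stream<Path> files = Files.list(dataFolder)) {
            return files
                    .filter(path -> path.getFileName().toString().endsWith(".yml"))
                    .count();
        }
    }

    /** True once the folder holds at least WARN_THRESHOLD player files. */
    static boolean shouldSuggestDatabase(Path dataFolder) throws IOException {
        return countPlayerFiles(dataFolder) >= WARN_THRESHOLD;
    }
}
```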

I did get one suggestion a while ago to sort of split the plugin up. The complex portion would only work when databases like MariaDB or Mongo are being used, but that seems like a pain to maintain in the long run, as I would have two "versions" of the plugin to manage.

I get why databases exist, it's just that I want to keep the experience the same regardless of storage type. It's not only easier for me, but also for end users.

hardy hatch
#

Why would you want to keep data storage user-friendly? Do you want the users to edit the data?

weary radish
#

^

#

There’s no reason for simple data storage to be user friendly

weary radish
#

That’s one of the big issues

#

That’s why it’s so “user friendly” to edit in a text editor

cedar pivot
#

I'm currently thinking of switching the default to either H2 or SQLite. Maybe that could be used with your idea?

weary radish
#

There’s programs that let humans interact with data storage systems

#

For example, mysql has MySQL Workbench

#

MSSQL has a similar tool

#

Mongo has a command line client and graphical programs

#

MySQL has a command-line client

#

Redis has command-line and graphical clients

#

There’s lots of options

#

They just aren’t storing it in a human-readable way; an extra step is needed to let humans read it

wanton atlas
#

IMO YAML is great for things that a server admin may need to change, and that's all

#

For any other type of data, JSON is a much cleaner and easier approach

#

Or, if you need to keep the file size small, convert the object to a byte array and store it in a text file
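
As a rough illustration of the byte array idea, a single hug record can be packed into a fixed 41 bytes, while the same record as text needs more than 72 bytes for the two 36-character UUIDs alone. This is a hand-rolled layout, not Java's built-in object serialization, and `CompactHug` and the single-byte type code are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class CompactHug {
    /** Two UUIDs (16 bytes each), one type byte, one 8-byte timestamp. */
    static final int ENCODED_SIZE = 16 + 16 + 1 + 8;

    static byte[] encode(UUID sender, UUID receiver, byte type, long epochSeconds) {
        return ByteBuffer.allocate(ENCODED_SIZE)
                .putLong(sender.getMostSignificantBits())
                .putLong(sender.getLeastSignificantBits())
                .putLong(receiver.getMostSignificantBits())
                .putLong(receiver.getLeastSignificantBits())
                .put(type)
                .putLong(epochSeconds)
                .array();
    }

    /** Reads the sender back from bytes 0..15. */
    static UUID decodeSender(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    /** Reads the timestamp back from bytes 33..40. */
    static long decodeTimestamp(byte[] data) {
        return ByteBuffer.wrap(data).getLong(33);
    }
}
```

The trade-off is the one already mentioned in this thread: the result is no longer readable or editable in a text editor.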

cedar pivot
#

I plan on adding more database options. The goal is to have all of these as an option.

Remote Options
MySQL, MariaDB, MongoDB

Local Database Options
SQLite

Text File Options
YAML, JSON, TOML

It's just that YAML seems to be a bad choice for large-scale servers, even though it's so insanely easy for end users to edit if they need to.

#

Same can be said for TOML

wanton atlas
#

Yeah, will you use an ORM library for the database?

cedar pivot
#

It's just that JSON is more complicated when it comes to its syntax. Gotta be careful with it.

#

ORM Libraries?

wanton atlas
#

An ORM is a kind of library that makes dealing with a database easier

#

For example, instead of writing a query by hand, there are ready-made Java methods like Select(User.class).Where(user.id > 10).ToList()

cedar pivot
#

Ohhh, just looked it up. This looks way simpler and I might consider it, but the original idea was to just write things from scratch.

wanton atlas
#

(Fake code)

cedar pivot
#

Like, manual sql queries, json reads, etc.

wanton atlas
#

Hmm, this is a pretty hard topic

#

There will be a lot of code involving reflection

cedar pivot
#

Well, I know MySQL, so writing the queries isn't an issue there. I also know how to navigate Mongo. I haven't worked much with JSON or TOML, so those might be the harder ones for me to implement.

#

It's just the flat files that are the issue here.

wanton atlas
#

But you should not write any queries in your code

cedar pivot
#

At least in file size.

wanton atlas
#

That's the point of working with many databases

wanton atlas
#

Because every time, you need to be careful about SQL injection

cedar pivot
#

Yeah, but that would be pretty difficult to do with my setup. The queries are defined within methods. The only parameters that could change would be UUIDs for lookups, and validating those is easy.

wanton atlas
#

Maybe instead of using SQL, try MongoDB

cedar pivot
#

Everything is commented out so I can test the plugin, but it'll be fixed eventually.

wanton atlas
#

It works like a big HashMap

cedar pivot
#

I plan on adding Mongo support as well.

#

It's just that end users will have options. So if someone needs MySQL for their server, the plugin can provide. Same with Mongo. If they can't use remote databases, they have fallback options such as SQLite or flat files.

wanton atlas
#

What design patterns do you want to use?

#

Because there will be a lot of abstraction needed to handle that many data stores

cedar pivot
#

Oh, I have that figured out already. I have an interface called PlayerDataManager. Each database solution implements it and I can add certain variations of it if needed.

If you look at how I currently handle YAML files, you'll notice that I have a YAMLPlayerData class inside of it which has methods for the individual files.
https://gitlab.com/Nothixal/hugs/-/blob/master/core/src/main/java/me/nothixal/hugs/managers/data/types/yaml/YAMLPlayerDataManager.java

I'll likely have a similar setup for the other database types.
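
For anyone following along, the general shape is roughly this. It is a simplified sketch: the two methods and the in-memory implementation are made up for illustration, and the real `PlayerDataManager` in the repository may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical shape of the storage interface; the real one may declare other methods. */
interface PlayerDataManager {
    int getHugsGiven(UUID player);

    void setHugsGiven(UUID player, int amount);
}

/** In-memory stand-in for a YAML, SQLite or MySQL backed implementation. */
final class InMemoryPlayerDataManager implements PlayerDataManager {
    private final Map<UUID, Integer> hugsGiven = new HashMap<>();

    @Override
    public int getHugsGiven(UUID player) {
        return hugsGiven.getOrDefault(player, 0);
    }

    @Override
    public void setHugsGiven(UUID player, int amount) {
        hugsGiven.put(player, amount);
    }
}

/** Plugin logic only talks to the interface, so the storage type can be swapped freely. */
final class HugService {
    private final PlayerDataManager data;

    HugService(PlayerDataManager data) {
        this.data = data;
    }

    void recordHug(UUID sender) {
        data.setHugsGiven(sender, data.getHugsGiven(sender) + 1);
    }
}
```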

#

In fact, I have that same thing in the MySQL class I linked earlier.

wanton atlas
#

You think one interface would be good enough?

cedar pivot
#

I believe so. I have been trying to figure out alternatives, but one massive one just works. 😛

#

Because the interface only contains methods relating to setting player data.
Things like connections I handle in another class and just access that class within the class that uses the interface.

wanton atlas
#

In commercial projects, each method that uses an SQL query goes through a few "filters"

#

For example exception filter

#

Permission filter

#

Validation filter

#

And when one of those filters returns false, the method is cancelled

#

And the Repository pattern is widely used

cedar pivot
#

Oh, well I bet I can make that work with my interface. As of right now, the queries are in another class. So I can do validation checks in that class and if the result returns true, pass it to the interface. Otherwise return an error.

wanton atlas
#

Ok, the thing I said about filters is called

#

Chain of Responsibility design pattern

#

It might help. I use it on plugin enable for initializing stuff
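
A simplified take on those filters, as a sketch only: instead of linked handler objects, this variant walks a list of predicates and stops at the first one that rejects. `DataRequest`, `FilterChain` and the field names are hypothetical.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical request that is checked before a query is allowed to run. */
record DataRequest(String requester, String rawUuid, boolean isOperator) {}

final class FilterChain {
    private final List<Predicate<DataRequest>> filters;

    FilterChain(List<Predicate<DataRequest>> filters) {
        this.filters = filters;
    }

    /** Returns true only if every filter accepts; stops at the first rejection. */
    boolean allows(DataRequest request) {
        for (Predicate<DataRequest> filter : filters) {
            if (!filter.test(request)) {
                return false;
            }
        }
        return true;
    }

    /** Validation filter: the raw string must parse as a UUID. */
    static boolean isValidUuid(DataRequest request) {
        try {
            // Note: UUID.fromString tolerates some shortened forms, so a stricter check may be wanted.
            UUID.fromString(request.rawUuid());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A permission filter could then be as small as `DataRequest::isOperator`, placed in the list ahead of `FilterChain::isValidUuid`.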

cedar pivot
#

Haven't heard of those patterns before, but I'll look into them.

wanton atlas
#

Have you been working with Spring?

cedar pivot
#

Not before, no. I usually just program as a hobby. What with work being 10 hour shifts and all. I'm only now just getting back into updating this plugin.

wanton atlas
#

Oh, but you wanna be a programmer?

#

Or rather work in a less boring job?

cedar pivot
#

My current job isn't bad, but I'd rather program things. Or at least have the knowledge to.

#

I love making things, but man are they complicated sometimes.

wanton atlas
#

Well, you want to make one of the most complicated things when it comes to programming :p

cedar pivot
#

Well I like to make quality projects that will last. Especially if other people are going to end up using them.

wanton atlas
#

Multitasking and databases are the 2 most complicated topics

cedar pivot
#

It just so happens that they need to be really complicated. xD

wanton atlas
#

Not to implement, but to make them work

cedar pivot
#

Well do you have any links to some good guides?

wanton atlas
#

Um, do you know reflection?

cedar pivot
#

A little. I lean more towards abstraction based projects rather than reflection, but I do use it from time to time.

wanton atlas
#

And will you need to get whole objects from the database

#

Or only some fields?

cedar pivot
#

Possibly both. Most of the time it'll just be some fields. The only time I would need the whole object is when I'm doing things for leaderboards or finding unique data amongst the entirety of the database.

wanton atlas
#

Ok

#

And do you know the Builder design pattern?

cedar pivot
#

Yes

wanton atlas
#

So when it comes to queries, I would suggest you use a builder

#

for building the query

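#

```java
queryBuilder // hypothetical receiver; the start of this snippet is missing from the original message
    .select()
    .where("player.id = $1", 12)
    .andWhere("player.name = $1", "mike")
    .toList();
```
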
#

this is how it might looks

#

So this gives you a lot of flexibility, and you will avoid building queries as strings
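
One way such a builder might look, as a minimal sketch. `SelectBuilder` and its method names are made up; it assumes table and column names are trusted constants from the code, and it only collects the values so they can be bound to `?` placeholders later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal builder that produces parameterized SQL plus the values to bind. */
final class SelectBuilder {
    private final String table;
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    private SelectBuilder(String table) {
        this.table = table;
    }

    static SelectBuilder from(String table) {
        return new SelectBuilder(table);
    }

    /** Adds "column = ?" and remembers the value to bind later. */
    SelectBuilder whereEquals(String column, Object value) {
        conditions.add(column + " = ?");
        parameters.add(value);
        return this;
    }

    /** Builds the SQL text; values never end up inside the string. */
    String toSql() {
        String sql = "SELECT * FROM " + table;
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return sql;
    }

    /** Values in the same order as the placeholders in toSql(). */
    List<Object> parameters() {
        return List.copyOf(parameters);
    }
}
```

With this sketch, `SelectBuilder.from("players").whereEquals("id", 12).whereEquals("name", "mike")` yields the SQL `SELECT * FROM players WHERE id = ? AND name = ?` with the parameters `12` and `"mike"`.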

cedar pivot
#

Interesting. Wouldn't I need the ORM for this though? Because if I'm not writing the queries, then it would be the job of the ORM, correct?

wanton atlas
#

I mean, the code I sent is an ORM

#

ORM is a fancy name for executing a query without writing it by hand

#

So you'd probably write a little ORM for your purposes

cedar pivot
#

Ah, I see.

wanton atlas
#

And this would be an example of what this ORM would do under the hood

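#

```java
// Body of a method declared with `throws Exception`; assumes an open java.sql.Statement
// named `statement` and a UserData class with a no-arg constructor and public id/name fields.
var query = "SELECT * FROM UserData WHERE id = 12 AND name = 'mike'";
var result = new java.util.ArrayList<UserData>();
try (var rows = statement.executeQuery(query)) {
    while (rows.next()) {
        // Create an empty object, then copy each column into the matching field via reflection.
        var object = UserData.class.getDeclaredConstructor().newInstance();
        UserData.class.getField("id").set(object, rows.getInt("id"));
        UserData.class.getField("name").set(object, rows.getString("name"));
        result.add(object);
    }
}
return result;
```
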
#

So it would be hard to implement, but then you can use it for ANY class you want

#

And this is a huge benefit

cedar pivot
#

This sounds like a pretty good solution, but I'll have to look more into this stuff after work tomorrow. It's new territory. Appreciate the suggestion though.

wanton atlas
#

Yeah, and when it comes to JSON, use the Gson library. It's very easy

#

That's all the code you need to generate a JSON file

weary radish
#

And not just string replacement like in the code you sent

weary radish
#

You just hard code the queries with ? in them for the things you’ll be replacing
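
In plain JDBC that looks roughly like the following. The table name `player_data` and the column names are placeholders for this sketch, not the plugin's actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.UUID;

final class HugQueries {
    // Hypothetical table and column names; the ? is filled in by the driver, not by string concatenation.
    static final String SELECT_HUGS_GIVEN =
            "SELECT hugs_given FROM player_data WHERE uuid = ?";

    /** Looks up one player's total, returning 0 when the player has no row yet. */
    static int hugsGiven(Connection connection, UUID player) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_HUGS_GIVEN)) {
            // Parameter indexes are 1-based in JDBC.
            statement.setString(1, player.toString());
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getInt("hugs_given") : 0;
            }
        }
    }
}
```

Because the value is bound separately from the SQL text, it is never parsed as part of the query.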

cedar pivot
#

Yeah, that’s how I normally create them. Just gotta work on getting HikariCP into the mix and it’ll be smooth sailing from there.

weary radish
#

Use Maven