Complex Data Storage Solution | SpigotMC | Page 1

cedar pivot Jan 16, 2022, 4:35 AM

#

I'm currently in the process of designing a complex YAML data storage system for my hugs plugin and would appreciate some thoughts and feedback on this system.

Some Prerequisite Information
- I'm aware that other database solutions would work far better and I plan on adding support for them at a later time. That being said, I want to create a system that works out of the box with yaml and can be converted back and forth to whatever storage solution the end user wants.
- I'm willing to write this from scratch.

Current Setup
Currently, I have a per-player data system. Each player has their own file with the format <UUID>.yml and this is how the file is structured.

uuid: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
name: Nothixal
settings:
  language: "en"

  huggable: true
  # There are two types of particles.
  # HEART & DAMAGE_INDICATOR

  # There will be several different animations.
  # The animations will only show up when extreme quality is enabled.
  particles: 
    type: "HEART"
    quality: "high"
    animation: "default"

  sounds: 
    menu: true
    commands: true

  indicators:
    chat: true
    actionbar: true
    bossbar: true
    titles: true
    toast: true

data:
  self_hugs: 2147483647

  normal_hugs:
    given: 2147483647
    received: 2147483647

  mass_hugs:
    given: 2147483647
    received: 2147483647

  last_hug:
    given:
      to: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

    received:
      from: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

  first_hug:
    given:
      to: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

    received:
      from: 03687ca0-30cf-4c3c-8c1d-3b2dd75c3ff0
      timestamp: 1642302107

These files are only loaded into memory when the player is online.
By itself, the file is 1.1kB in size. (1.0kB without spacing)

#

Currently, all is fine. The system works as intended. Now I want to do more. I want to be able to generate all sorts of data.

Unique Hugs
Leaderboard data
First Hug Given & Received
Last Hug Given & Received

This is where my problems begin.
In order to generate leaderboard data, I need every file to be loaded into memory temporarily to get the data. Otherwise I would be working with incomplete data. The same thing needs to happen in order to calculate unique hugs per player.

My initial thought was to create a section in the player file to house a history of hugs.
Something like this at the bottom.

history:
  # Sender, Receiver, Hug Type, Timestamp
  - "<UUID>,<UUID>,<TYPE>,<DATE>"

The problem with this is that when players start giving massive amounts of hugs, this file will start growing in size pretty quickly. If a player was to give 1,000 hugs, the file size increases to 121kB. Multiply that by ~150 users and you now have ~18MB of user data total.

I know that file size is a balancing act with using file storage, but that amount of data seems ridiculous. Maybe there's a different way I can store the data?

Perhaps a single file where everything is written to every 5 minutes or so. (For redundancy) This way, all the data is consolidated into one area making it easier to access and every so often, the data from that file will be distributed to each individual player file. Not as a whole, but the necessary information like unique hugs.
E.G. The main file acts like a ledger with every hug ever given. It's written to disk every 5 minutes for redundancy and another task that runs every 5 minutes will analyze the data and write a shorter version of a hug "transaction" to the individual player files. To prevent massive file buildup, the file would recreate itself every day and do the same task as the old one. This could create a "hug transaction" history.
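
The ledger idea can be sketched roughly like this. It is only a minimal sketch, assuming the "<UUID>,<UUID>,<TYPE>,<DATE>" line format from the history example and an epoch-seconds timestamp like the one in the player file. `HugEntry` and `HugLedger` are hypothetical names, not classes from the plugin, and writing to disk and daily rotation are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical value class for one ledger line: sender, receiver, hug type, timestamp.
final class HugEntry {
    final UUID sender;
    final UUID receiver;
    final String type;
    final long timestamp;

    HugEntry(UUID sender, UUID receiver, String type, long timestamp) {
        this.sender = sender;
        this.receiver = receiver;
        this.type = type;
        this.timestamp = timestamp;
    }

    // Parses a line in the "<UUID>,<UUID>,<TYPE>,<DATE>" format.
    static HugEntry parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new HugEntry(UUID.fromString(parts[0]), UUID.fromString(parts[1]),
                parts[2], Long.parseLong(parts[3]));
    }
}

// Hypothetical in-memory ledger; a real one would also be flushed to disk periodically.
final class HugLedger {
    private final List<HugEntry> entries = new ArrayList<>();

    void record(HugEntry entry) {
        entries.add(entry);
    }

    // Counts, per sender, how many distinct players they have hugged.
    Map<UUID, Integer> uniqueHugsGiven() {
        Map<UUID, Set<UUID>> receivers = new HashMap<>();
        for (HugEntry entry : entries) {
            receivers.computeIfAbsent(entry.sender, key -> new HashSet<>()).add(entry.receiver);
        }
        Map<UUID, Integer> counts = new HashMap<>();
        receivers.forEach((sender, set) -> counts.put(sender, set.size()));
        return counts;
    }
}
```

With something like this, only the small per-player summary (for example the unique hug count) would need to be copied into the individual player files, not the full history.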

#

I'm thinking out loud at this point, but I'd appreciate some feedback or alternative ways to access and store this data.

weary radish Jan 16, 2022, 4:48 AM

#

There isn’t a great way to do this with YAML. YAML just wasn’t designed for this type of complex system. There’s a reason databases exist and everything doesn’t just use flat files.

cedar pivot Jan 16, 2022, 4:59 AM

#

Yea, the only way to keep the file sizes down would be to use smaller data types, but it would break the whole user-friendly thing I have going on.

One idea I had was once a certain amount of flat files have been created, maybe about 500, have a message that shows up to OPs encouraging them to switch to a proper database.

I did have one suggestion a while ago to sort of split the plugin up. The complex portion would only work when databases like mariadb or mongo are being used, but that seems like a pain to maintain in the long run. As I would have two "versions" of the plugin to manage.

I get why databases exist, it's just that I want to keep the experience the same regardless of storage type. It's not only easier for me, but end users.

hardy hatch Jan 16, 2022, 6:19 AM

#

Why would you want to keep it user-friendly when it's only data storage? Do you want the users to edit the data?

weary radish Jan 16, 2022, 7:17 AM

#

^

#

There’s no reason for simple data storage to be user friendly

weary radish Jan 16, 2022, 7:18 AM

#

cedar pivot Yea, the only way to keep the file sizes down would to use smaller data types, b...

Smaller data types would not help you with YAML, it all gets stored as text in the file

#

That’s one of the big issues

#

That’s why it’s so “user friendly” to edit in a text editor
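
To make the "stored as text" point concrete, here is a small sketch comparing how many bytes the same counter takes as decimal text and as a fixed-width binary int. `TextVsBinarySize` is a hypothetical helper written for this illustration, and it ignores the key name, indentation and newline that YAML adds on top.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class TextVsBinarySize {
    // Bytes used when the value is written out as decimal text, the way a YAML file stores it.
    static int textBytes(int value) {
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes used when the same value is written as a fixed-width binary int.
    static int binaryBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array().length;
    }
}
```

For the `2147483647` values in the example player file, `textBytes` gives 10 while `binaryBytes` gives 4. For a small value like `2` the text form is actually shorter (1 byte against 4), so the saving only shows up for large numbers, UUIDs and timestamps.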

cedar pivot Jan 16, 2022, 7:58 PM

#

hardy hatch Why would you want to keep a user-friendly for only data storage? You want the u...

I want them to be able to edit the data if necessary. Right now, there is absolutely no reason to as everything is handled by the plugin. However, should the need arise, users won’t be confused as hell or have to remote in to a server just to change a value or replace a file.

cedar pivot Jan 16, 2022, 7:59 PM

#

weary radish There’s no reason for simple data storage to be user friendly

That is simply not true. Yes most of the time it’s not necessary, but it’s far easier to read and understand what you are doing should you ever need to edit the data.

cedar pivot Jan 16, 2022, 8:01 PM

#

weary radish Smaller data types would not help you with YAML, it all gets stored as text in t...

Would it be possible to use a data format that doesn’t save it as strings?

weary radish Jan 16, 2022, 8:51 PM

#

cedar pivot Would it be possible to use a data format that doesn’t save it as strings?

Not while keeping it user friendly

weary radish Jan 16, 2022, 8:52 PM

#

cedar pivot That is simply not true. Yes most of the time it’s not necessary, but it’s far e...

Mass data storage is not intended to be easily human read because it’s inefficient to store things that way. You can introduce a program that allows your mass data to be read and translated by the computer for a human to read and accomplish exactly the same result for the human, while removing the inefficiency involved in storing everything as text

cedar pivot Jan 16, 2022, 10:26 PM

#

weary radish Mass data storage is not intended to be easily human read because it’s inefficie...

Now that sounds interesting. Could you elaborate a little more?

#

I'm currently thinking of switching the default to either h2 or sqlite, maybe that could be used with your idea?

weary radish Jan 16, 2022, 10:33 PM

#

There’s programs that let humans interact with data storage systems

#

For example, mysql has MySQL Workbench

#

MSSQL has a similar tool

#

Mongo has a command line client and graphical programs

#

Mysql has command line

#

Redis has command line and graphical

#

There’s lots of options

#

They just aren’t storing it in a human readable way, an extra step is needed to let humans read it

wanton atlas Jan 17, 2022, 4:14 AM

#

IMO YAML is great for things that may need to be changed by a server admin, and that's all

#

For any other type of data, JSON is a much cleaner and easier approach

#

Or if you need to keep the file size small, convert the object to a byte array and store it in a text file
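
A rough sketch of the byte array idea, assuming only three counters need to be stored. `HugCountsCodec` is a hypothetical class, and Base64 is my own addition here: raw bytes are not safe to paste into a text file, so they are encoded first.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

class HugCountsCodec {
    // Packs three int counters into 12 bytes, then Base64-encodes them so the
    // result can be written as a single line in a text file.
    static String encode(int selfHugs, int given, int received) {
        ByteBuffer buffer = ByteBuffer.allocate(3 * Integer.BYTES);
        buffer.putInt(selfHugs).putInt(given).putInt(received);
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Reverses encode(): returns {selfHugs, given, received}.
    static int[] decode(String encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        return new int[] {buffer.getInt(), buffer.getInt(), buffer.getInt()};
    }
}
```

The encoded line is always 16 characters for these three counters, but it is no longer something an admin can read or edit by hand.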

cedar pivot Jan 17, 2022, 4:19 AM

#

I plan on adding more database options. The goal is to have all of these as an option.

Remote Options
MySQL, MariaDB, MongoDB

Local Database Options
SQLite

Text File Options
YAML, JSON, TOML

It's just that YAML seems to be a bad choice for large scale servers. Except it's so insanely easy for end users to edit if they needed to.

#

Same can be said for TOML

wanton atlas Jan 17, 2022, 4:20 AM

#

Yea, will you use some ORM library for the database?

cedar pivot Jan 17, 2022, 4:20 AM

#

It's just that JSON is more complicated when it comes to its syntax. Gotta be careful with it.

#

ORM Libraries?

wanton atlas Jan 17, 2022, 4:22 AM

#

An ORM is a kind of library that makes dealing with a database easier

#

For example, instead of writing a query by hand, there are prepared Java methods like select(User.class).where(user.id > 10).toList()

cedar pivot Jan 17, 2022, 4:23 AM

#

Ohhh, just looked it up. This looks way simpler and I might consider it, but the original idea was to just write things from scratch.

wanton atlas Jan 17, 2022, 4:23 AM

#

(Fake code)

cedar pivot Jan 17, 2022, 4:23 AM

#

Like, manual sql queries, json reads, etc.

wanton atlas Jan 17, 2022, 4:24 AM

#

Hmm, this is a pretty hard topic

#

There will be a lot of code involving reflection

cedar pivot Jan 17, 2022, 4:25 AM

#

Well, I know MySQL, so writing the queries isn't an issue there. I also know how to navigate Mongo. I haven't worked much with JSON or TOML, so those might be the harder ones for me to implement.

#

It's just the flat files that are the issue here.

wanton atlas Jan 17, 2022, 4:25 AM

#

But you should not write any query in your code

cedar pivot Jan 17, 2022, 4:25 AM

#

At least in file size.

wanton atlas Jan 17, 2022, 4:26 AM

#

That's the point of working with many databases

cedar pivot Jan 17, 2022, 4:26 AM

#

wanton atlas But you should not write any query in your code

Why not? I know how to sanitize inputs and I don't have a public api, so everything would be internal anyways.

wanton atlas Jan 17, 2022, 4:27 AM

#

Because every time you need to be careful about SQL injection

cedar pivot Jan 17, 2022, 4:28 AM

#

Yea, but that would be pretty difficult to do with my setup. The queries are defined within methods. The only parameters that could change would be UUIDs for lookups and validating those is easy.
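
For reference, validating a UUID before it goes anywhere near a lookup can be as small as this. It is a minimal sketch and `UuidInput` is a hypothetical helper, not something from the plugin.

```java
import java.util.Optional;
import java.util.UUID;

class UuidInput {
    // Rejects null and anything that is not 36 characters long, then lets
    // UUID.fromString decide whether the rest is well-formed.
    static Optional<UUID> parse(String text) {
        if (text == null || text.length() != 36) {
            return Optional.empty();
        }
        try {
            return Optional.of(UUID.fromString(text));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

Anything that comes back empty never reaches the query at all.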

#

If you want to look at the class I'm preparing, it's here: https://gitlab.com/Nothixal/hugs/-/blob/master/core/src/main/java/me/nothixal/hugs/managers/data/types/mysql/MySQLPlayerDataManager.java

wanton atlas Jan 17, 2022, 4:30 AM

#

Maybe instead of using sql try mongodb

cedar pivot Jan 17, 2022, 4:30 AM

#

Everything is commented out so I can test the plugin, but it'll be fixed eventually.

wanton atlas Jan 17, 2022, 4:30 AM

#

It works like a big hashmap

cedar pivot Jan 17, 2022, 4:30 AM

#

I plan on adding Mongo support as well.

#

It's just that end users will have options. So if someone needs MySQL for their server, the plugin can provide. Same with Mongo. If they can't use remote databases, they have fallback options such as SQLite or flat files.

wanton atlas Jan 17, 2022, 4:32 AM

#

What design patterns do you want to use?

#

Because there will be a lot of abstraction to handle that many data stores

cedar pivot Jan 17, 2022, 4:35 AM

#

Oh, I have that figured out already. I have an interface called PlayerDataManager. Each database solution implements it and I can add certain variations of it if needed.

If you look at how I currently handle YAML files, you'll notice that I have a YAMLPlayerData class inside of it which has methods for the individual files.
https://gitlab.com/Nothixal/hugs/-/blob/master/core/src/main/java/me/nothixal/hugs/managers/data/types/yaml/YAMLPlayerDataManager.java

I'll likely have a similar setup for the other database types.
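
In rough terms, the setup looks something like the sketch below. Only the `PlayerDataManager` name matches the real code. The methods, `InMemoryPlayerDataManager` and `HugService` are hypothetical and simplified for illustration, so check the linked classes for the actual signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Interface name from the plugin; the two methods here are hypothetical.
interface PlayerDataManager {
    int getHugsGiven(UUID player);

    void setHugsGiven(UUID player, int amount);
}

// Stand-in backend used only for illustration. A YAML, SQLite or MySQL
// implementation would expose the same methods.
class InMemoryPlayerDataManager implements PlayerDataManager {
    private final Map<UUID, Integer> hugsGiven = new HashMap<>();

    @Override
    public int getHugsGiven(UUID player) {
        return hugsGiven.getOrDefault(player, 0);
    }

    @Override
    public void setHugsGiven(UUID player, int amount) {
        hugsGiven.put(player, amount);
    }
}

// Calling code depends only on the interface, so the storage type can be swapped.
class HugService {
    private final PlayerDataManager data;

    HugService(PlayerDataManager data) {
        this.data = data;
    }

    void recordHug(UUID sender) {
        data.setHugsGiven(sender, data.getHugsGiven(sender) + 1);
    }
}
```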

#

In fact, I have that same thing in the MySQL class I linked earlier.

wanton atlas Jan 17, 2022, 4:37 AM

#

Do you think one interface would be good enough?

cedar pivot Jan 17, 2022, 4:38 AM

#

I believe so. I have been trying to figure out alternatives, but one massive one just works. 😛

#

Because the interface only contains methods relating to setting player data.
Things like connections I handle in another class and just access that class within the class that uses the interface.

wanton atlas Jan 17, 2022, 4:39 AM

#

In commercial projects, each method that uses an SQL query goes through a few "filters"

#

For example exception filter

#

Permission filter

#

Validation filter

#

And when one of those filters returns false, the method is cancelled

#

And the Repository pattern is heavily used

cedar pivot Jan 17, 2022, 4:41 AM

#

Oh, well I bet I can make that work with my interface. As of right now, the queries are in another class. So I can do validation checks in that class and if the result returns true, pass it to the interface. Otherwise return an error.

wanton atlas Jan 17, 2022, 4:42 AM

#

Ok, the thing I said about filters is called

#

the Chain of Responsibility design pattern

#

It might help. I use it on plugin enable for initializing stuff
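
A minimal sketch of the filter idea, using a simplified list-based variant of Chain of Responsibility where each filter can cancel the request. All names here (`HugRequest`, `RequestFilter`, `FilterChain`, `Filters`) are hypothetical and only meant to show the shape.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical request object; the fields are illustrative only.
class HugRequest {
    final String senderId;
    final boolean senderHasPermission;

    HugRequest(String senderId, boolean senderHasPermission) {
        this.senderId = senderId;
        this.senderHasPermission = senderHasPermission;
    }
}

// One link in the chain: returns false to cancel the request.
interface RequestFilter {
    boolean accept(HugRequest request);
}

class FilterChain {
    private final List<RequestFilter> filters;

    FilterChain(RequestFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    // Runs the filters in order and stops at the first one that rejects.
    boolean allows(HugRequest request) {
        for (RequestFilter filter : filters) {
            if (!filter.accept(request)) {
                return false;
            }
        }
        return true;
    }
}

class Filters {
    // Validation filter: the sender id must be a well-formed UUID string.
    static final RequestFilter VALID_UUID = request -> {
        try {
            UUID.fromString(request.senderId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    };

    // Permission filter: the sender must be allowed to hug.
    static final RequestFilter HAS_PERMISSION = request -> request.senderHasPermission;
}
```

The query would only run when `new FilterChain(Filters.VALID_UUID, Filters.HAS_PERMISSION).allows(request)` returns true.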

cedar pivot Jan 17, 2022, 4:44 AM

#

Haven't heard of those patterns before, but I'll look into them.

wanton atlas Jan 17, 2022, 4:44 AM

#

Have you been working with Spring?

cedar pivot Jan 17, 2022, 4:45 AM

#

Not before, no. I usually just program as a hobby. What with work being 10 hour shifts and all. I'm only now just getting back into updating this plugin.

wanton atlas Jan 17, 2022, 4:45 AM

#

Oh, but you want to be a programmer?

#

Or rather work in a less boring job?

cedar pivot Jan 17, 2022, 4:46 AM

#

My current job isn't bad, but I'd rather program things. Or at least have the knowledge to.

#

I love making things, but man are they complicated sometimes.

wanton atlas Jan 17, 2022, 4:48 AM

#

Well, you want to make one of the most complicated things when it comes to programming :p

cedar pivot Jan 17, 2022, 4:49 AM

#

Well I like to make quality projects that will last. Especially if other people are going to end up using them.

wanton atlas Jan 17, 2022, 4:49 AM

#

Multitasking and databases are the 2 most complicated topics

cedar pivot Jan 17, 2022, 4:49 AM

#

It just so happens that they need to be really complicated. xD

wanton atlas Jan 17, 2022, 4:50 AM

#

Not to implement, but to make it work

cedar pivot Jan 17, 2022, 4:51 AM

#

Well do you have any links to some good guides?

wanton atlas Jan 17, 2022, 4:55 AM

#

Em, do you know reflection?

cedar pivot Jan 17, 2022, 4:56 AM

#

A little. I lean more towards abstraction based projects rather than reflection, but I do use it from time to time.

wanton atlas Jan 17, 2022, 4:57 AM

#

And will you need to get whole objects from the database

#

Or only some fields?

cedar pivot Jan 17, 2022, 4:58 AM

#

Possibly both. Most of the time it'll just be some fields. The only time I would need the whole object is when I'm doing things for leaderboards or finding unique data amongst the entirety of the database.

wanton atlas Jan 17, 2022, 4:59 AM

#

Ok

#

And do you know the Builder design pattern?

cedar pivot Jan 17, 2022, 4:59 AM

#

Yes

wanton atlas Jan 17, 2022, 5:02 AM

#

so when it comes to queries, I would suggest using a builder

#

for building the query

#

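```java
// Pseudocode: `queryBuilder` is a hypothetical fluent builder, not a real library API.
queryBuilder
    .select()
    .where("player.id = $1", 12)
    .andWhere("player.name = $1", "mike")
    .toList();
```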

#

this is how it might look

#

so this gives you a lot of flexibility and you avoid building queries by string
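
A small sketch of what such a builder could look like. `SelectBuilder` is a hypothetical class, and this version emits `?` placeholders and keeps the values in a separate list, so it can be handed to a prepared statement. It only supports equality conditions joined with AND.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: conditions and values are collected separately so the
// SQL text only ever contains '?' placeholders.
class SelectBuilder {
    private final String table;
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    SelectBuilder(String table) {
        this.table = table;
    }

    // The column must be a trusted, hard-coded name; only the value is user data.
    SelectBuilder where(String column, Object value) {
        conditions.add(column + " = ?");
        parameters.add(value);
        return this;
    }

    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    // Values to bind, in the same order as the placeholders.
    List<Object> parameters() {
        return parameters;
    }
}
```

For example, `new SelectBuilder("player").where("id", 12).where("name", "mike").toSql()` produces `SELECT * FROM player WHERE id = ? AND name = ?`, with `12` and `"mike"` returned by `parameters()`.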

cedar pivot Jan 17, 2022, 5:05 AM

#

Interesting. Wouldn't I need the ORM for this though? Because if I'm not writing the queries, then it would be the job of the ORM correct?

wanton atlas Jan 17, 2022, 5:06 AM

#

I mean, the code I sent is an ORM

#

ORM is a fancy name for executing a query without writing it by hand

#

so you would probably write a little ORM for your purpose

cedar pivot Jan 17, 2022, 5:07 AM

#

Ah, I see.

wanton atlas Jan 17, 2022, 5:09 AM

#

and this would be an example of what this ORM would do under the hood

#

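```java
// Pseudocode: SQL.executeQuery and the row type are hypothetical, not a real API.
var query = "SELECT * FROM UserData WHERE id = 12 AND name = 'mike'";
var sqlResult = SQL.executeQuery(query);
var result = new ArrayList<UserData>();
for (var data : sqlResult) {
    // Create the object reflectively, then copy each column into its field.
    var object = UserData.class.getDeclaredConstructor().newInstance();
    UserData.class.getField("id").set(object, data.get("id"));
    UserData.class.getField("name").set(object, data.get("name"));
    result.add(object);
}
return result;
```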

#

so it would be hard to implement but then you can use it for ANY class you want

#

and this is a huge benefit
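
The "use it for any class" part can be sketched with a generic mapper like the one below. This is an illustration only: `RowMapper` and `UserData` are hypothetical, a row is assumed to arrive as a `Map` of column name to value, and column names are assumed to match field names exactly.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

class RowMapper {
    // Creates an instance of `type` and copies each row value into the field
    // with the same name as the column.
    static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T instance = constructor.newInstance();
            for (Map.Entry<String, Object> column : row.entrySet()) {
                Field field = type.getDeclaredField(column.getKey());
                field.setAccessible(true);
                field.set(instance, column.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }
}

// Hypothetical data class used for the example.
class UserData {
    int id;
    String name;
}
```

A row containing `id = 12` and `name = "mike"` would come back as a `UserData` with those two fields set, and the same `map` call works for any class with a no-argument constructor.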

cedar pivot Jan 17, 2022, 5:15 AM

#

This sounds like a pretty good solution, but I'll have to look more into this stuff after work tomorrow. It's new territory. Appreciate the suggestion though.

wanton atlas Jan 17, 2022, 5:18 AM

#

ye, and when it comes to JSON, use the Gson library, it's very easy

#

#

that's all the code you need to generate a JSON file

weary radish Jan 17, 2022, 6:19 PM

#

wanton atlas Bc everytime you need to be carefull about sql injection

Not if you use prepared statements like you’re supposed to

#

And not just string replacement like in the code you sent

cedar pivot Jan 17, 2022, 6:24 PM

#

weary radish And not just string replacement like in the code you sent

So would hard coding the queries be a real issue then? At least with how I’m doing it?

weary radish Jan 17, 2022, 6:24 PM

#

cedar pivot So would hard coding the queries be a real issue then? At least with how I’m doi...

Not at all

#

You just hard code the queries with ? in them for the things you’ll be replacing

#

https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html

Using Prepared Statements (The Java™ Tutorials > ...

This JDBC Java tutorial describes how to use JDBC API to create, insert into, update, and query tables. You will also learn how to use simple and prepared statements, stored procedures and perform transactions
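
As a rough illustration of a hard-coded query with a `?` placeholder, using the standard `java.sql` API. The `HugQueries` class and the `player_data` table and its column names are hypothetical, not the plugin's schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.UUID;

class HugQueries {
    // Hypothetical table and column names, hard-coded with a '?' placeholder.
    static final String SELECT_HUGS_GIVEN =
            "SELECT hugs_given FROM player_data WHERE uuid = ?";

    // The UUID is bound as a parameter and never concatenated into the SQL text.
    static int hugsGiven(Connection connection, UUID player) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_HUGS_GIVEN)) {
            statement.setString(1, player.toString());
            try (ResultSet result = statement.executeQuery()) {
                // Returns 0 when the player has no row yet.
                return result.next() ? result.getInt("hugs_given") : 0;
            }
        }
    }
}
```

Because the driver sends the value separately from the query text, the input cannot change the structure of the statement.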

cedar pivot Jan 17, 2022, 6:27 PM

#

Yea, that’s how I normally create them. Just gotta work on getting hikaricp into the mix and it’ll be smooth sailing from there.

weary radish Jan 17, 2022, 6:30 PM

#

Use maven
