#V10 Document Data Schema
Some Discussion Ground Rules
I know most of you have opinions about just about anything/everything, but I would like to limit this conversation to developers who have been actively using the existing DocumentData API:
- Having prior knowledge of what the current API does will make it easier for me to explain proposed changes.
- This will make it easier to collect feedback on the proposal that is grounded in the current functionality of the system.
- This will also make it easier for me to understand the extent to which this will be a widely-breaking change (or not) across the developer community.
If you join this discussion, please begin by linking to a file or files in your git repo where you have created custom DocumentData definitions so that I can see how you are currently using the API.
What change are we considering?
Currently in V8/V9 document data schema are defined using basic objects with some attributes like:
someField: {
  type: String,
  required: true,
  default: ""
}
While this approach is nice in its simplicity, it has some limitations when it comes to supporting more sophisticated data structures like arrays of complex objects or embedded (inner) data structures - neither of which are currently supported without defining ancillary DocumentData objects. Furthermore, this approach requires a significant amount of logic in the DocumentData class to interpret how to parse/clean/validate input fields depending on their declared type.
The new approach that we are considering makes the fields of DocumentData more feature-rich and powerful as instances of a DataField class. These DataField instances are able to encapsulate parsing/cleaning/validation while allowing for more elegant recursive structures. For example, the field in my above example would now be declared as:
someField = new fields.StringField({required: true, default: ""})
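To make the idea of a field that encapsulates its own cleaning and validation concrete, here is a minimal sketch. It is not the Foundry implementation: the DataField and StringField classes below are stand-ins written for illustration, and the clean/validate method names and the required/initial options are assumptions.

```javascript
// Hypothetical sketch: these classes are stand-ins, not Foundry's own code.
class DataField {
  constructor({required = false, initial = undefined} = {}) {
    this.required = required;
    this.initial = initial;
  }

  // Coerce raw input, falling back to the initial value when nothing was given.
  clean(value) {
    if (value === undefined) return this.initial;
    if (value === null) return null;
    return this._cast(value);
  }

  // Subclasses override this to coerce a value to their own type.
  _cast(value) {
    return value;
  }

  // Throw a descriptive error when the value is not acceptable.
  validate(value) {
    if (value === undefined || value === null) {
      if (this.required) throw new Error("may not be undefined or null");
      return;
    }
    this._validateType(value);
  }

  // Subclasses override this to enforce their own type rules.
  _validateType(value) {}
}

class StringField extends DataField {
  constructor(options = {}) {
    super({initial: "", ...options});
  }

  _cast(value) {
    return String(value).trim();
  }

  _validateType(value) {
    if (typeof value !== "string") throw new Error("must be a string");
  }
}

const someField = new StringField({required: true});
const cleaned = someField.clean("  hello ");
someField.validate(cleaned);
console.log(cleaned); // "hello"
```

The point of the design is that the containing data class no longer needs per-type branching: it only calls clean and validate on each field.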
A more practical example
Under V9, the data definition for ActiveEffectData looks like this:
class EffectDurationData extends DocumentData {
  static defineSchema() {
    return {
      startTime: fields.field(fields.NUMERIC_FIELD, {default: null}),
      seconds: fields.NONNEGATIVE_INTEGER_FIELD,
      combat: fields.STRING_FIELD,
      rounds: fields.NONNEGATIVE_INTEGER_FIELD,
      turns: fields.NONNEGATIVE_INTEGER_FIELD,
      startRound: fields.NONNEGATIVE_INTEGER_FIELD,
      startTurn: fields.NONNEGATIVE_INTEGER_FIELD
    }
  }
}
class EffectChangeData extends DocumentData {
  static defineSchema() {
    return {
      key: fields.BLANK_STRING,
      value: fields.BLANK_STRING,
      mode: fields.field(fields.NONNEGATIVE_NUMBER_FIELD, {default: CONST.ACTIVE_EFFECT_MODES.ADD}),
      priority: fields.NUMERIC_FIELD
    }
  }
}
class ActiveEffectData extends DocumentData {
  static defineSchema() {
    return {
      _id: fields.DOCUMENT_ID,
      changes: {
        type: [EffectChangeData],
        required: true,
        default: []
      },
      disabled: fields.BOOLEAN_FIELD,
      duration: {
        type: EffectDurationData,
        required: true,
        default: {}
      },
      icon: fields.IMAGE_FIELD,
      label: fields.BLANK_STRING,
      origin: fields.STRING_FIELD,
      tint: fields.COLOR_FIELD,
      transfer: fields.field(fields.BOOLEAN_FIELD, {default: true}),
      flags: fields.OBJECT_FIELD
    }
  }
}
The classes EffectDurationData and EffectChangeData are defined as subclasses of DocumentData - but this is more of a workaround than an intentional design. These data objects are not useful on their own, and exist only to serve the needs of the parent ActiveEffectData class.
In the proposed V10 approach, ActiveEffectData can be declared with these inner objects in-line to declare nested data structures:
class ActiveEffectData extends DocumentData {
  static defineSchema() {
    return {
      _id: new fields.DocumentIdField(),
      changes: new fields.ArrayField(new fields.SchemaField({
        key: new fields.StringField({required: true, label: "EFFECT.ChangeKey"}),
        value: new fields.StringField({required: true, label: "EFFECT.ChangeValue"}),
        mode: new fields.NumberField({integer: true, initial: CONST.ACTIVE_EFFECT_MODES.ADD, label: "EFFECT.ChangeMode"}),
        priority: new fields.NumberField()
      })),
      disabled: new fields.BooleanField(),
      duration: new fields.SchemaField({
        startTime: new fields.NumberField({initial: null, label: "EFFECT.StartTime"}),
        seconds: new fields.NumberField({integer: true, positive: true, label: "EFFECT.DurationSecs"}),
        combat: new fields.ForeignDocumentField(documents.BaseCombat, {label: "EFFECT.Combat"}),
        rounds: new fields.NumberField({integer: true, positive: true}),
        turns: new fields.NumberField({integer: true, positive: true, label: "EFFECT.DurationTurns"}),
        startRound: new fields.NumberField({integer: true, positive: true}),
        startTurn: new fields.NumberField({integer: true, positive: true, label: "EFFECT.StartTurns"})
      }),
      icon: new fields.FilePathField({categories: ["IMAGE"], label: "EFFECT.Icon"}),
      label: new fields.StringField({required: true, label: "EFFECT.Label"}),
      origin: new fields.StringField({nullable: true, blank: false, initial: null, label: "EFFECT.Origin"}),
      tint: new fields.ColorField({label: "EFFECT.IconTint"}),
      transfer: new fields.BooleanField({initial: true, label: "EFFECT.Transfer"}),
      flags: new fields.ObjectField()
    }
  }
}
Notice that the changes array of EffectChangeData and the duration object of EffectDurationData have been folded into the ActiveEffectData schema. This example illustrates one level of depth with such recursive structures, but there is no depth limitation; for example, duration could have its own inner object defined as a fields.SchemaField.
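The recursion described here can be sketched in a few lines. The classes below reuse the names SchemaField, ArrayField and NumberField from the proposal, but their bodies are assumptions written for illustration; in particular the validate(value, path) signature is hypothetical. Each container simply delegates to its children, which is why nesting depth is unlimited.

```javascript
// Hypothetical sketch of recursive fields; not Foundry's actual classes.
class NumberField {
  constructor({integer = false} = {}) {
    this.integer = integer;
  }
  validate(value, path) {
    if (typeof value !== "number" || Number.isNaN(value)) {
      throw new Error(`${path}: must be a number`);
    }
    if (this.integer && !Number.isInteger(value)) {
      throw new Error(`${path}: must be an integer`);
    }
  }
}

class SchemaField {
  constructor(fields) {
    this.fields = fields;
  }
  validate(value, path = "") {
    if (typeof value !== "object" || value === null) {
      throw new Error(`${path}: must be an object`);
    }
    // Delegate to each inner field, extending the path for error messages.
    for (const [key, field] of Object.entries(this.fields)) {
      field.validate(value[key], path ? `${path}.${key}` : key);
    }
  }
}

class ArrayField {
  constructor(element) {
    this.element = element;
  }
  validate(value, path = "") {
    if (!Array.isArray(value)) throw new Error(`${path}: must be an array`);
    value.forEach((entry, i) => this.element.validate(entry, `${path}[${i}]`));
  }
}

// duration contains a further inner SchemaField, two levels deep.
const schema = new SchemaField({
  changes: new ArrayField(new SchemaField({priority: new NumberField()})),
  duration: new SchemaField({
    rounds: new NumberField({integer: true}),
    extra: new SchemaField({depth: new NumberField()})
  })
});

// Return "ok" or the first validation error message.
function check(data) {
  try {
    schema.validate(data);
    return "ok";
  } catch (err) {
    return err.message;
  }
}

const good = {changes: [{priority: 1}], duration: {rounds: 2, extra: {depth: 3}}};
const bad = {changes: [{priority: 1}, {priority: "x"}], duration: {rounds: 2, extra: {depth: 3}}};
console.log(check(good)); // "ok"
console.log(check(bad));  // "changes[1].priority: must be a number"
```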
What are the anticipated benefits of this change?
- Our expectation is that in V10 and beyond more package authors will use the DocumentData API to define custom data structures, having the advantage of more robust data cleaning and validation.
- The new approach envisioned in V10+ will make it really easy to generate standard Foundry-style form fields for data objects, making it easy to generate forms to configure and customize the data attributes of your custom data structure.
- Additionally, a goal of V10 is to enable game system authors to easily define document type-specific data objects. It is common for game system data to contain arrays of objects or inner objects (for example things like data.attributes.strength). This new syntax for declaring data schema will make it much easier to define such complex schema.
Thoughts, feedback, questions, concerns?
Is a SchemaField a shorthand for defining the inner DocumentData objects, or do they remain plain JS objects?
If you implement this, is it likely to be optional, or will you also tighten the schema for actors etc. so that we can't just define an empty object in the schema and populate it elsewhere? (Please no!)
Two questions really:
- How easy would the migration be and how easy would it be to hook up into things that already exist? Like if I use DocumentData for custom themes, how easy will I be able to replace how I am already handling custom data with, say, system settings?
- Tangentially related: would this data structure allow additional add-ons specifically for rendering the data, i.e. a hook for a rendering plugin to replace handlebars?
A SchemaField is used to define an "inline" inner data schema, as shown above. You can still embed an inner DocumentData object also, although that uses a DocumentDataField.
IMO TypeScript already does schema for JS objects pretty well. I'm certain there are also other standardized JS/JSON schema systems. In a lot of ways I think you're trying to reinvent the wheel here.
We do have support for a generic object which can contain arbitrary keys/values as an ObjectField. This is used for things like flags or for general uses where the author does not want to define an explicit inner object schema.
This is, in part, why I asked you to link me to your existing use of DocumentData in your code so I can look at it and get an idea of how easy/complex the migration process would be for you.
Would this data structure allow additional add-ons specifically for rendering the data, i.e. a hook for a rendering plugin to replace handlebars?
That is a stretch goal.
As a TS user, there are some failings in TS that this does cover. TS only cares about types, not about what is "valid". For example, you can't say floating numbers between 0 and 5.
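The "floating numbers between 0 and 5" case is a runtime constraint rather than a type. A minimal sketch of such a check follows; makeNumberValidator and its min/max options are hypothetical names chosen for this example, not part of any existing API.

```javascript
// Hypothetical helper: returns a validator that reports a message or null.
function makeNumberValidator({min = -Infinity, max = Infinity} = {}) {
  return function validate(value) {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      return "must be a finite number";
    }
    if (value < min || value > max) {
      return `must be between ${min} and ${max}`;
    }
    return null; // null means the value is acceptable
  };
}

const rating = makeNumberValidator({min: 0, max: 5});
console.log(rating(2.5)); // null
console.log(rating(7));   // "must be between 0 and 5"
console.log(rating("3")); // "must be a finite number"
```

A static type of number accepts all three inputs that are numbers at compile time; only a check like this one can reject 7 when the data actually arrives.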
I have reviewed a few existing options, and did not love any of them. If you have a favorite framework here that you think I should review before deciding to make a custom solution please feel free to link me.
Yeah, this is my big worry. It looks to me like there's going to have to be a massive rewrite of existing systems.
Are you using custom DocumentData in your system? If so please link me to it!
We've only merged in pretty minimal subclassing for DocumentData:
export class ArmorData extends BasePhysicalItemData {
static DEFAULT_ICON = "systems/pf2e/icons/default-icons/armor.svg";
}
I've played around with using DocumentData for system data objects, but client-side schema validation complicated performing migrations
TS does not provide runtime validation however, which the schema does.
Wow, I read that wrong. So unless you're extending the DocumentData classes in any way, this will basically be a non-issue?
(Because I don't do that, I don't see myself needing to do that either. Please disregard my question, if that's the case)
Oh yeah, I have a feedback/thought and a big ask: is there a chance the DB storage of this data can be updated when reading data, especially for arrays? For instance, right now if I use the string format for saving data in an array, it converts the array into an object: setProperty('data.someArray.0.someProperty') converts {someArray: [{someProperty: 'foo'}]} to {someArray: {0: {someProperty: 'foo'}}}
Not sure if that is something that can be adjusted in this DocumentData
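The behaviour reported here can be reproduced with a small stand-in. The expandKey function below is written for illustration only and is not Foundry's setProperty; it assumes, as the message above suggests, that a dotted key is expanded into nested plain objects, so a numeric segment such as 0 becomes an object key rather than an array index.

```javascript
// Hypothetical stand-in for expanding a flattened update key.
function expandKey(path, value) {
  const root = {};
  const parts = path.split(".");
  let target = root;
  parts.forEach((part, i) => {
    if (i === parts.length - 1) {
      target[part] = value;
    } else {
      target[part] = {}; // always a plain object, even for "0"
      target = target[part];
    }
  });
  return root;
}

const expanded = expandKey("someArray.0.someProperty", "foo");
console.log(JSON.stringify(expanded));          // {"someArray":{"0":{"someProperty":"foo"}}}
console.log(Array.isArray(expanded.someArray)); // false

// One way to keep the array intact: copy it, change one entry, write it back whole.
const source = {someArray: [{someProperty: "foo"}]};
const updatedArray = source.someArray.map((entry, i) =>
  i === 0 ? {...entry, someProperty: "bar"} : entry
);
const update = {someArray: updatedArray};
console.log(Array.isArray(update.someArray));   // true
```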
If you do not subclass DocumentData you would be unaffected by this proposal.
I think that is a separate problem, as the functions that do that are technically separate.
I personally do not recommend arrays in data, but that's just my advice.
No guarantees, but it may be possible as a side-effect of this change to provide better APIs for updating a specific indexed element within an ArrayField
@surreal harbor Does it need to schema only JSON data or JS (i.e. function and class types)?
it's not currently supported to have a Function as a data type in DocumentData, so I don't think that would be changing
@surreal harbor Have you looked at AJV?
That #_initializeData alters the contents of the _source property seems like it would make it hazardous to define a schema for system data when it comes time to perform a migration
A way for a system's migration framework to get its hands on source data prior to it getting run through schema validation would seem necessary
I had not, looking at it now. It looks pretty good for basic schema, but I'm looking at their syntax for things like inner objects or arrays of objects and it looks pretty clunky to me. Don't want to judge too quickly so I'm going to do a bit more reading
okay, nevermind - this looks a bit more straightforward than i first thought, there's some good stuff here
@surreal harbor Ok, well if you do decide to roll your own. My criticism of what you've shown us is the schema is defined with code. I think a good schema should be defined with data (i.e. JSON, XML or YAML). Since it makes it more portable, testable, and consistent. If you define a schema with code people will create things like loops to generate the schema, which makes it difficult for people to see what they need to match.
Fields support some configuration options like custom validation or cleaning functions which would not be possible to define in JSON/XML/YAML/etc..
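A quick illustration of why function-valued options do not survive a pure-data format: JSON serialization silently drops them. The validate option name below is an assumption used only for this example.

```javascript
// Field options that include a custom validation function (hypothetical option name).
const fieldOptions = {
  required: true,
  validate: value => value.startsWith("EFFECT.")
};

// Round-trip through JSON, as a data-only schema file would require.
const roundTripped = JSON.parse(JSON.stringify(fieldOptions));

console.log(roundTripped);               // { required: true }
console.log("validate" in roundTripped); // false
```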
defining schema in code also allows for fields to reference related objects by reference rather than simply by name
But it's not really portable between say a system and a module implemented for a system (or multiple systems)
Uh, yea, it's not portable. The schema for system owned documents belongs to the system.
Do you have an example in mind of where/how this would be a problem?
Well the goal of any schema is validation between different segments of code. A contract between how data is communicated. If a module wanted to validate its data before passing it to the system or another module
This is for validating documents and updates to said documents, not about validating data one is passing to other modules.
You can get the schema from the document and validate it at anytime if that was just the goal to not pass invalid document data.
I see what Kage means, the scope for this is intended to be more broad than just Documents, ideally this approach would be useful for all sorts of cases where a package author wants to define a data model
but I don't see where the problem comes in, if a module or system defines a data schema it can expose/export that schema such that other code can also use it
I mean, in a more abstract case. People looking to integrate foundry databases outside of foundry. Compendium importing/exporting, third party services, etc.
I can give you one case, Moulinette allows Patron creators to create assets (which could eventually include documents). Would be useful for this service to validate system specific documents outside of foundry completely
I'm just recommending allowing portability for future cases outside of your immediate use case.
I see, there are lots of considerations for what makes the best solution, but I think supporting non-Foundry applications is a non-priority for my decision making here.
Sure, just giving some push back to help you see all the possible angles.
Appreciate that - for a bit more context - do you currently subclass DocumentData in one of your packages?
Not in a published package. But I am implementing a homebrew system in foundry.
Which of course, does have to extend DocumentData
In case there's any misunderstanding - game systems do not have to do that - in fact almost none currently do
which is partly why I'm trying to understand in this thread how many devs would be affected by changes here
Oh, well maybe I'm doing my system wrong lol
Not really.
I want to say at least a handful do?
The DocumentData stuff did not exist when I created my system. A lot of others are in that boat.
Some do, those are the people Atropos is looking for. But it isn't required for most systems AFAIK
sure
I saw the ground rules message too
pf2e does but makes minimal use of it
it's nice for instanceof checking at the very least
- setting per-actor/item type default icons
using it for system data remains scary
Hopefully that’s something that could be surmounted in V10, depending on what is currently spooky
mainly the migration problem
I would agree that the current system is not yet ideal for system data
to the extent that we tighten a system-data schema, pre-migration data would be lost
Migration meaning the need to migrate old incompatible data values to new compatible ones?
right
Presumably that migration could be done as part of initializing the DocumentData source?
if pre-migration system data gets run through the schema shredder first, old properties would get dropped, etc.
That sounds like it could work, yeah
Though our migrations are async
hairy
Could I make an argument that schema (data structure), validation (data integrity), and cleanup (data upgrade, alterations and corrections) are three separate concerns and don't necessarily need to share a unified approach?
@somber ether probably worth standardizing a workflow where migrations can/should occur using raw source data before documents are constructed.
which could allow async operations
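A minimal sketch of that workflow, assuming a strict model that drops undeclared keys. StrictModel, migrateSource and loadDocument are hypothetical names for illustration and do not exist in Foundry; the async step is a placeholder for work such as a compendium lookup.

```javascript
// Hypothetical sketch: none of these names are Foundry APIs.
class StrictModel {
  static KEYS = ["attributes"];

  constructor(source) {
    // Like a tightened schema: anything not declared is silently dropped.
    for (const key of StrictModel.KEYS) {
      if (key in source) this[key] = source[key];
    }
  }
}

// The migration works on a copy of the raw source, so it is free to be async.
async function migrateSource(source) {
  const data = JSON.parse(JSON.stringify(source));
  if ("hp" in data) {
    await Promise.resolve(); // stand-in for async work such as a compendium lookup
    data.attributes = {hp: {value: data.hp}};
    delete data.hp;
  }
  return data;
}

// Migrate first, construct second.
async function loadDocument(rawSource) {
  return new StrictModel(await migrateSource(rawSource));
}

const legacy = {hp: 7};
console.log(new StrictModel(legacy).attributes); // undefined: the old value was lost
loadDocument(legacy).then(model => console.log(model.attributes.hp.value)); // 7
```

Constructing directly from the legacy source loses hp, which is the "schema shredder" concern raised above; migrating the raw source first preserves it.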
I suppose we could reach inside game.data for original "dirty" data
the async bit is in part to grab fresh item data from compendiums
it's a little off-topic for this thread, but maybe we can make figuring out a solution here an objective for V10 prototyping
pretty much all game systems need to perform data migrations from time to time, it would be good to provide a standard approach for that rather than needing each system to roll their own solution
you can make that argument!
I'm not sure I agree with it though
data validation depends on schema, and curing/correcting invalid data also depends on both schema and validation
Are the custom DocumentData subclasses you're thinking of for the purpose of this thread the sort that may inherit directly from DocumentData (rather than ActorData, etc.)?
so the degree to which they are separate concerns is ... to me ... tenuous at best
I don't think mixing light coercions with validation is any separation-of-concerns crime
To me the concern with data structure is preventing things like null accessors. Requiring a non-negative value for data.duration.startTime is pointless if data.duration is undefined or null. Data structure is about paths and types. While validation is about the data being within acceptable ranges. If you check that your data is structured correctly, validation becomes inherently more reliable.
I've got a general thought. When I was poking this all in my dev env, I kinda felt that the schema ended up duplicating the template data structure in template.json, feels like it's close to violating DRY (TS makes it worse due to wanting interfaces).
Do you think providing a default in the schema and removing that structure in template.json is something that could be done? Of course with an appropriate transition period.
yes, a stretch goal of this would be to eliminate or reduce the importance of the template.json file... HOWEVER... it is important that the template be loaded server-side and we are not going to load system JS code on the server side.
I'm not sure how to reconcile those, but what I think might be possible is to provide a utility function that would automatically generate the template.json file from a DocumentData class definition
so your DocumentData could be the "source of truth" for what the schema is, and then you could generate the necessary JSON file from there
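As a rough sketch of such a utility: walk the schema and collect each field's initial value into a plain object that can be written out as JSON. The field shape used here ({initial} for leaves, {fields} for inner schemas) and the buildDefaults name are assumptions, and the output is only the defaults tree, not the full layout of a real template.json.

```javascript
// Hypothetical schema description; the {initial} / {fields} shape is assumed.
const schema = {
  attributes: {fields: {
    strength: {fields: {value: {initial: 10}, mod: {initial: 0}}}
  }},
  biography: {initial: ""},
  notes: {}
};

// Recursively collect initial values; fields without one default to null.
function buildDefaults(fields) {
  const defaults = {};
  for (const [key, field] of Object.entries(fields)) {
    defaults[key] = field.fields
      ? buildDefaults(field.fields)
      : (field.initial ?? null);
  }
  return defaults;
}

const templateJson = JSON.stringify(buildDefaults(schema), null, 2);
console.log(templateJson);
```

With this direction of generation the code-defined schema stays the source of truth, and the JSON file is a build artifact that the server can load without executing system code.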
Yes, I really don't want some random dev (including me) to poke server side code.
it's not elegant though, so maybe there is a better solution
I was largely just thinking to kill the server being responsible for initializing a new document and pushing that client side in schema.
At least for system data.
Ironically, @unreal pendant's wanting of a pure data schema would allow the server to validate system schema.
yep, it would, at the expense of other features
Why is that ironic? That's literally how and why the rest of the world uses schemas.
not true, there's plenty of examples of highly regarded ODM/ORMs which define schema in code
Ironic, because I was just pushing back on it since I failed to grasp a reason for it within Foundry, just to invent one down the line, thus felt ironic to me.
Just for reference (not my module): Stairways extends DocumentData (https://gitlab.com/SWW13/foundryvtt-stairways/-/blob/development/src/StairwayData.js). It actually also creates a custom document that uses this class as its data.
Thanks for the link @blissful ledge, do you know who the author is in Discord?
ideally, we could run the migrations on the server rather than the first logged in client. that would be the safest IMO
i think migrations are the only reason we don't use it in pf2e
I also found this: https://github.com/Xbozon/storyteller/blob/main/main.js#L242
I think the author is @frank bear.
I would like that, too, as it has multiple benefits. But this goes into "execute system code on the server side" territory again, which is not an option that will be considered, as far as I understand.
We would not allow 3rd party code to execute on the server side, unfortunately
So that is not an option
that's arbitrary though. i'm just saying that's the ideal solution for our problem.
It's far from arbitrary, it's a very intentional design decision
There are plenty of reasons to deny that.
being intentional doesn't mean it's not arbitrary
arbitrary from a math sense
regardless, i'm not arguing that anything should be changed. Just that we're stuck with a bad implementation because of the design decision.
I certainly would say it is not an arbitrary decision, and it’s definitely in the will never happen category
There should be other good ways to improve system data migration though
I think there could potentially be an option for allowing server side migrations of system data that does not depend on system code being executed. If there was a way to describe migrations in a declarative way on the data level, it might not be necessary to actually execute system code. It might limit a bit what kind of migrations can be performed, but it might cover most of the relevant cases.
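One possible shape for such declarative migrations is a list of plain-data steps applied by a generic interpreter. Everything below is a hypothetical sketch: the rename/delete/default operations and the applyMigrations function are invented for illustration. Because the steps are pure data, they could be stored as JSON and applied without running any system code.

```javascript
// Migration steps expressed as plain data (hypothetical operation names).
const steps = [
  {op: "rename", from: "hp", to: "attributes.hp.value"},
  {op: "delete", path: "legacyFlag"},
  {op: "default", path: "attributes.level", value: 1}
];

// Resolve the parent object and final key for a dotted path.
function splitPath(obj, path, create) {
  const keys = path.split(".");
  const last = keys.pop();
  let parent = obj;
  for (const key of keys) {
    const next = parent[key];
    if (typeof next !== "object" || next === null) {
      if (!create) return [undefined, last];
      parent[key] = {};
    }
    parent = parent[key];
  }
  return [parent, last];
}

// Generic interpreter: applies each step to a copy of the source data.
function applyMigrations(source, migrationSteps) {
  const data = JSON.parse(JSON.stringify(source));
  for (const step of migrationSteps) {
    if (step.op === "rename") {
      const [from, fromKey] = splitPath(data, step.from, false);
      if (from === undefined || !(fromKey in from)) continue;
      const value = from[fromKey];
      delete from[fromKey];
      const [to, toKey] = splitPath(data, step.to, true);
      to[toKey] = value;
    } else if (step.op === "delete") {
      const [parent, key] = splitPath(data, step.path, false);
      if (parent !== undefined) delete parent[key];
    } else if (step.op === "default") {
      const [parent, key] = splitPath(data, step.path, true);
      if (parent[key] === undefined) parent[key] = step.value;
    } else {
      throw new Error(`Unknown operation: ${step.op}`);
    }
  }
  return data;
}

const migrated = applyMigrations({hp: 7, legacyFlag: true}, steps);
console.log(JSON.stringify(migrated)); // {"attributes":{"hp":{"value":7},"level":1}}
```

As noted above, this limits what a migration can do: anything that needs to fetch fresh data, such as pulling an item from a compendium, does not fit this model.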
Regardless, I think this is turning a bit off topic, so unless Atro wants to discuss (server side) migration improvements in more detail here, I would suggest we return to the original topic.
(I guess discussing how the DocumentData changes would interact with the common migration patterns that are currently used is on topic)
Out of curiosity, is DocumentData something new to V9? Or is it just something that one indirectly uses (via class extensions)?
i'm just saying, that, due to the migration, this entire conversation is useless because we (pf2e) cannot use it
I haven't seen it come up at all in any discussions in #module-development
it's there since 0.8. Usually, most people don't need to touch it, aside from using instances of it (e.g. Actor#data is of type ActorData, which extends DocumentData)
Ah, so it's more for system developers.
perhaps, although you don't have access to server-side migrations today and yet - somehow - the pf2e system still manages to exist 😉
For anybody really who has a use for it. Lots of systems don't touch it at all. There are only a few modules that do (I linked 2 above).
we don't use this schema stuff
I don't think the design of DocumentData inherently makes that problem worse - although what stwlam mentioned earlier in this thread about the need for async migrations is something that is not currently supported
i think they're only really async due to the update methods being async
and we have a class of migrations that aren't just updating documents
We sometimes pull the latest copy of a particular item from a compendium and swap out the actor's older version with it. That's always going to be async
It's a shame you can't just give the server/client a JSON schema: https://json-schema.org/
that is an option that can be considered (and has been in the past) - but I believe it's too limiting for our needs
Am I right in understanding that the intention is to give more flexibility for use cases like mine with the new DocumentData tools? In that I've got the basics in my system.json (defining that each actor has skills, stunts, etc.) but the specifics are implemented in code by my setup and character editing tools. It sounds like in future I'd be able to define a specific DocumentData model which makes the schema for the objects inside stunts, skills etc. if I wanted to with a lot more depth than can currently be supported by the top-level system.json schema?
yeah, that would be one objective
Sounds like something I'll want to consider upgrading to, then, as it would centralise a lot of the validation I currently handle myself. I'm glad it won't be an enforced migration and that I can keep doing it the way I am now until I'm ready to make the leap though.
If this is about validation, let me also pull in @green pollen into this conversation. There is a lot of custom validation logic in https://github.com/Wasteland-Ventures-Group/WV-VTT-module
(not extending DocumentData, though, I think)
it has some limitations [...] structures like arrays
One limitation I'm currently facing (I'm working on macro support and would like to have a list of macro ID's that are active).
Thoughts, feedback, questions, concerns?
- I like the idea and already have a use in the near future for it.
- It would be really nice to deprecate the old format (with a converter function for the next 1 or 2 versions) to not break current modules; upgrading modules to new Foundry versions has been a great pita in the past.
- (more of a nitpick) the current set of fields do not really match their names, e.g. NONNEGATIVE_NUMBER_FIELD (is required but has a default set - not really an optional number). There is REQUIRED_POSITIVE_NUMBER without a default but REQUIRED_NUMBER has a default.
- I'd really love to see support for custom documents on the server side (maybe with client-side only validation as a first step) because manually hacking in new documents requires wonky workarounds (see https://gitlab.com/SWW13/foundryvtt-stairways/-/blob/7800039e35698e752f5fd0f8e5808f2876401745/src/dataQuirks.js) and as far as I looked into the server code there was no reason why it shouldn't be possible with some modifications to the validation - much like the "custom" documents for systems.
Thanks for the thoughts @unreal elk. A few responses:
It would be really nice to deprecate the old format (with a converter function for the next 1 or 2 versions) to not break current modules; upgrading modules to new Foundry versions has been a great pita in the past.
I would try to provide backwards compatibility for anything that was previously a fields.* const
(more of a nitpick) the current set of fields do not really match their names, e.g. NONNEGATIVE_NUMBER_FIELD (is required but has a default set - not really an optional number). There is REQUIRED_POSITIVE_NUMBER without a default but REQUIRED_NUMBER has a default.
The plan would be to retire these completely in favor of field instances where you have more control
I'd really love to see support for custom documents on the server side (maybe with client-side only validation as a first step) because manually hacking in new documents requires wonky workarounds
A separate issue, but there is a proposal for a "Basic Document" which gives a flexible template to use for somewhat arbitrary document types
Just for reference, I only validate my own system data, nothing outside of that. And what I do is 90% JSON schema. There is a minuscule amount of custom code to compare values against one another, but that's about it.
Also I'm not sure what we currently have to do to prevent updates is ideal. Right now we either have to use hooks or override _preCreate and _preUpdate and throw in them.
an advantage of using this proposed approach is any attempted updates or creations which contain invalid data would be blocked with informative errors
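A minimal sketch of what "blocked with informative errors" could look like: every changed key is checked, all problems are collected, and the update is rejected as a whole. The validators table and applyUpdate function are hypothetical and only illustrate the idea; they are not how Foundry processes updates.

```javascript
// Hypothetical per-field checks: each returns a message, or null when valid.
const validators = {
  label: v => (typeof v === "string" && v.length > 0) ? null : "must be a non-empty string",
  disabled: v => (typeof v === "boolean") ? null : "must be a boolean"
};

// Apply an update only if every changed field is valid; otherwise throw with details.
function applyUpdate(source, changes) {
  const failures = [];
  for (const [key, value] of Object.entries(changes)) {
    const check = validators[key];
    const problem = check ? check(value) : "is not a field in the schema";
    if (problem) failures.push(`${key}: ${problem}`);
  }
  if (failures.length) {
    throw new Error(`Update rejected:\n${failures.join("\n")}`);
  }
  return {...source, ...changes};
}

const effect = {label: "Bless", disabled: false};
console.log(applyUpdate(effect, {disabled: true})); // { label: 'Bless', disabled: true }

try {
  applyUpdate(effect, {label: "", disabled: "yes"});
} catch (err) {
  console.log(err.message);
  // Update rejected:
  // label: must be a non-empty string
  // disabled: must be a boolean
}
```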
I'm not saying anything against throwing errors. It just feels a bit unusual compared to the rest of foundry, that I've seen so far. But that might just be subjective.
Based on my experience with Storyteller, when you create your document types, you need to implement the database on its own, in my case, it's a trivial json in the settings. At the same time, the registration of this type itself took me quite a long time because of the lack of guides about it.
But surprisingly, it turned out to be quite realistic to use.
I'm not sure that describing the schema in the database itself is really necessary, as reading/writing to the file with game-sized amounts of data should not be too slow.
One of the things I have planned for V10 is that you would be able to use a DataField as the type of a game setting, which would help by handling cleaning/validation/etc... for such settings
It will be pretty useful
I don't know if this is something worth considering, but it could be useful to have some sort of versioned schema
So you could have a v1 schema, and then when you do a major change you create a new v2 schema. You can tell the system to load the data using the v1 schema and then write a migration that converts between them
Documents could keep track of which version they are on to make tracking migrations easier rather than having to guess based on system version checks or data inspection
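The versioned-schema idea can be sketched as a chain of upgrade steps keyed by the version they upgrade from. The schemaVersion property, the upgrades table and the upgrade function are all hypothetical names for illustration; nothing here is an existing Foundry feature.

```javascript
// Hypothetical sketch: each document records the schema version it was written with.
const CURRENT_VERSION = 3;

// One upgrade function per source version.
const upgrades = {
  // v1 -> v2: "title" was renamed to "name"
  1: ({title, ...rest}) => ({...rest, name: title}),
  // v2 -> v3: "tags" became a required array
  2: data => ({...data, tags: data.tags ?? []})
};

function upgrade(data) {
  // Documents without a recorded version are treated as v1.
  let current = {...data, schemaVersion: data.schemaVersion ?? 1};
  while (current.schemaVersion < CURRENT_VERSION) {
    const step = upgrades[current.schemaVersion];
    if (!step) throw new Error(`No upgrade from version ${current.schemaVersion}`);
    current = {...step(current), schemaVersion: current.schemaVersion + 1};
  }
  return current;
}

const upgraded = upgrade({title: "Torch"});
console.log(upgraded.name, upgraded.tags.length, upgraded.schemaVersion); // Torch 0 3
```

Because the version travels with the document, there is no need to guess from system version checks or data inspection which steps still have to run.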
This would probably be easiest for a setup where the schema was stored in static json
@inner lintel it's an idea that has value on its own, to be sure, but I am pretty hesitant about that given the complexity that might be required to pull it off in a way that plays nicely with every other component of the system.
Probably most useful to game systems where the data model changes more frequently than for core data types
@surreal harbor I don't know if this gives you any ideas at all, but this is my go-to library for data validation (in Python): https://github.com/kolypto/py-good
I find the DSL-like design very flexible.
looks powerful, but I have to admit I don't love the syntax from looking over examples