# Grammar rules defining the stream format of Java object serialization
<@&987246399047479336> please have a look, thanks.
Here is an AI assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use </chatgpt:1108714622413963314>.
The Java object serialization stream format is defined by the grammar in the Java Object Serialization Specification, chapter 6 ("Object Serialization Stream Protocol").
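The stream begins with a header (the magic number 0xACED and the version number 5), followed by a sequence of elements such as objects, class descriptors, strings, arrays, and blocks of primitive data. Each element starts with a one-byte type code such as TC_OBJECT or TC_BLOCKDATA, followed where needed by a length field, so a reader can tell what kind of element comes next and how many bytes it occupies.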
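As a concrete instance of a type code plus a length, here is a minimal sketch (the class and method names are hypothetical) that writes a single int through `ObjectOutputStream` and prints the resulting bytes. Primitive data written outside of any object ends up in a `blockdatashort` element: TC_BLOCKDATA (0x77), a one-byte size, then the payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class BlockDataDemo {
    // Writes one int as primitive data and returns the raw stream bytes.
    static byte[] serializeInt(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // The constructor has already written the header: AC ED 00 05.
            out.writeInt(value); // buffered as block data, not as an object
        } // close() flushes the buffer as TC_BLOCKDATA, size byte, payload
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // magic, version, TC_BLOCKDATA (0x77), size 4, then the int 42 (0x2A)
        System.out.println(hex(serializeInt(42))); // AC ED 00 05 77 04 00 00 00 2A
    }
}
```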
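The grammar rules define how objects are serialized into a byte stream, including how shared and cyclic references are encoded (TC_REFERENCE followed by a handle), how class versions are tracked (the serialVersionUID stored in each class descriptor), and how externalizable objects are written (classes flagged with SC_EXTERNALIZABLE).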
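The handle mechanism behind shared and cyclic references can be observed directly. This is a sketch using a hypothetical `Point` class: the same instance is written twice, and the second write is just TC_REFERENCE (0x71) plus a 4-byte handle. Handles start at `baseWireHandle` (0x7E0000); the class descriptor is assigned one first, so the object itself gets 0x7E0001.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class HandleDemo {
    // Hypothetical sample class used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 7;
    }

    // Writes the same instance twice and returns the raw stream bytes.
    static byte[] writeTwice(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // newObject: TC_OBJECT, classDesc, field values
            out.writeObject(obj); // prevObject: TC_REFERENCE + handle
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeTwice(new Point());
        // Handle 0x7E0000 went to Point's class descriptor, 0x7E0001 to the object,
        // so the stream ends with 71 00 7E 00 01.
        byte[] tail = Arrays.copyOfRange(data, data.length - 5, data.length);
        for (byte b : tail) System.out.printf("%02X ", b & 0xFF);
        System.out.println();
    }
}
```

A self-referencing object graph uses the same mechanism: when the writer meets an object it has already assigned a handle to, it emits a reference instead of serializing it again, which is what keeps cycles finite.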
What do the grammar rules for the Java object serialization stream format mean, and how should I interpret them?
stream:
  magic version contents
contents:
  content
  contents content
content:
  object
  blockdata
object:
  newObject
  newClass
  newArray
  newString
  newEnum
  newClassDesc
  prevObject
  nullReference
  exception
  TC_RESET
newClass:
  TC_CLASS classDesc newHandle
classDesc:
  newClassDesc
  nullReference
  (ClassDesc)prevObject  // an object required to be of type ClassDesc
superClassDesc:
  classDesc
newClassDesc:
  TC_CLASSDESC className serialVersionUID newHandle classDescInfo
  TC_PROXYCLASSDESC newHandle proxyClassDescInfo
classDescInfo:
  classDescFlags fields classAnnotation superClassDesc
className:
  (utf)
serialVersionUID:
  (long)
classDescFlags:
  (byte)  // Defined in Terminal Symbols and Constants
proxyClassDescInfo:
  (int)<count> proxyInterfaceName[count] classAnnotation
      superClassDesc
proxyInterfaceName:
  (utf)
fields:
  (short)<count> fieldDesc[count]
fieldDesc:
  primitiveDesc
  objectDesc
primitiveDesc:
  prim_typecode fieldName
objectDesc:
  obj_typecode fieldName className1
fieldName:
  (utf)
className1:
  (String)object  // String containing the field's type,
                  // in field descriptor format
classAnnotation:
  endBlockData
  contents endBlockData  // contents written by annotateClass
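The `classDesc` pieces listed so far (className, serialVersionUID, and each fieldDesc with its typecode and, for object fields, the className1 string) can be inspected from Java without reading any bytes, via `java.io.ObjectStreamClass` and `java.io.ObjectStreamField`. A minimal sketch with a hypothetical `Person` class:

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

public class ClassDescDemo {
    // Hypothetical sample class with one primitive and one object field.
    static class Person implements Serializable {
        private static final long serialVersionUID = 42L;
        int age;
        String name;
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(Person.class);
        System.out.println("className        = " + desc.getName());
        System.out.println("serialVersionUID = " + desc.getSerialVersionUID());
        // Fields in serialization order: primitives first, then objects, each sorted by name.
        for (ObjectStreamField f : desc.getFields()) {
            // getTypeCode() is the prim_typecode / obj_typecode character;
            // getTypeString() is the className1 string (null for primitive fields).
            System.out.println(f.getTypeCode() + " " + f.getName() + " " + f.getTypeString());
        }
    }
}
```

Here `age` comes out as typecode `I` with no type string, and `name` as typecode `L` with the field-descriptor string `Ljava/lang/String;`, which is exactly what the `objectDesc` production writes as className1.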
i doubt many people here know anything about the internal binary format used by java's default serialization
prim_typecode:
  'B'  // byte
  'C'  // char
  'D'  // double
  'F'  // float
  'I'  // integer
  'J'  // long
  'S'  // short
  'Z'  // boolean
obj_typecode:
  '['  // array
  'L'  // object
newArray:
  TC_ARRAY classDesc newHandle (int)<size> values[size]
newObject:
  TC_OBJECT classDesc newHandle classdata[]  // data for each class
classdata:
  nowrclass                 // SC_SERIALIZABLE & classDescFlag &&
                            // !(SC_WRITE_METHOD & classDescFlags)
  wrclass objectAnnotation  // SC_SERIALIZABLE & classDescFlag &&
                            // SC_WRITE_METHOD & classDescFlags
  externalContents          // SC_EXTERNALIZABLE & classDescFlag &&
                            // !(SC_BLOCKDATA & classDescFlags)
  objectAnnotation          // SC_EXTERNALIZABLE & classDescFlag &&
                            // SC_BLOCKDATA & classDescFlags
nowrclass:
  values  // fields in order of class descriptor
wrclass:
  nowrclass
objectAnnotation:
  endBlockData
  contents endBlockData  // contents written by writeObject
                         // or writeExternal PROTOCOL_VERSION_2.
blockdata:
  blockdatashort
  blockdatalong
blockdatashort:
  TC_BLOCKDATA (unsigned byte)<size> (byte)[size]
blockdatalong:
  TC_BLOCKDATALONG (int)<size> (byte)[size]
endBlockData:
  TC_ENDBLOCKDATA
externalContent:                   // Only parseable by readExternal
  (bytes)                          // primitive data
  object
externalContents:                  // externalContent written by
  externalContent                  // writeExternal in PROTOCOL_VERSION_1.
  externalContents externalContent
newString:
  TC_STRING newHandle (utf)
  TC_LONGSTRING newHandle (long-utf)
newEnum:
  TC_ENUM classDesc newHandle enumConstantName
enumConstantName:
  (String)object
prevObject:
  TC_REFERENCE (int)handle
nullReference:
  TC_NULL
exception:
  TC_EXCEPTION reset (Throwable)object reset
magic:
  STREAM_MAGIC
version:
  STREAM_VERSION
values:     // The size and types are described by the
            // classDesc for the current object
newHandle:  // The next number in sequence is assigned
            // to the object being serialized or deserialized
reset:      // The set of known objects is discarded
            // so the objects of the exception do not
            // overlap with the previously sent objects
            // or with objects that may be sent after
            // the exception
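One way to see how these rules are meant to be read is to follow them by hand for a single object. The sketch below is a hypothetical mini-parser, not a general decoder: it assumes a class with only primitive fields, no `writeObject` method, and `Object` as its superclass, so `superClassDesc` is TC_NULL and `classdata` is just `nowrclass` values. Note that `newHandle` never appears on the wire; the reader assigns the next handle number implicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class MiniParser {
    // Hypothetical sample: primitive fields only, no writeObject, superclass Object.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        int count = 3;
        boolean flag = true;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static void expect(int actual, int expected, String what) throws IOException {
        if (actual != expected) {
            throw new IOException(what + ": expected 0x" + Integer.toHexString(expected)
                    + ", got 0x" + Integer.toHexString(actual));
        }
    }

    // stream -> magic version newObject; newObject -> TC_OBJECT classDesc classdata
    static List<String> parse(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> out = new ArrayList<>();
        expect(in.readUnsignedShort(), 0xACED, "magic");
        expect(in.readUnsignedShort(), 5, "version");
        expect(in.readUnsignedByte(), ObjectStreamConstants.TC_OBJECT, "newObject");
        expect(in.readUnsignedByte(), ObjectStreamConstants.TC_CLASSDESC, "newClassDesc");
        out.add("className=" + in.readUTF());                             // (utf)
        out.add("serialVersionUID=" + in.readLong());                     // (long)
        out.add("flags=0x" + Integer.toHexString(in.readUnsignedByte())); // classDescFlags
        int count = in.readShort();                                       // (short)<count>
        char[] types = new char[count];
        for (int i = 0; i < count; i++) {
            types[i] = (char) in.readUnsignedByte();                      // prim_typecode
            if (types[i] == '[' || types[i] == 'L') {
                throw new IOException("objectDesc (className1) not handled in this sketch");
            }
            out.add("field " + types[i] + " " + in.readUTF());            // fieldName
        }
        expect(in.readUnsignedByte(), ObjectStreamConstants.TC_ENDBLOCKDATA, "classAnnotation");
        expect(in.readUnsignedByte(), ObjectStreamConstants.TC_NULL, "superClassDesc");
        for (char t : types) {                                            // nowrclass values
            out.add("value " + readPrim(in, t));
        }
        return out;
    }

    // Reads one primitive value according to its prim_typecode.
    static Object readPrim(DataInputStream in, char t) throws IOException {
        switch (t) {
            case 'B': return in.readByte();
            case 'C': return in.readChar();
            case 'D': return in.readDouble();
            case 'F': return in.readFloat();
            case 'I': return in.readInt();
            case 'J': return in.readLong();
            case 'S': return in.readShort();
            case 'Z': return in.readBoolean();
            default: throw new IOException("unknown typecode " + t);
        }
    }

    public static void main(String[] args) throws IOException {
        parse(serialize(new Sample())).forEach(System.out::println);
    }
}
```

The flags byte comes out as 0x2 (SC_SERIALIZABLE only), and the values appear in class-descriptor order, which is what the `nowrclass` comment ("fields in order of class descriptor") refers to. Supporting object fields, `writeObject` data, or serializable superclasses would mean recursing into `object`, `objectAnnotation`, and `superClassDesc`, which this sketch deliberately skips.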
I'm not quite sure what these represent.
I want to deepen my understanding of the internal structure of the stream.
okay. well, i can't help u with that since i have no idea about these things, and most others probably don't either
Got it. Thank you. It seems okay even if I don't understand this.
I'll look into it when I have time.
This code converts an object into a serialized byte array. Does this mean that the stream internally looks like this?
The grammar rules of the Java serialization stream are like a fixed template or a blueprint.
"This grammar is like a map for analyzing byte streams. For example, when looking at a byte sequence such as AC ED 00 05 73 72 ...
AC ED 00 05 → header (STREAM_MAGIC 0xACED, then STREAM_VERSION 5)
73 → TC_OBJECT (a new object)
72 → TC_CLASSDESC (class description)
70 → TC_NULL (null reference; appears later in the stream, for example as the superClassDesc of a class whose superclass is not serializable)
This grammar allows us to interpret what each byte means."
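The byte listing above can be checked with a short sketch. It serializes a hypothetical class with no fields, which keeps the dump small enough to read end to end: the header and TC_OBJECT/TC_CLASSDESC at the start, and TC_ENDBLOCKDATA followed by TC_NULL at the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class HexDumpDemo {
    // Hypothetical class with no fields, which keeps the dump short.
    static class Empty implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Starts with AC ED 00 05 73 72 (header, TC_OBJECT, TC_CLASSDESC), then the
        // length-prefixed class name, the 8-byte serialVersionUID, flags 02, field
        // count 00 00, and ends with 78 70 (TC_ENDBLOCKDATA closing classAnnotation,
        // TC_NULL as superClassDesc). With no fields, classdata is empty.
        System.out.println(hex(serialize(new Empty())));
    }
}
```

Reading any real dump works the same way: start at the `stream` rule and follow whichever production the next type code selects.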
What do you not understand?
There's essentially zero value in understanding the internals of this stream format:
- You'll only use JDK classes to produce and consume these streams
- The serialisation feature is actively being replaced (by a mechanism that doesn't constrain the stream format directly)