# Changeset format description

A changeset is a list of instructions that can be applied to a database to modify its state. It is also the input and output of the operational transformation (OT) conflict resolution algorithm.

## Building blocks

### Integer encoding

Integers of arbitrary size are encoded as variable-length values using LEB128. See [integer_codec.hpp](../src/realm/noinst/integer_codec.hpp) for the implementation.
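As an illustration of the variable-length scheme, the following is a minimal sketch of the standard signed LEB128 algorithm. The function names `encode_sleb128` and `decode_sleb128` are hypothetical, and the sketch assumes the textbook LEB128 definition; `integer_codec.hpp` remains authoritative and may differ in details.

```python
from typing import Tuple


def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer of any size as standard signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F  # take the low 7 bits
        value >>= 7          # arithmetic shift, preserves the sign
        # Stop once the rest is pure sign extension and bit 6 of this
        # byte already carries the correct sign.
        done = (value == 0 and not (byte & 0x40)) or (value == -1 and bool(byte & 0x40))
        out.append(byte if done else byte | 0x80)  # bit 7 = "more bytes follow"
        if done:
            return bytes(out)


def decode_sleb128(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one signed LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):        # last byte of this integer
            if byte & 0x40:          # sign bit set: sign-extend
                result -= 1 << shift
            return result, pos
```

Small magnitudes fit in a single byte (for example, `encode_sleb128(-1)` yields `b"\x7f"`), which keeps type prefixes and interned string identifiers compact.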

### Strings

Strings are represented as (size, data...) in positions that expect a string value. Implementations of changeset parsers may convert this into a `StringBufferRange`, which has (offset, size) acting as a pointer into the changeset's serialized representation.
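The (offset, size) idea can be sketched as follows. This is an illustrative sketch only: the Python `StringBufferRange` tuple, `read_string_range`, `resolve`, and the `read_size` callback are hypothetical names, and the one-byte size reader is a simplification standing in for the real integer codec.

```python
from typing import Callable, NamedTuple, Tuple


class StringBufferRange(NamedTuple):
    offset: int  # where the string bytes start in the serialized changeset
    size: int    # number of bytes in the string


def read_one_byte_size(buf: bytes, pos: int) -> Tuple[int, int]:
    """Simplified size reader (assumption): the size fits in one byte."""
    return buf[pos], pos + 1


def read_string_range(
    buf: bytes,
    pos: int,
    read_size: Callable[[bytes, int], Tuple[int, int]],
) -> Tuple[StringBufferRange, int]:
    """Read (size, data...) at pos without copying the string bytes."""
    size, pos = read_size(buf, pos)
    if pos + size > len(buf):
        raise ValueError("string extends past the end of the changeset")
    return StringBufferRange(offset=pos, size=size), pos + size


def resolve(buf: bytes, rng: StringBufferRange) -> bytes:
    """Materialize the bytes a range points to."""
    return buf[rng.offset:rng.offset + rng.size]
```

Because the range only points into the buffer, it stays valid only as long as the serialized changeset it refers to is kept alive and unmodified.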

### String interning

A special instruction, `InternString = 0x3f`, indicates that a particular integer value in the changeset represents a particular string. This allows for compact representation of string identifiers, as well as fast string comparison during conflict resolution (integer comparison versus linear byte comparison).

`InternString` instructions may appear anywhere in a changeset stream, but to allow optimizations based on it, they must indicate contiguous, increasing integer values, and the set of interned string values may not contain duplicates.
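The two constraints above (contiguous, increasing indices and no duplicate strings) can be checked while building the lookup table. The sketch below is hypothetical: `InternTable` and its methods are illustrative names, and it deliberately makes no assumption about the value of the first index.

```python
from typing import Dict, Optional


class InternTable:
    """Maps interned integer values to strings while enforcing the rules."""

    def __init__(self) -> None:
        self._by_index: Dict[int, str] = {}
        self._by_string: Dict[str, int] = {}
        self._last_index: Optional[int] = None

    def add(self, index: int, string: str) -> None:
        """Register one InternString instruction."""
        # Indices must be contiguous and increasing.
        if self._last_index is not None and index != self._last_index + 1:
            raise ValueError(f"non-contiguous intern index {index}")
        # The set of interned strings may not contain duplicates.
        if string in self._by_string:
            raise ValueError(f"string {string!r} is already interned")
        self._by_index[index] = string
        self._by_string[string] = index
        self._last_index = index

    def lookup(self, index: int) -> str:
        """Return the string that an interned integer stands for."""
        return self._by_index[index]
```

Since duplicates are rejected, two interned values refer to the same string exactly when the integers are equal, which is what makes the integer comparison during conflict resolution valid.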

## Value types

Every value is preceded by an integer indicating its type.

The following value types are supported:

| Value type | Type prefix | Encoding | Remarks |
| - | :----: | -------- | ------- |
| NULL | `0` | Ø | No payload data. |
| Int | `1` | `<Int64>` ||
| Bool | `2` | `<UInt8>` ||
| String | `3` | `<N: UInt64> <N bytes>` |  |
| Binary | `4` | `<N: UInt64> <N bytes>` |  |
| Timestamp | `5` | `<s: Int64> <ns: Int64>` | Since UNIX epoch|
| Float | `6` | `<4 bytes>` | IEEE-754 |
| Double | `7` | `<8 bytes>` | IEEE-754 |
| Decimal128 | `8` | `<cx: UInt64> <exp: Int32> <sign: Int32>` | `cx` is the coefficient |
| Link | `9` | `<primary key> <table: InternString>` ||
| ObjectId | `10` | `<12 bytes>` ||
| GlobalKey | `-1` | `<hi: UInt64> <lo: UInt64>` ||
| ObjectValue | `-2` | Ø | No payload data. See [Embedded objects](#embedded-objects).|
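To show how the type prefix drives decoding, here is a minimal sketch that works on a stream of already-decoded integers. The names `ValueType` and `decode_value` are hypothetical, and only value types whose payload consists entirely of integers are handled; the remaining types are left unimplemented because their byte-level layout is outside the scope of this sketch.

```python
from enum import IntEnum
from typing import Any, Iterator, Tuple


class ValueType(IntEnum):
    """Type prefixes, copied from the table above."""
    NULL = 0
    INT = 1
    BOOL = 2
    STRING = 3
    BINARY = 4
    TIMESTAMP = 5
    FLOAT = 6
    DOUBLE = 7
    DECIMAL128 = 8
    LINK = 9
    OBJECT_ID = 10
    GLOBAL_KEY = -1
    OBJECT_VALUE = -2


def decode_value(ints: Iterator[int]) -> Tuple[ValueType, Any]:
    """Decode one value from an iterator of already-decoded integers."""
    vtype = ValueType(next(ints))  # raises ValueError on an unknown prefix
    if vtype in (ValueType.NULL, ValueType.OBJECT_VALUE):
        return vtype, None                     # no payload data
    if vtype is ValueType.INT:
        return vtype, next(ints)
    if vtype is ValueType.TIMESTAMP:
        seconds = next(ints)
        nanoseconds = next(ints)
        return vtype, (seconds, nanoseconds)   # since the UNIX epoch
    if vtype is ValueType.GLOBAL_KEY:
        hi = next(ints)
        lo = next(ints)
        return vtype, (hi, lo)
    raise NotImplementedError(f"payload decoding for {vtype.name} not sketched")
```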


## Table identifiers

Tables are identified by an `InternString` representing the _class_ name corresponding to that table.

**NOTE:** In Core, user-visible tables have the prefix `class_`, in order to maintain the option of having internal tables not visible to the user. However, the Sync protocol should only operate on user-visible data structures, so it uses the user-visible class name instead of the table name.
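The mapping between table names and class names described in the note can be sketched as below. The helper names are hypothetical and are not part of Core or Sync; the sketch only assumes the `class_` prefix convention stated above.

```python
from typing import Optional

TABLE_PREFIX = "class_"  # prefix of user-visible tables, per the note above


def class_name_for_table(table_name: str) -> Optional[str]:
    """Return the class name used in changesets, or None for internal tables."""
    if table_name.startswith(TABLE_PREFIX):
        return table_name[len(TABLE_PREFIX):]
    return None  # not user-visible, so it has no class name in a changeset


def table_name_for_class(class_name: str) -> str:
    """Return the Core table name for a user-visible class name."""
    return TABLE_PREFIX + class_name
```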

## Object identifiers

Objects are identified by their unique key in the table scope. If the table has a primary key column, the object's ID is the primary key. Furthermore, string primary keys are interned.

Supported primary key value types:

- `NULL`
- String (`InternString`)
- Signed integer (`Int64`)
- ObjectId

If the table does not have a primary key column, we use the `GlobalKey` (exposed by Core) as the object's primary key. In effect, all objects have "primary keys", but `GlobalKey`s happen to be locally autogenerated.

Note that `NULL` is a valid primary key value, but it is untyped; we do not distinguish between a NULL integer and a NULL string, for example.

~~~
primary key ::= NULL
            | <InternString>
            | <Int64>
            | <ObjectId>
            | <GlobalKey>
            ;
~~~

## Paths

A path indicates a position in the database, for instance a field of an object or an array element.

A path consists of zero or more path elements. A path element is either an unsigned integer (array index) or an interned string (object field or dictionary key).

Paths correspond to "dotted paths" in MongoDB.

~~~
element ::= <UInt32>
        | <InternString>
        ;

path ::= Ø
     | <element> <path>
     ;
~~~

Note that all paths are scoped under `table.object.field`. Semantically, this scope is an implicit prefix of the path in question, but each instruction represents it separately for convenience.

### Path encoding

Paths are encoded as `<N: UInt64> <N elements>`.

Each path element is either:

- `<index: UInt32>`
- `<-1> <field: InternString>`

Since array indices can never be negative, `-1` is used to indicate that the next integer is an `InternString`.
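The `-1` marker scheme can be sketched as follows, working on lists of already-decoded integers rather than raw bytes. The names `Field`, `encode_path`, and `decode_path` are hypothetical and chosen for illustration only.

```python
from typing import List, NamedTuple, Tuple, Union

FIELD_MARKER = -1  # signals that the next integer is an InternString


class Field(NamedTuple):
    """A path element naming an object field or dictionary key."""
    intern_id: int


PathElement = Union[int, Field]  # a plain int is an array index


def encode_path(path: List[PathElement]) -> List[int]:
    """Encode a path as <N> followed by N elements."""
    out = [len(path)]
    for element in path:
        if isinstance(element, Field):
            out += [FIELD_MARKER, element.intern_id]
        else:
            if element < 0:
                raise ValueError("array indices can never be negative")
            out.append(element)
    return out


def decode_path(ints: List[int], pos: int = 0) -> Tuple[List[PathElement], int]:
    """Decode a path starting at pos; return (path, next_position)."""
    count = ints[pos]
    pos += 1
    path: List[PathElement] = []
    for _ in range(count):
        head = ints[pos]
        pos += 1
        if head == FIELD_MARKER:
            path.append(Field(ints[pos]))  # the interned string follows
            pos += 1
        else:
            path.append(head)              # a plain array index
    return path, pos
```

For example, the path `[Field(7), 2, Field(9)]` encodes to the integers `3, -1, 7, 2, -1, 9`: the element count, then a field, an index, and another field.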

## Embedded objects

The `ObjectValue` value is used to indicate that an embedded object should be created at a particular place. A `Set` or `ArrayInsert` instruction with an `ObjectValue` indicates that the field or array value is an embedded object, and that a new object should be created at that position, if one does not already exist.

Erasing an embedded object is either `Set(object.field, NULL)` in the case of objects embedded in fields, or `ArrayErase(index)` in the case of objects embedded in a list. Note that once Core supports NULLs in lists of links, the alternative form `Set(array.index, NULL)` can also be used to erase an embedded object.

## Instructions

The type byte of each instruction is given in parentheses after its name.

Common structs used by the instructions below:

~~~
enum KeyType = Int | String | ObjectId | GlobalKey;
~~~

### AddTable (`0x0`)

~~~
struct AddTable {
    table: InternString,
    spec: TopLevelTable | EmbeddedTable
}

struct TopLevelTable {
    pk_field: InternString,
    pk_type: KeyType,
    pk_nullable: bool,
    asymmetric: bool,
}

struct EmbeddedTable {}
~~~

### EraseTable (`0x1`)
~~~
struct EraseTable {
    table: InternString
}
~~~

### CreateObject (`0x2`)
~~~
struct CreateObject {
    table: InternString,
    object: PrimaryKey,
}
~~~

### EraseObject (`0x3`)
~~~
struct EraseObject {
    table: InternString,
    object: PrimaryKey,
}
~~~

### Set (`0x4`)
~~~
struct Set {
    table: InternString,
    object: PrimaryKey,
    path: Path,
    value: Payload,

    union {
        is_default: bool,
        prior_size: UInt32,
    }
}
~~~

### AddInteger (`0x5`)
~~~
struct AddInteger {
    table: InternString,
    object: PrimaryKey,
    path: Path,
    value: Int64,
}
~~~

### AddColumn (`0x6`)
~~~
struct AddColumn {
    table: InternString,
    field: InternString,
    type: PayloadType,
    nullable: bool,
    list: bool,
    link_target_table: InternString
}
~~~

### EraseColumn (`0x7`)
~~~
struct EraseColumn {
    table: InternString,
    field: InternString
}
~~~

### ArrayInsert (`0x8`)

**NOTE:** Path must end with an array index.

~~~
struct ArrayInsert {
    table: InternString,
    object: PrimaryKey,
    field: InternString,
    path: Path,
    value: Payload,
    prior_size: UInt32
}
~~~

### ArrayMove (`0x9`)

**NOTE:** Path must end with an array index (the move-from index).

~~~
struct ArrayMove {
    table: InternString,
    object: PrimaryKey,
    field: InternString,
    path: Path,
    ndx_2: UInt32,
    prior_size: UInt32,
}
~~~

### ArrayErase (`0xa`)

**NOTE:** Path must end with an array index.

~~~
struct ArrayErase {
    table: InternString,
    object: PrimaryKey,
    field: InternString,
    path: Path,
    prior_size: UInt32,
}
~~~

### Clear (`0xb`)

**NOTE:** Unlike other container instructions, the path of a `Clear` instruction does not end with an index or key. It clears the container itself, and not a value at a particular index.

~~~
struct Clear {
    table: InternString,
    object: PrimaryKey,
    field: InternString,
    path: Path,
    prior_size: UInt32, // ignored
}
~~~
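The instruction type bytes listed in this document, together with the "path must end with an array index" notes, can be collected in a small lookup. This is a sketch under stated assumptions: `InstructionType` and `path_ends_correctly` are hypothetical names, and path elements are modelled here as `int` for array indices and `str` for field names or dictionary keys.

```python
from enum import IntEnum
from typing import List, Union


class InstructionType(IntEnum):
    """Type bytes as given in the section headings of this document."""
    ADD_TABLE = 0x00
    ERASE_TABLE = 0x01
    CREATE_OBJECT = 0x02
    ERASE_OBJECT = 0x03
    SET = 0x04
    ADD_INTEGER = 0x05
    ADD_COLUMN = 0x06
    ERASE_COLUMN = 0x07
    ARRAY_INSERT = 0x08
    ARRAY_MOVE = 0x09
    ARRAY_ERASE = 0x0A
    CLEAR = 0x0B
    INTERN_STRING = 0x3F


# Instructions whose path must end with an array index, per the notes above.
NEEDS_TRAILING_INDEX = frozenset({
    InstructionType.ARRAY_INSERT,
    InstructionType.ARRAY_MOVE,
    InstructionType.ARRAY_ERASE,
})


def path_ends_correctly(kind: InstructionType, path: List[Union[int, str]]) -> bool:
    """Check the trailing-index rule; other instructions are not constrained here."""
    if kind in NEEDS_TRAILING_INDEX:
        return bool(path) and isinstance(path[-1], int)
    return True
```

A parser could use `InstructionType(byte)` to reject unknown type bytes early, since constructing the enum from an unlisted value raises `ValueError`.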
