reMarkable .lines File Format
Happy holidays everyone!
The days before Christmas I took two evenings for a little project.
I tried to understand the binary file format the reMarkable e-ink tablet creates when drawing on it, the .lines (nowadays .rm) file.
.rm files are part of the stored notebooks (and annotated PDFs) on the tablet.
They are stored together with further textual meta information (JSON) under .local/share/remarkable/xochitl/.
Since this is my first time decoding a binary format, my approach might have been a bit… unusual. Anyway, this blog post will explain what I found and a follow-up post will explain the how (cliffhanger!).
Notebooks
New and annotated files on the reMarkable tablet are called notebooks. A notebook consists of pages. Each page has one background layer (PNG or PDF page) and up to five layers for your drawings. Each layer then stores a set of lines, each of which in turn consists of points (dots? ;) ).
(Note: I am picking names as I go that appear reasonable to me.)
A single line has exactly one color, one brush type & one base size. Each point inside a line has three kinds of attributes: two coordinates, a pressure and two angles for the pen’s steepness to the surface.
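To make the hierarchy a bit more tangible, here is a minimal sketch of it as Python dataclasses. All class and field names (`Notebook`, `Page`, `Layer`, `Line`, `Point`, `rot_x`, ...) are my own picks for illustration and not names used by the tablet software, and the example values are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Point:
    x: float
    y: float
    pressure: float
    rot_x: float  # pen rotation to the X axis
    rot_y: float  # pen rotation to the Y axis


@dataclass
class Line:
    brush_type: int   # one brush type per line
    color: int        # one color per line
    base_size: float  # one base size per line
    points: List[Point] = field(default_factory=list)


@dataclass
class Layer:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    # drawing layers only (up to five); the background is not modelled here
    layers: List[Layer] = field(default_factory=list)


@dataclass
class Notebook:
    pages: List[Page] = field(default_factory=list)


# Usage: one page, one layer, one line with a single point.
# The numbers are placeholders, not real magic numbers from a file.
line = Line(brush_type=0, color=0, base_size=2.0,
            points=[Point(x=10.0, y=20.0, pressure=0.5, rot_x=0.0, rot_y=0.0)])
notebook = Notebook(pages=[Page(layers=[Layer(lines=[line])])])
```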
.rm Format
reMarkable .lines (now .rm) files are contiguous, uncompressed binaries with sections similar to the RAM representation of a plain struct (POD).
The byte order is little endian and standard data types are used (ASCII chars, four-Byte ints and IEEE floats).
The first 43 Bytes consist of the char-text "reMarkable lines with selections and layers" and zero-padding.
Update v3: now "reMarkable .lines file, version=3" and 10 spaces.
It is followed by an int32_t (4 Bytes) with the number of pages.
(Removed in v3, now one page per file.)
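As a rough sketch of this first step, the header check could look like the following in Python. The function name `read_header` and the tags `"original"` / `"v3"` are made up for this example, and I only handle the two header texts mentioned above; other variants may exist that I have not seen.

```python
import io
import struct

HEADER_LEN = 43  # fixed header size in bytes, as described above
HEADER_V1 = b"reMarkable lines with selections and layers"
HEADER_V3 = b"reMarkable .lines file, version=3" + b" " * 10


def read_header(stream):
    """Return (tag, page_count) for a binary stream positioned at offset 0."""
    raw = stream.read(HEADER_LEN)
    if len(raw) != HEADER_LEN:
        raise ValueError("too short for a .lines/.rm header")
    if raw == HEADER_V3:
        # v3 has no page count field: one page per file
        return "v3", 1
    if raw.rstrip(b"\x00") == HEADER_V1:
        # original layout: little-endian int32 with the number of pages
        (page_count,) = struct.unpack("<i", stream.read(4))
        return "original", page_count
    raise ValueError("unknown header: %r" % raw)


# Usage with an in-memory file instead of a real notebook:
demo = io.BytesIO(HEADER_V1 + struct.pack("<i", 2))
print(read_header(demo))  # ('original', 2)
```

After this call the stream is positioned at the first page section, so the loops described next can continue reading from it.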
After that, loop over the retrieved number of pages - the file format contains no end-of-page, end-of-file or similar symbols.
Directly at the beginning of the page section that now follows, the next 4 Bytes are again an int32_t for the number of layers inside the page.
Again, loop over the retrieved number of layers for the next reads.
A layer in turn starts with 4 Bytes holding an int32_t for the number of lines inside it.
You guessed it, loop over the retrieved number of lines for the next reads.
Now a line has the following attributes: a 4-Byte int32_t for the brush type, a 4-Byte int32_t for the color of the line, and another 4 Bytes of “padding” which is zero in all my files.
Then follows the base brush size as a 4-Byte float32_t aaaannd the number of points inside the current line as int32_t (4 Bytes).
Deducing from the file header (“... with selections ...”), I would bet the always-zero padding attribute corresponds to selected (yes/no) from the selection tool.
The selection tool (🜊) in the tablet’s GUI allows selecting a list of complete lines from an intersection with a selected free form for move/scale/duplicate operations.
Anyway, this value is serialized as zero in all my files (maybe a bug), which corresponds to the observation that selections are not restored between closing and opening a file.
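The 20 Bytes in front of each line's points map nicely onto a single `struct` format string. Here is a small sketch for the original (pre-v3) layout described above; the function name `read_line_header` and the dictionary keys are just my naming, and the packed demo values are arbitrary.

```python
import struct

# brush type, color, padding (int32 each), base size (float32), point count (int32)
# "<" means little endian without any alignment padding
LINE_HEADER = struct.Struct("<iiifi")


def read_line_header(buf, offset=0):
    """Decode one line header from buf; return (header dict, offset of first point)."""
    brush, color, padding, base_size, n_points = LINE_HEADER.unpack_from(buf, offset)
    header = {
        "brush_type": brush,
        "color": color,
        "padding": padding,  # zero in all files I have seen
        "base_size": base_size,
        "n_points": n_points,
    }
    return header, offset + LINE_HEADER.size


# Usage with hand-packed bytes (values are placeholders):
raw = struct.pack("<iiifi", 2, 0, 0, 2.0, 3)
header, next_offset = read_line_header(raw)
print(header["n_points"], next_offset)  # 3 20
```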
Finally, points:
You guessed it again: the next block of points needs to be read in a loop, as many times as given by the number-of-points attribute of the current line.
A point consists of five times 4 Bytes of float32_t in the order of: X coordinate, Y coordinate, pen pressure, pen rotation to X axis, pen rotation to Y axis.
Update v3: now x, y, speed, direction, width, and pressure.
… and that is it!
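Reading the points of one line is then just a loop over fixed-size records. The sketch below covers both point layouts from above, five floats for the original format and six for v3. The names `PointV1`, `PointV3` and `read_points` are invented for this example, and the v3 field order is simply the order listed in the update note.

```python
import struct
from collections import namedtuple

PointV1 = namedtuple("PointV1", "x y pressure rot_x rot_y")
PointV3 = namedtuple("PointV3", "x y speed direction width pressure")


def read_points(buf, offset, count, version3=False):
    """Decode `count` consecutive points; return (points, offset after the last one)."""
    fmt, cls = ("<6f", PointV3) if version3 else ("<5f", PointV1)
    size = struct.calcsize(fmt)  # 24 Bytes for v3, 20 Bytes otherwise
    points = [cls(*struct.unpack_from(fmt, buf, offset + i * size))
              for i in range(count)]
    return points, offset + count * size


# Usage with two hand-packed points in the original layout:
raw = (struct.pack("<5f", 10.0, 20.0, 0.5, 0.25, -0.25)
       + struct.pack("<5f", 11.0, 21.0, 0.75, 0.0, 0.0))
points, end = read_points(raw, 0, 2)
print(points[1].pressure, end)  # 0.75 40
```

The returned offset is where the next line header (or the next layer's line count) starts, since the format has no separators in between.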
Magic Numbers & Ranges
The integers in the file format for types can take certain magic numbers mapping them to implemented functionality. Floating point attributes have ranges (and units).
Line Attributes
The display seems to support a single shade of gray and black. Since lines can be drawn on top of each other, this makes three colors (including white lines):
The brush types selectable in the GUI are represented as:
While the base brush size can vary between the following values:
Point Attributes
Remarkably, coordinates are stored as float32_t instead of a possible representation in int (e.g. even int16_t!).
This probably just saves some type conversion during the rendering or might originate from the representation in the touch driver.
Nevertheless, from the values I saw it does not look too likely that the touch resolution is higher than the display resolution.
Coordinates are within the range of the resolution of the display: 0.0 to 1404.0 for X and 0.0 to 1872.0 for Y.
The pressure of drawing a certain point can vary between zero and one:
The pen rotation is the most interesting attribute.
As in the sketch below, the rotation is given in radians relative to the normal of the page, for both the angle to the X and to the Y axis.
The range can in principle go from minus Pi half to Pi half, although the maximum is unlikely to be reached, as the pen tip will not touch the surface if you try to hold it parallel to the surface ;)
Input validation fans take care: if a user rams the pen from the backside through the tablet and manages to draw a point with it, the full mathematical range from minus Pi to Pi for this input can be reached!
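For the input validation fans, here is a small sketch that checks a point of the original (pre-v3) layout against the ranges I observed. The function name `point_problems` and the constants are made up for this example, and the limits are observations from my files, not an official specification.

```python
import math

X_MAX, Y_MAX = 1404.0, 1872.0  # display resolution, see above


def point_problems(x, y, pressure, rot_x, rot_y):
    """Return a list of human-readable range violations (empty if all is fine)."""
    problems = []
    if not 0.0 <= x <= X_MAX:
        problems.append("x outside [0, 1404]")
    if not 0.0 <= y <= Y_MAX:
        problems.append("y outside [0, 1872]")
    if not 0.0 <= pressure <= 1.0:
        problems.append("pressure outside [0, 1]")
    for name, angle in (("rot_x", rot_x), ("rot_y", rot_y)):
        if not -math.pi <= angle <= math.pi:
            problems.append(name + " outside [-pi, pi]")
        elif abs(angle) > math.pi / 2:
            # mathematically valid, but only with the pen coming from behind ;)
            problems.append(name + " beyond +-pi/2")
    return problems


print(point_problems(100.0, 200.0, 0.5, 0.1, -0.1))  # []
print(point_problems(2000.0, 200.0, 0.5, 0.1, 2.0))
```

NaN values fail every comparison and are therefore reported as out of range as well.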
Here is again the full cheat sheet for your enum implementations :)
From Pen to Paper
Now that we understand how each and every stroke on our tablet is stored we can ask the question: What defines a rendered line?
Very similar to Gimp’s interface of drawing lines with pens, a rendered representation will now place “brushes” on each dot. Brushes are partly-transparent PNGs that are placed on top of each other. I guess the reMarkable is writing points with a constant sampling rate, which will make slowly drawn lines look continuous while fast strokes reveal the beautiful structure of the selected brush type (try it with a nice big size!).
The interesting parts: of course, the brush PNG needs to be oriented during rendering. Contrary to what one would guess for e.g. a stylograph (not implemented in the reMarkable), these are not rotated according to the pen’s rotation during the draw but to the tangents of two consecutive points during draw. This gives a paint brush like orientation of the rendered structure of lines - for all available brushes…
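To illustrate the idea of orienting brushes along the stroke, here is a minimal sketch that derives one angle per point from the direction to the next point. This is my guess at a plausible implementation, not the vendor's actual renderer; the function name `brush_angles` and the choice to reuse the previous direction for the last point are assumptions.

```python
import math


def brush_angles(points):
    """points: list of (x, y) tuples. Return one brush angle in radians per point."""
    if len(points) < 2:
        return [0.0] * len(points)  # no direction available for a single dot
    # direction of each segment between two consecutive points
    angles = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    angles.append(angles[-1])  # the last point reuses the previous direction
    return angles


# Usage: a stroke going right, then along increasing Y
print(brush_angles([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

Note that the pen rotation attributes of the points do not enter this calculation at all, which matches the paint-brush-like look described above.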
The actual width of a line is assembled from four attributes. Besides the brush type (PNG template), the pressure and the rotation to the surface determine the width of a line.
Fun fact: The latter could actually be represented with only one attribute for the currently implemented brushes… so maybe we will see a calligraphy pen some time via an update? And by the way, what does brush type 6 stand for? :)
Fun fact 2: if you export a stroke via PDF (currently 1.4) the rotation of the PNG brushes is broken and always appears to be zero in all readers I tried (reported; seen in Gimp, Inkscape, Evince, Okular, …).
What to Do Next?
We do now understand our own data, our very own files we wrote with our reMarkable tablet. So what’s next, and why understand it anyway?
For easier handling of the format, we should define its formal grammar in a more sophisticated way than my writing above.
As good open source contributors, we can also add recognition of the file format to the file command:
Now:
Then:
Update 2018-07-25: Just put this in your $HOME/.magic and it will add file recognition!
Update 2018-07-26: Aaaaand… it’s mainline now, coming with file v5.35 :-)
Update 2019-02-09: Updated to version 3
We of course need an API in your favorite language for reading & writing .rm files.
On top of such an API, we can write converters to render .rm files, e.g. changing brush types in post processing to add a stylograph brush type for writing as with fancy dip pens :)
Update 2017-12-29: Hey let’s just start right away.
Here is my little C++ file API and some first render examples to PNG: lines-are-beautiful
Update 2018-07-29: Also supports writing .lines (.rm) files now.
Also, with such converters we could improve exports - e.g. an exported PDF of the reMarkable tablet is easily 20 MB in size with the current vendor implementation and contains the bug described above.
And we could add our own cloud backends, e.g. to upload the small binary files to our own Nextcloud instances.
You have more ideas or are already implementing some of the above? Feel free to share it as well! :) (See my public contact details in the footer of the page.)
Updates
Just after publishing I found this reddit. It seems the missing 4 Bytes could indeed be something from the select-and-transform tool. Let’s see if we can get a non-zero value in there :)
2017-12-29: I just got notified that the reMarkable wiki and the reHackable GitHub organization are also really far along with their converters and file format documentation, including a working .lines to svg converter for all strokes on the device! Kudos, that’s great work everyone! :)
2018-10-30: The latest update just announced hand-writing detection, improved PDF export and beta SVG export :-)
2018-09-24: I drafted an implementation in Rust: lines-are-rusty
2019-02-09: The .lines format changed slightly to version 3 at the end of 2018.
Notebooks are now stored as one file per page, the file ending is .rm and the point attributes are now: x, y, speed, direction, width, and pressure.
See this update (or that one).
Referral Program
On Oct 4th, 2018 reMarkable started a referral program that gets you 85 Euro off and me 85 Euro cashback. So let’s give this a try, here is my referrer link: buy a reMarkable tablet (affiliate link)
Version
“Codex”: 0.0.4.81
/etc/version: 20170911122159