JSONLines — My Favourite Format

Tags: code, data formats
Published: April 4, 2026

This post will only be of interest to people who regularly need to move data between programs, or store experimental results. If that sounds like you, read on — I want to convince you that JSONLines is almost certainly the right format for the job.

The problem with serialisation formats

There is no shortage of ways to serialise structured data. Protocol Buffers, MessagePack, Cap’n Proto, Avro, Thrift — all well-engineered, all fast, all widely used. The problem isn’t performance. The problem is that you need a library.

In practice, this means finding a binding for your language, hoping it’s maintained, hoping it compiles cleanly on your platform, and hoping it supports the version of the schema format you need. If you work in Python or Java, this is usually fine. If you work in GAP, or a niche constraint solver, or some research language you wrote yourself, you are out of luck. Even for mainstream languages, I’ve lost more time than I’d like to library version mismatches and broken builds for serialisation libraries I didn’t especially want in the first place.

JSONLines: the minimum viable format

JSONLines is almost absurdly simple. A file is a sequence of lines. Each line is a complete JSON value, in practice usually an object. That’s it.

To read and write JSONLines, your language needs exactly two capabilities:

  1. Read and write JSON.
  2. Read and write a single line of text.

Almost every programming language has both of these. If yours doesn’t support the second one, I leave that as an exercise for the reader.
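To make this concrete, here is roughly what the line-handling half looks like in C++, which is the language I’ll use for sketches throughout. Each std::getline yields one complete JSON document as a string, ready for whatever JSON parser you have; the filename here is just an example.

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Capability 2: read one line of text at a time. Each line is one
    // complete JSON document, which capability 1 (any JSON parser)
    // would then decode. Here we just count the records.
    std::ifstream in("results.jsonl");
    std::string line;
    std::size_t records = 0;
    while (std::getline(in, line)) {
        if (line.empty())
            continue;      // tolerate stray blank lines
        ++records;         // a real reader would parse `line` here
    }
    std::cout << records << " records\n";
}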

There is no schema to define, no code to generate, no binary encoding to decode. You can inspect a JSONLines file with head, tail, grep, or any text editor. You can concatenate two files with cat. You can count records with wc -l.

Why not CSV?

CSV is the other obvious “simple” choice for structured data. The problem is that CSV is barely a format at all — there are countless dialects, quoting rules vary, and there’s no standard way to represent nested data. Worse, a CSV file tells you nothing about what its columns mean. You have to know, or guess, or read the README that someone hopefully wrote.

JSONLines is self-documenting. Each line carries its own field names:

{"solver": "minion", "instance": "queens-12", "time_s": 0.34, "solutions": 14200}
{"solver": "minion", "instance": "queens-13", "time_s": 1.87, "solutions": 73712}

If you add a new field next month — say "peak_memory_mb" — old records still parse fine. They just don’t have that field. No schema migration, no breaking change.
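Old and new records can even sit side by side in the same file; here is the earlier data with the new field on the later record (the memory figure is invented purely for illustration):

{"solver": "minion", "instance": "queens-12", "time_s": 0.34, "solutions": 14200}
{"solver": "minion", "instance": "queens-13", "time_s": 1.87, "solutions": 73712, "peak_memory_mb": 48}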

Why not plain JSON?

You could store the same data as a single JSON array. The problem is that a JSON array must be complete to be valid. If your experiment runner crashes halfway through, you get a truncated file that no JSON parser will accept.

With JSONLines, each line is independently valid. If a process is interrupted, you lose at most the partially-written last line. Every complete line before it is fine. This matters a lot when you’re running hundreds of experiments overnight. If 3 out of 200 runs crash, you lose 3 lines, not 3 files. You can just cat *.jsonl and get on with the analysis.
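If you do want to tidy up after a crash, salvage takes only a few lines. This sketch copies every complete record of a possibly-truncated file into a clean one; the filenames are hypothetical, and the completeness check is a crude stand-in for “does this line parse as JSON?”:

#include <fstream>
#include <string>

int main()
{
    std::ifstream in("crashed-run.jsonl");
    std::ofstream out("salvaged.jsonl");
    std::string line;
    while (std::getline(in, line)) {
        // A file killed mid-write ends in a partial line with no
        // trailing newline; std::getline still returns it, so filter
        // with a cheap check: complete records here start with '{'
        // and end with '}'.
        if (!line.empty() && line.front() == '{' && line.back() == '}')
            out << line << '\n';
    }
}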

This also means you can write results incrementally — append a line after each experiment finishes, and the file is always in a consistent state. No need to hold everything in memory and write it all at the end.
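A sketch of that pattern, using the same string-building shortcut I defend later in this post; the helper is mine, and the field names match the records above:

#include <fstream>
#include <string>

// Append one record per finished experiment. Opening in append mode
// and closing immediately means the file is consistent after every
// call; nothing accumulates in memory. Assumes solver and instance
// contain nothing that needs JSON escaping.
void log_result(const std::string& solver, const std::string& instance,
                double time_s, long solutions)
{
    std::ofstream out("results.jsonl", std::ios::app);
    out << "{\"solver\": \"" << solver << "\""
        << ", \"instance\": \"" << instance << "\""
        << ", \"time_s\": " << time_s
        << ", \"solutions\": " << solutions << "}\n";
}   // out flushes and closes as it goes out of scope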

A concrete example: GAP talking to C++

Let me show how I’ve used JSONLines for inter-process communication. The Vole package needs GAP (a computer algebra system) to communicate with a program written in Rust. We use JSONLines over pipes, and it works well. Here’s a simplified version of the same idea using C++.

We need three pieces: a GAP program, a C++ program, and some pipes for them to talk through. A pipe acts much as its name suggests: one program writes into one end, another reads out of the other. We’ll use named pipes (fifos), which you can create from the command line:

mkfifo togap.pipe
mkfifo fromgap.pipe

First, a minimal example to check the plumbing works. The GAP side:

# The IO package provides IO_File, IO_ReadLine and IO_WriteLine
LoadPackage("io");

# Write to this pipe
outfile := IO_File("fromgap.pipe", "w");
# Read from this pipe
infile := IO_File("togap.pipe", "r");
# Send a message. GAP calls IO_Flush as part of IO_WriteLine.
IO_WriteLine(outfile, "Hello from GAP!");
# Read a message
str := IO_ReadLine(infile);
Print("GAP read: ", str, "\n");

And the C++ side:

#include <iostream>
#include <fstream>
#include <string>

int main(void)
{
    std::ifstream in("fromgap.pipe");
    std::ofstream out("togap.pipe");

    std::string line;
    std::getline(in, line);
    std::cout << "C++ read: " << line << "\n";
    // std::endl flushes the stream
    out << "Hello from C++!" << std::endl;
}

Run these in two terminals and they should exchange messages. A couple of things that can go wrong:

  • Nothing happens — did you run mkfifo first to create the pipe files?
  • One program hangs — opening a fifo blocks until the other end is opened too. If both programs open their reading pipe first, they’ll deadlock. Make sure the two sides open the pipes in complementary order, as the examples above do.

Adding JSONLines

Now let’s do something useful. We’ll turn GAP into a small server that answers queries about the transitive groups library, with each request and response sent as a single JSON value on its own line:

LoadPackage("json");

outfile := IO_File("fromgap.pipe", "w");
infile := IO_File("togap.pipe", "r");
while true do
    command := JsonStringToGap(IO_ReadLine(infile));
    if command[1] = "nrgroups" then
        ret := NrTransitiveGroups(command[2]);
    elif command[1] = "size" then
        ret := Size(TransitiveGroup(command[2], command[3]));
    elif command[1] = "exit" then
        QUIT_GAP();
    else
        Print("Unknown command: ", command);
        QUIT_GAP();
    fi;
    IO_WriteLine(outfile, GapToJsonString(ret));
od;

The C++ client:

#include <iostream>
#include <fstream>
#include <string>

int main(void)
{
    std::ifstream in("fromgap.pipe");
    std::ofstream out("togap.pipe");

    // Get number of transitive groups on 6 points
    out << "[\"nrgroups\", 6]" << std::endl;
    std::string line;
    std::getline(in, line);
    int count = std::stoi(line);
    // GAP arrays start from 1
    for(int i = 1; i <= count; ++i) {
        out << "[\"size\", 6, " << i << "]" << std::endl;
        std::getline(in, line);
        int size = std::stoi(line);
        std::cout << "TransitiveGroup(6," << i << ") has size " << size << "\n";
    }
    out << "[\"exit\"]" << std::endl;
}

Notice that neither side needed a specialised serialisation library. GAP has the json package; the C++ side is just concatenating strings. Yes, I know you shouldn’t build JSON by string concatenation. In this case, all the values are either integers or short ASCII command names — no double quotes, no backslashes, no control characters, so there’s nothing that needs escaping. If you were passing arbitrary user-supplied strings, you’d want a proper JSON library. Except, C++ doesn’t have a standard package manager, and I can’t be bothered to spend an afternoon picking a good cross-platform JSON library right now. So: string concatenation it is, with the understanding that the values stay boring.
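For the record, “needs escaping” means exactly this: inside a JSON string you must escape double quotes, backslashes, and control characters below 0x20, and nothing else is mandatory. If the boring-values assumption ever breaks, the minimum viable escaper is short. A sketch, not a substitute for a real JSON library:

#include <cstdio>
#include <string>

std::string json_escape(const std::string& s)
{
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // remaining control characters get \u00XX form
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}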

The format is human-readable, so debugging is straightforward: stand in for either program and literally cat the pipe to see what’s going on.

If one of the programs crashes, the other can detect the broken pipe and continue. When Vole’s Rust component crashes on malformed input (it happens), GAP notices and reports the error to the user rather than dying itself. That kind of fault isolation comes for free with process separation and pipes.
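What “detect the broken pipe” looks like depends on which end you hold. On the reading side in C++, end-of-file is the signal: when the peer exits, std::getline starts returning false, and the survivor can report the failure rather than sharing it. A minimal sketch (writers see SIGPIPE or a failed stream instead, which you would handle similarly):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream in("fromgap.pipe");
    std::string line;
    // getline fails at EOF, i.e. once the writer has exited
    while (std::getline(in, line))
        std::cout << "got: " << line << "\n";
    std::cerr << "peer closed the pipe; shutting down cleanly\n";
}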

In short

JSONLines works because it asks almost nothing of your toolchain and gives you a format that is human-readable, self-documenting, trivially extensible, and robust to failure. I’ve used it for inter-process communication, experimental logging, and data exchange between languages that have nothing else in common. I keep reaching for it because I’ve never found a situation where the added complexity of a “proper” serialisation format was worth the hassle.