
Base64 Is Fast Now, Actually

This is a short post.

I was wondering recently: if I had to send a binary file across a text stream to a browser - e.g., embedded in JSON - what would be the fastest way to do it?

Clearly, base64-encoding the content is a valid approach, and it always carries a 4/3 size overhead.

However, one thought I had is that JS strings are fundamentally just arrays of unsigned 16-bit integers (or []uint16, if you're writing lots of Go, like me). So we can actually convert any kind of data to a JS string and then serialize it as JSON without risk of loss - JSON escaping covers the code units that UTF-8 can't represent.
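You can see this directly (a quick standalone check; JSON.stringify has escaped lone surrogates since ES2019's well-formed-stringify change):

```typescript
// Strings happily hold arbitrary 16-bit values, even lone surrogates;
// JSON.stringify escapes anything that isn't valid on its own.
const s = String.fromCharCode(0xd800, 0x61); // lone high surrogate + 'a'
console.log(s.length);          // 2 code units
console.log(s.charCodeAt(0));   // 55296 (0xD800)
console.log(JSON.stringify(s)); // "\ud800a" - escaped, still valid JSON
```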

The Thesis

What we do is basically take a Uint8Array ("native bytes"), reinterpret it as a Uint16Array, then convert it to a JS string. The code looks a bit like:

```typescript
function encode(raw: Uint8Array): string {
  // Reinterpret the underlying bytes as 16-bit code units
  // (assumes an even byte length and 2-byte alignment).
  const u16 = new Uint16Array(raw.buffer, raw.byteOffset, raw.byteLength / 2);
  const s = new TextDecoder('utf-16le').decode(u16);
  return JSON.stringify(s);
}
```

…this actually doesn't work for two reasons:

  1. our data might have an odd number of bytes

  2. TextDecoder generates U+FFFD (the replacement character) for invalid input - and random binary data will contain plenty of it, like lone surrogates. It's not actually string data.

Both can be solved - in Node using Buffer, and in the browser using a few hacks.
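In the browser, those hacks might look something like this (my own illustration, not the code that was benchmarked): build the string with String.fromCharCode, which keeps lone surrogates intact where TextDecoder would mangle them, and pad odd-length input with a marker:

```typescript
// Sketch only. JSON.stringify escapes the lone surrogates, so the output
// is still valid JSON; both sides must agree on endianness (little-endian
// on any common platform).
function encodeU16(raw: Uint8Array): string {
  const odd = raw.byteLength % 2 === 1;
  // Copy into an aligned, even-length buffer, padding with a zero byte if odd.
  const padded = new Uint8Array(raw.byteLength + (odd ? 1 : 0));
  padded.set(raw);
  const u16 = new Uint16Array(padded.buffer);
  let out = odd ? '1' : '0'; // first char records whether we padded
  const CHUNK = 0x8000; // stay under engine argument-count limits
  for (let i = 0; i < u16.length; i += CHUNK) {
    out += String.fromCharCode(...u16.subarray(i, i + CHUNK));
  }
  return JSON.stringify(out);
}

function decodeU16(json: string): Uint8Array {
  const s = JSON.parse(json) as string;
  const u16 = new Uint16Array(s.length - 1);
  for (let i = 1; i < s.length; i++) u16[i - 1] = s.charCodeAt(i);
  const bytes = new Uint8Array(u16.buffer);
  return s[0] === '1' ? bytes.subarray(0, bytes.length - 1) : bytes;
}
```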

The Results

Here is a table of all the results (times are for a single encode/decode round-trip in ms; each test was run hundreds of times). I used an audio file of about 1.4 MB, and a zipped version of the same file (~600 KB).

[Chart: encode/decode speed for each approach - built-in base64 is stupidly fast]

So: native base64 is stupidly fast. But encoding via UTF-16 string isn't far behind.

To explain the options:

The code to generate this data is here.

Analysis

The UTF-16 approach is interesting, and remarkably fast. It generates a totally valid string, a random excerpt looks like:

"…弼並于忒퍛孾븭䍗伐뾠䫼藸왬愣邲틫違纝Ꟍ牿\udf00왻﫶星멝ꟁ郧葓놹쑘澓燕㺪枢궅᧧澵춢呪垦ൢួ醝⠷銗䇎\udd2c흅蝗ꟛ퍾慙瘝\udc9d縱쯙꾵\udeef篙䶄Ḛ煩忌೷떵䵺업\ude9f輫\udcdc㾺뤻촸㯆孟\udcb9꡸뉺捸蜹躯액ꌀ瑍幘㕢譎뉛ᡦḝ萿힓ܞꪾඞ䣾闧忓槢\udf8e⭖㹠❎⁼ﷳ봝霗טּ棭\ud8dfጚ᭲ꄝ졷憢뙡뷪憓ä䧻弱ᘋᢃ䘶ر哤樹Ꮴ䮙ᔾᎻꩮꐱ趫\ud875摁狃渣ᅝ쿮꣖곷㰰뗊뼥ꎝ᜷\udbcfﴞᮽ轺稯올熽䝂鏅᧵눎᜾ᐋ퉖鋲\udaee쉼⇅뾠奰᳨詻닞굉\ud82dϓ綋놼☳뎭㳉趽畵ꨨ恫ꬬ洕ꈞ㨷鮊ꨋጦ渚陾翿䂼콦爧섺樼ඓ浧乖\ud872༗ḻ࿀徚ꟽ\udfa1疋庨㼖븖⦊⿧…"

But it's also larger than base64: for the two files, the UTF-16 approach weighs in at 173% and 154% of the original size, respectively.

Remember, what we're doing is: raw bytes → UTF-16 code units → JSON-escaped string.

But part of the issue is that most of the resulting code units take three bytes in UTF-8, and some - lone surrogates chief among them - have no valid UTF-8 representation at all, so they end up escaped. That's the \udc9d and friends we see above.

And yes, we could start inventing our own binary formats… but that's not the point - the constraint is that the output must still be valid JSON.

Base64, by contrast, is consistently 4/3, or 133%, the size of the original.

So What?

This article has a pretty clear outcome, which was literally in the title.

You should probably encode data using the native base64 methods. They're really fast and consistent.
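The built-ins in question are Uint8Array.prototype.toBase64 and Uint8Array.fromBase64, which recent browsers and Node ship. A sketch, with a btoa/atob fallback for engines that don't have them yet:

```typescript
const bytes = new Uint8Array([104, 101, 108, 108, 111]); // "hello"

// In engines that ship the proposal:
//   const b64 = bytes.toBase64();
//   const back = Uint8Array.fromBase64(b64);

// Fallback via btoa/atob (browsers and modern Node; fine for small payloads):
const b64 = btoa(String.fromCharCode(...bytes));
console.log(b64); // "aGVsbG8="
const back = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
```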

For users on old Chrome, provide a polyfill - maybe for the next year or so. It will be slower, but that basically just penalizes them for not upgrading.

Thanks for reading! 🕺