Get a random record from 126K lines of JSON

Play with the actual "experiment" here.

tldr:

Don't load huge JSON into memory. Instead, pick a random spot in the file, stream in a string about twice the size of the expected record, find a complete JSON object, parse it, and return it. Code at the bottom ↓

Background

I came across an article on muxup.com about how the doodles were added to their footer. That led me down the path of exploring the skull illustrations users had created for Google's "Quick, Draw!" Then I found myself with a 76MB .ndjson file in my downloads folder and an idea!

(btw, .ndjson is newline delimited JSON -- pretty helpful here!)
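
To make that concrete, here's a tiny sketch with two made-up records (the `id` and `word` fields are placeholders I'm assuming for illustration, not the real skull data). Every line is a complete JSON document, which is what makes grabbing a single line possible at all.

```javascript
// Two illustrative records, one JSON object per line (field names are assumptions)
const ndjson = '{"id":1,"word":"skull"}\n{"id":2,"word":"skull"}\n'

// Each line parses on its own, so nothing forces us to parse the whole file at once
const records = ndjson
  .split('\n')
  .filter((line) => line.trim() !== '') // skip the empty string after the final newline
  .map((line) => JSON.parse(line))

console.log(records.length) // 2
```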

The Idea

Pick a random skull doodle and render it with an HTML custom element. Extend it as a web component so it can fetch new skulls (eventually I'll animate the drawing). The front-end was pretty straightforward; here's my Enhance element.
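
As a rough idea of the rendering step (not the actual Enhance element), here's a sketch that turns strokes into an SVG path string. It assumes the simplified "Quick, Draw!" shape, where `drawing` is a list of strokes and each stroke is a pair of arrays, `[xs, ys]`. The `strokesToPath` name is hypothetical.

```javascript
// Assumption: drawing = [[xs, ys], ...], one [xs, ys] pair per stroke
// `strokesToPath` is a hypothetical helper, not taken from the real element
function strokesToPath(drawing) {
  return drawing
    .map(([xs, ys]) =>
      // Move to the first point of each stroke, then draw lines to the rest
      xs.map((x, i) => `${i === 0 ? 'M' : 'L'}${x} ${ys[i]}`).join(' ')
    )
    .join(' ')
}

// usage: put the result in <path d="..."> inside an <svg> element
```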

The crux: reading a random line from a huge file in AWS Lambda without loading the whole thing into memory. Loading it all slows handler execution, and no one wants to wait that long to see a mediocre skull drawing.

The 76MB file contains 126,174 skull doodles, so I didn't want to load all that into Lambda RAM and pick a random one from a huge array. This would make response time super slow: like a whole second. Nobody has that kind of time.
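
For comparison, the approach I'm avoiding looks roughly like this. It's a sketch, and `pickRandomNaive` is a hypothetical name; the point is that the entire file gets read and split before a single record comes back.

```javascript
import fs from 'node:fs'

// Naive approach: read the whole file, split it into lines, pick one.
// `pickRandomNaive` is a hypothetical helper, not part of the real handler.
function pickRandomNaive(filePath) {
  const lines = fs
    .readFileSync(filePath, 'utf8') // the whole file lands in memory here
    .split('\n')
    .filter(Boolean) // drop the empty string after a trailing newline
  const line = lines[Math.floor(Math.random() * lines.length)]
  return JSON.parse(line)
}

// usage: const skull = pickRandomNaive('./skulls.ndjson')
```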

Instead, I stream in about 2 KB from a random position in the file, look for a whole JSON object in that chunk, parse it, and return it. All in ~150ms.

import fs from 'node:fs'

const filePath = './skulls.ndjson' // 1 skull per line
const stats = fs.statSync(filePath) // get filesize
const cursor = Math.floor(Math.random() * stats.size)
const fileStream = fs.createReadStream(filePath, {
  start: cursor, // start is likely mid-line
  end: cursor + 2000, // line is typically ~1kb, grab 2
  encoding: 'utf8',
})

// collect the full length of 2kb
let chonks = ''
for await (const chonk of fileStream) chonks += chonk

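// find the first and second newlines; everything before the first is a partial line
const n1 = chonks.indexOf('\n')
const n2 = chonks.indexOf('\n', n1 + 1)
// no second newline (cursor landed near the end of the file)? take the rest
const line = chonks.slice(n1 + 1, n2 === -1 ? undefined : n2)

const skull = JSON.parse(line) // 💀 (sadly, an emoji isn't a valid JS identifier)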
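
The snippet above can still miss: the cursor might land in the last line of the file, or in a line longer than the window, and then there's no complete record to parse. Here's one way that could be handled, wrapped in a function that retries from a new position. This is a sketch, not the real handler; `randomRecord`, `windowSize`, and `maxTries` are names I'm making up for illustration.

```javascript
import fs from 'node:fs'

// Hypothetical helper: `windowSize` and `maxTries` are illustrative parameters
async function randomRecord(filePath, windowSize = 2000, maxTries = 10) {
  const { size } = fs.statSync(filePath)

  for (let attempt = 0; attempt < maxTries; attempt++) {
    const cursor = Math.floor(Math.random() * size)
    const stream = fs.createReadStream(filePath, {
      start: cursor, // likely mid-line
      end: cursor + windowSize, // inclusive; a read past the end of the file just stops early
      encoding: 'utf8',
    })

    let text = ''
    for await (const part of stream) text += part

    const n1 = text.indexOf('\n')
    if (n1 === -1) continue // no line boundary in this window, try another spot

    const n2 = text.indexOf('\n', n1 + 1)
    const line = n2 === -1 ? text.slice(n1 + 1) : text.slice(n1 + 1, n2)

    try {
      return JSON.parse(line)
    } catch {
      // empty or cut-off line (window ended before the record did), try again
    }
  }

  throw new Error(`no complete record found after ${maxTries} tries`)
}

// usage: const skull = await randomRecord('./skulls.ndjson')
```

Two caveats with this technique either way: the first line of the file can never be picked, because the chosen record always follows a newline, and a record is more likely to be picked when the line before it is long. For random skull doodles that seems fine.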

Here's the full handler (it's a part of a larger Architect app with Enhance).
