@sloanlance
Last active May 22, 2025 15:17
jq: JSONL ↔︎ JSON conversion

Prerequisites

  • jq (https://jqlang.org/): "like sed for JSON data"

    There are several options available for installing jq. I prefer to use Homebrew: brew install jq

JSONL → JSON

    jq -s '.' input.jsonl > output.json

JSON → JSONL

    jq -c '.[]' input.json > output.jsonl
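
As a quick sanity check, the two conversions can be chained into a round trip. This is only a minimal sketch: the two sample records and the file names under a temporary directory are made up for illustration.

```shell
# Throwaway working directory; all file names here are illustrative.
workdir=$(mktemp -d)

# Sample JSONL: one JSON value per line.
printf '%s\n' '{"id":1,"name":"a"}' '{"id":2,"name":"b"}' > "$workdir/input.jsonl"

# JSONL -> JSON: -s (--slurp) collects every input value into one array.
jq -s '.' "$workdir/input.jsonl" > "$workdir/output.json"

# JSON -> JSONL: .[] emits each array element; -c prints each on its own line.
jq -c '.[]' "$workdir/output.json" > "$workdir/roundtrip.jsonl"

# The round trip should reproduce the original lines.
if diff "$workdir/input.jsonl" "$workdir/roundtrip.jsonl" > /dev/null; then
  echo "round trip OK"
fi
```

The comparison only matches byte for byte because the sample lines are already in jq's compact form. Input with different spacing would still round-trip to equivalent JSON, just not to identical text.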

Note: This document is now included in Cookbook · jqlang/jq Wiki.

@frfernandezdev

🫶

@delano

delano commented Mar 22, 2024

Legend

@sloanlance
Author

sloanlance commented Mar 22, 2024 via email

@EdGaere

EdGaere commented Apr 1, 2025

It's unbelievable, jq can solve this in one line. I was about to embark on writing yet-another-python-script.py to convert a gzipped nested JSON file to JSONL, but thankfully I came across this post.

Suppose you have a JSON like this:

{
   "meta" : { }

 , "data" : [
    { "idx" : 1
      , "input" : "ABC"
     , target : "123"
     , some_other_field : "zzz" 
    },

   { "idx" : 2
      , "input" : "DEF"
     , target : "456"
     , some_other_field : "zzz" 
    }, 
   ...

 ]
}

You can use the following one-line command to extract the 'data' array, keep the 'input' and 'target' fields only, and generate JSONL:

    gunzip -c somefile.json.gz | jq .data | jq -c '.[] | {input, target}'

Output

{"input" : "ABC", "target" : "123"}
{"input" : "DEF", "target" : "456"}
...

@sloanlance
Author

sloanlance commented May 20, 2025

@EdGaere, thanks for the example! I thought I could improve (i.e., shorten) the command you wrote. In your example, you called jq twice in the pipeline, but it can be done with one call instead…

gzcat somefile.json.gz | jq -c '.data[] | {input, target}'

I.e., combining the filters: .data + .[] → .data[]
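
A small check that the one-call and two-call pipelines agree, offered as a sketch only. The sample file is an assumed stand-in with the same shape as the example data (keys quoted so jq can parse it), and gunzip -c is used for both pipelines because gzcat is not installed on every system.

```shell
tmp=$(mktemp -d)

# Stand-in for somefile.json.gz; the contents are illustrative.
cat > "$tmp/somefile.json" <<'EOF'
{"meta": {}, "data": [
  {"idx": 1, "input": "ABC", "target": "123", "some_other_field": "zzz"},
  {"idx": 2, "input": "DEF", "target": "456", "some_other_field": "zzz"}
]}
EOF
gzip "$tmp/somefile.json"   # leaves $tmp/somefile.json.gz

# Two jq calls: extract .data, then iterate and pick fields.
two_calls=$(gunzip -c "$tmp/somefile.json.gz" | jq .data | jq -c '.[] | {input, target}')

# One jq call: .data[] does the extraction and the iteration together.
one_call=$(gunzip -c "$tmp/somefile.json.gz" | jq -c '.data[] | {input, target}')

if [ "$two_calls" = "$one_call" ]; then
  echo "same output"
fi
```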

Cleaning the test data

I wanted to test this with your data. jq didn't like your hand-edited data's unquoted keys, like some_other_field, so I cleaned up the data first…

{
  "meta": {},
  "data": [
    {
      "idx": 1,
      "input": "ABC",
      "target": "123",
      "some_other_field": "zzz"
    },
    {
      "idx": 2,
      "input": "DEF",
      "target": "456",
      "some_other_field": "zzz"
    }
  ]
}

(Maybe jq has some option to ignore errors like unquoted keys.)
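
A minimal sketch of how that rejection shows up in a script: jq exits with a non-zero status when its input is not valid JSON. The file names and the one-record samples are made up for illustration, and the check helper is a hypothetical name, not part of jq.

```shell
tmp=$(mktemp -d)

# Unquoted key "target", as in the hand-edited data.
printf '%s\n' '{ "idx": 1, target: "123" }' > "$tmp/bad.json"
# The same record with the key quoted.
printf '%s\n' '{ "idx": 1, "target": "123" }' > "$tmp/good.json"

# Hypothetical helper: report whether jq can parse the given file.
check() {
  if jq . "$1" > /dev/null 2>&1; then
    echo accepted
  else
    echo rejected
  fi
}

bad_result=$(check "$tmp/bad.json")
good_result=$(check "$tmp/good.json")
echo "unquoted key: $bad_result"
echo "quoted key: $good_result"
```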

Using the test data

Running the shortened command I gave above gives the output…

{"input":"ABC","target":"123"}
{"input":"DEF","target":"456"}

Notice that jq's -c option prints each record on a single line with no extra whitespace, so this output is more compact than the hand-edited output in your example.
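
To see what -c changes, compare jq's default pretty-printed output with the compact form. A small sketch using one made-up record written with hand-edited spacing:

```shell
# One illustrative record with extra spaces around the colons.
record='{"input" : "ABC", "target" : "123"}'

# Default output is pretty-printed across several lines.
pretty_lines=$(printf '%s\n' "$record" | jq '.' | wc -l | tr -d ' ')

# With -c the same record comes out on one line, with no extra spaces.
compact=$(printf '%s\n' "$record" | jq -c '.')
compact_lines=$(printf '%s\n' "$compact" | wc -l | tr -d ' ')

echo "default: $pretty_lines lines, compact: $compact_lines line"
echo "$compact"
```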
