NIP-TN


NIP-TN - Tone Events (Sound Representation)

Preamble

draft optional author:YourName discussions-to:#music

NIP: 7043
Title: Tone Events (Sound Representation)
Author: Your Name (or Nostr pubkey)
Status: Draft
Type: Standard
Created: 2025-06-10

Abstract

This proposal defines a new Nostr event type (kind: 7043) called a Tone event. Tone events represent a single sound (tone) in a synthesis-agnostic, musical format. The event’s content field is a JSON-encoded schema describing the tone’s parameters – including pitch, duration, amplitude (volume), timbre (sound quality or instrument), modulation (e.g. vibrato/tremolo), envelope (volume shape over time), and an extension field for additional data. By using a dedicated kind instead of the generic kind:1 text note, Tone events remain isolated from normal text posts to prevent misinterpretation or accidental mutation. Tone events are immutable and uniquely identify one sound: once published, a tone’s definition does not change.

The proposal also prescribes robust use of event tags to classify and organize Tone events for discovery. It uses standard topic tags (t tags) for broad categorization (such as tone, fx, alert, music) and introduces custom single-letter tags (i, n, s) for fine-grained classification by instrument category, pitch class, and musical scale or key. These tags enable fast filtering of Tone events via NIP-01/12 query mechanisms. We provide guidelines to use these tags effectively without redundant duplication of information. While the full tone specification lives in the JSON content, the tags capture key searchable facets to support efficient discovery.

This NIP is structured as follows: it motivates the need for Tone events, defines relevant terms, specifies the event format (including JSON schema and tagging scheme) with versioning support, gives examples of Tone events, and discusses design considerations and future implications.

Motivation

Why a dedicated Tone event? As Nostr grows beyond text-based notes into richer content, there is a need for representing sound in a standardized way. Applications may want to share simple melodies, alert sounds, instrument presets, or sound effects on Nostr. However, using existing text note events (kind:1) is unsuitable. Tone data in a text note could be misinterpreted by clients, pollute text timelines, or be inadvertently edited or deleted. A separate kind (7043) provides a clear boundary and meaning, so clients can handle these events as audio definitions rather than normal messages. It prevents unintended side effects like a tone definition being treated as a comment or being replaced by mistake. By isolating Tone events, we ensure they remain immutable reference objects that can be reliably shared and referenced (each Tone event ID uniquely identifies a specific sound definition).

Interoperability and Synthesis-Agnostic Design: We want a tone format that any audio synthesis library or digital instrument can interpret. This means using common musical terms (pitch, amplitude, envelope, etc.) rather than engine-specific settings. By standardizing the JSON schema, different clients and services can generate or play the described sound consistently. For example, a music app could read a Tone event and synthesize the note, or a relay could index tones by musical attributes. Without a standard, one app’s sound format would be incompatible with another’s. A Nostr-wide convention enables a portable “sound byte” that any platform can use.

Efficient Discovery via Tags: In Nostr’s base protocol, event content is not directly searchable. To find events by criteria (like “all piano tones” or “all tones at pitch C”), we rely on tags that relays index and filter on. NIP-12 extended Nostr to allow querying events by arbitrary single-letter tags. This proposal leverages that: Tone events embed key attributes as tags so they can be easily found without full-text search. For instance, a user could subscribe to all events with an instrument tag i=piano or all events tagged n=C to collect all C-note tones. Standard t hashtag tags (like tone, music, fx) allow topical search (e.g. all Tone events or specifically those meant as music). Without these tags, discovering tones would be difficult, defeating the purpose of sharing them. Thus, robust tagging is central to our design.

Avoiding Duplication and Spam: A careful tagging strategy is needed to provide useful indices without bloating relay databases or duplicating the detailed JSON content. NIP-12 notes that single-letter tags should only be used for meaningful metadata and not abused, to keep relay indexes efficient. This proposal encourages using a small number of concise tags per tone (covering broad categories and one value per classification axis). By limiting tags to essential classifications (instrument, pitch class, scale, etc.), we avoid redundant data (for example, not tagging every parameter like duration or exact frequency, which would be excessive). This ensures that relays can index Tone events usefully without performance issues, and clients can filter tones by high-level attributes without parsing every event’s JSON.

In summary, this NIP is motivated by the need for a clear, shareable sound event format on Nostr that is isolated from text notes, universally interpretable, and easily searchable by musical criteria.

Definitions

Tone: In this context, a “tone” refers to a single audible sound with defined properties (such as a musical note or sound effect). It typically has a definite pitch (frequency), duration, loudness, and timbral character. A tone could be a musical note (e.g. A4 piano note), an alert beep, or any standalone sound.

Tone Event: A Nostr event of kind 7043 representing a single Tone. It encapsulates the tone’s defining parameters in its content JSON and uses tags for classification. Tone events are immutable and meant to serve as unique, shareable references to sounds (one event per distinct sound).

Pitch: The fundamental frequency of the tone, determining how “high” or “low” the sound is. In this spec, pitch can be expressed either as a frequency in hertz (e.g. 440.0 for A4) or as a musical note name with octave (e.g. "A4"). (Clients SHOULD interpret note names relative to the standard A4=440 Hz tuning). If a tone has no definite pitch (e.g. a noise burst), the pitch may be set to null or omitted.
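
A minimal, non-normative sketch (TypeScript, assuming A4 = 440 Hz and 12-tone equal temperament) of how a client might convert a note name such as "A4" or "C#5" into a frequency:

// Hypothetical helper: convert a note name to a frequency in Hz.
// Assumes 12-tone equal temperament with A4 = 440 Hz.
const SEMITONE_FROM_C: Record<string, number> = {
  C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11,
};

function noteToFrequency(note: string): number {
  const m = /^([A-G])([#b]?)(-?\d+)$/.exec(note);
  if (!m) throw new Error(`unrecognized note name: ${note}`);
  const [, letter, accidental, octave] = m;
  let semitone = SEMITONE_FROM_C[letter];
  if (accidental === "#") semitone += 1;
  if (accidental === "b") semitone -= 1;
  // MIDI-style numbering: A4 = 69.
  const midi = semitone + (parseInt(octave, 10) + 1) * 12;
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// noteToFrequency("A4")  -> 440
// noteToFrequency("C#5") -> ~554.37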

Duration: The length of time the tone sounds, in seconds. This is the intended audible duration of the sound from start to finish. (It can encompass the full envelope – attack to release – of the tone.)

Amplitude: The overall volume or loudness of the tone. Typically this is a normalized value between 0.0 and 1.0 (where 1.0 is full volume). It may correspond to the peak amplitude of the sound. (Clients MAY interpret amplitude in their own units, but SHOULD treat 1.0 as a reference maximum level to avoid clipping.)

Timbre: The sound quality or tone color – essentially what instrument or waveform the tone sounds like. Timbre distinguishes a piano note from a violin playing the same pitch, for example. In this spec, timbre is described in a synthesis-agnostic way: it could be a simple waveform label ("sine", "square", "sawtooth", "noise" etc.), an instrument name ("piano", "guitar", "violin"), or any descriptor of the harmonic content. This field gives an idea of the tone’s character, without tying it to a specific synthesizer implementation.

Envelope: The amplitude envelope of the tone – how the volume of the sound evolves over time. We use the common ADSR (Attack-Decay-Sustain-Release) model to describe the envelope. The envelope can be represented as an object with subfields, for example:

  • attack: time (seconds) from silence to full amplitude at the start.
  • decay: time to drop from initial peak to the sustain level.
  • sustain: sustain level (a fraction of peak amplitude, 0.0–1.0) during the main phase of the tone.
  • release: time to fade from sustain level back to silence at the end.

Not all sounds require a complex envelope (for instance, a simple beep might have an instantaneous attack and no sustain). If no envelope is specified, it implies a default of immediate attack, constant sustain equal to the amplitude, and immediate or very quick release at the end of the duration.

Modulation: Any periodic or dynamic modulation applied to the tone. This typically means low-frequency oscillation or other parameter changes over time that affect the sound’s pitch or amplitude. Common examples are vibrato (pitch modulation) and tremolo (volume modulation). The modulation field can be an object describing the type of modulation and its parameters, for example:

  • type: the modulation target or nature (e.g. "vibrato" for pitch mod, "tremolo" for volume mod, or "filter" if modulating a filter frequency, etc.).
  • rate: modulation frequency in Hz (e.g. 5 Hz vibrato rate).
  • depth: modulation depth (the extent of pitch change in semitones for vibrato, or amplitude variation for tremolo, etc.).

The modulation definition is kept general so it can cover various use cases. If no modulation is needed, this field can be omitted.

Extension Field (any): A flexible, open-ended field in the content JSON (keyed as "any") for any additional data or future extensions. This allows including synthesis parameters or metadata that are not covered by the core schema. For example, one might include a custom parameter (like a filter cutoff frequency, effect settings, a sample file reference, etc.) under "any". The value of "any" can be of any JSON type (object, array, string, etc.), and it is up to the producer and consumer of the event to understand any data placed here. Unknown data in "any" can safely be ignored by clients that do not recognize it. This provides forward compatibility for new features without breaking the core schema.

Hashtag Tag (t tag): A tag in the Nostr event used for topical categorization. The t tag (short for “topic” or colloquial "hashtag") is a standard Nostr convention for marking an event with a subject label. In Tone events, t tags are used to broadly classify the context of the tone, for example:

  • "tone" – a general marker to identify the event as a tone definition.
  • "music" – indicates the tone is musical in nature (part of music/melody).
  • "fx" – indicates a sound effect.
  • "alert" – indicates the tone is an alert/notification sound. These are free-form hashtags, but we expect certain common ones (like the above) to be used for Tone events for consistency. t tags are indexed by relays for search, allowing users to discover tones by topic.

Instrument Tag (i tag): A custom single-letter tag ("i") introduced by this NIP to classify the tone by its instrument category or source. The value of the i tag is typically a broad instrument family or type (e.g. "synth", "piano", "guitar", "drums", "string", "brass"). It can also be a specific instrument name if desired (like "violin"), but the intention is to allow quick filtering by instrument type. By convention, only one i tag should be present per Tone event (the tone has one primary instrument identity). This tag helps queries like “find all tones that sound like a piano” (filter #i:["piano"]) or “all synthesized tones” (#i:["synth"]).

Pitch Class Tag (n tag): A custom tag ("n") used to denote the musical pitch class of the tone. The tag value is the note name (ignoring octave), e.g. "C", "C#" (or "Db"), "D", ... up to "B". Only one n tag should be used per event. This allows filtering tones by note regardless of octave – for example, #n:["C"] would find tones that are some form of C (C3, C4, etc.). If the tone’s pitch is not a standard musical note (e.g. an in-between frequency or no pitch), the n tag can be omitted. (If a tone is microtonal or noise, it doesn’t fit a pitch class.) The pitch class tag is especially useful in musical contexts – e.g., finding all tones that are the note A for tuning reference, or grouping tones by note name.

Scale/Key Tag (s tag): A custom tag ("s") representing a musical scale or key context for the tone. The value might be a key signature or scale name, such as "Cmaj" (C major scale), "Amin" (A minor), "Blues", "Pentatonic", etc. This tag is optional and should be used at most once per event. The purpose is to classify a tone as being part of a certain musical scale or key center, if relevant. For instance, if a tone is intended for a song in G major, one might tag it s: "Gmaj". Or a set of tones might all be tagged s: "Pentatonic" to indicate they form a pentatonic scale. This can assist in filtering tones that belong to a certain musical context (e.g., find all tone events in the scale of C Major). If the tone is just an isolated sound with no particular scale context, the s tag can be omitted.

Immutability: In this NIP, Tone events are considered immutable, meaning once a tone is published, it is not updated or replaced. Each tone event stands alone as a unique record of a sound. (By Nostr convention, event kinds in the 1000–9999 range are regular events that relays store without automatic replacement. Kind 7043 falls here, so multiple Tone events from the same author will all be kept individually.) Immutability ensures that if someone references a Tone event by ID, they will always get the same content (sound definition) and it won’t be silently changed. If a user wants to modify a tone, they should publish a new Tone event rather than editing the old one.

Versioning: The Tone content schema includes a version field (v) to indicate the format version of the tone definition. This allows the schema to evolve over time. Versioning provides forward compatibility: clients can check the v field and handle the content appropriately if they support that version. The initial version defined by this NIP is 1. Future versions could add new fields or change semantics; by labeling the version, old clients can recognize they might not fully understand a newer format and either degrade gracefully or ignore those events. The v field is a simple integer (or semantic version string) and is part of the content JSON.

Specification

Event Kind 7043 – Tone Event Structure

A Tone event is a Nostr event with kind value 7043. This kind is reserved for representing a single sound/tone definition. Clients MUST use kind 7043 (and NOT kind 1 or any other kind) for Tone events to ensure they are distinct from regular text notes and other content types. According to NIP-01, new kinds are defined by proposals like this to give specific meaning to events. The kind number 7043 has no special behavior in relays beyond being a "regular" event (persisted by relays, not automatically replaced or deleted). In other words, Tone events behave like normal posts in terms of relay storage, except their content and intended usage are specialized.

Event Fields:

A Tone event uses the standard Nostr event fields:

  • id: (32-byte hex) computed event identifier (as per NIP-01).
  • pubkey: (32-byte hex) author’s public key.
  • created_at: timestamp.
  • kind: 7043.
  • tags: an array of tags (see Tagging Schema below).
  • content: a stringified JSON object containing the tone definition (see Content Schema below).
  • sig: signature.

The .content field MUST be a JSON object encoded as a string. (This is similar to how kind 0 metadata events embed a JSON in the content field.) Consumers of the event will parse the JSON string to extract the tone parameters. The JSON object schema is defined below. Note: Although we present the content as JSON, in the actual event it is a string (e.g., "{\"v\":1, \"pitch\":440.0, ...}"). Clients and relays do not interpret this content beyond storing/transmitting it; it’s the client’s job to parse it if they understand kind 7043.
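
As a non-normative sketch, a client that understands kind 7043 might extract the tone parameters as follows (the event shape follows NIP-01; the error handling shown is illustrative):

// Sketch: parse the stringified content of a kind 7043 event.
interface NostrEvent {
  id: string;
  pubkey: string;
  created_at: number;
  kind: number;
  tags: string[][];
  content: string;
  sig: string;
}

function parseToneContent(event: NostrEvent): Record<string, unknown> | null {
  if (event.kind !== 7043) return null;
  try {
    const tone = JSON.parse(event.content);
    if (typeof tone !== "object" || tone === null) return null;
    return tone as Record<string, unknown>;
  } catch {
    // Malformed JSON: ignore the event rather than rendering it as text.
    return null;
  }
}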

Tone events are immutable and non-replaceable. There is no defined mechanism to update a tone event in place. Each published Tone event is a unique snapshot of a sound. If an author wants to correct or change a tone, they SHOULD issue a new event (possibly with a new tag indicating version or an e tag referencing the old event if they want to link them, though linking is not part of this NIP). Relays will store all Tone events separately (since 7043 is in the regular event range). This ensures that references to a tone (by event id) remain valid over time and always point to the original content.

Content JSON Schema

The content of a Tone event is a JSON object containing the following fields. This schema uses synthesis-agnostic but musically common terminology. All keys are lowercase strings. Clients SHOULD ignore any fields not recognized (for forward compatibility), except where specified as required.

  • v (version): Required. An integer or string indicating the version of the tone schema. For this initial specification, v MUST be 1. This field allows the format to evolve. Future versions may introduce new fields or changes; the version helps clients decide how to parse or if they support the event. If a client encounters a version higher than it supports, it SHOULD handle what it can (if the format is backwards-compatible) or ignore the event if the format is unknown. (Using an integer version makes it easy to do simple comparisons.)

  • pitch: Required in most cases. The pitch of the tone. This can be represented in one of two ways:

    • As a number (integer or float) – interpreted as frequency in Hertz. For example, 440.0 means A4 (440 Hz).
    • As a string – interpreted as a musical note name with octave, using English notation A through G, optional # (sharp) or b (flat), followed by an octave number. For example, "A4" for 440 Hz, "C#5" for C-sharp in 5th octave, etc.

    Clients SHOULD support at least one of these representations (and ideally both). If a string note is provided, the client can convert it to a frequency for synthesis. We assume standard 12-tone equal temperament tuning unless otherwise specified. If the tone does not have a definite pitch (e.g. a noise or percussion hit), this field MAY be set to null or omitted. (If omitted or null, it signals an unpitched sound; clients can handle accordingly, such as using noise generation.)

  • duration: Required. A number (integer or float) specifying the tone’s duration in seconds. This is the length of time the tone should play from start to end. For example, 0.5 for a half-second beep, or 2.0 for a 2-second note. The duration includes the sustain portion; if an envelope with a release is specified, the total sound might slightly exceed the duration (depending on how the client schedules the release). Clients SHOULD treat this as the intended audible length and fit the envelope to it. (For instance, a client may assume the duration is from note-on to note-off, after which the release begins. Alternatively, if simpler, a client might approximate that the tone stops after duration seconds regardless, possibly truncating a long release.)

  • amplitude: Required. A number representing the volume or amplitude of the tone. Typically this is a normalized linear value between 0.0 and 1.0. 1.0 should correspond to a default maximum volume that is considered safe or standard on the client (perhaps 0 dBFS in audio terms). Values above 1.0 are not recommended (to avoid clipping or unintended loudness). For example, 0.8 would be 80% of full volume. This can be thought of as the peak amplitude of the tone. Clients MAY apply their own global volume scaling in addition to this (e.g., a user volume knob), but this field is the relative loudness of this tone.

  • timbre: Required. A description of the tone’s timbre or instrument sound. This is typically a short string. It can take one of the following forms:

    • A basic waveform identifier: e.g. "sine", "square", "triangle", "sawtooth", "noise". These correspond to simple oscillator types or noise.
    • An instrument name or family: e.g. "piano", "electric piano", "violin", "flute", "guitar", "synth-pad", "drum". This is more descriptive and relies on the client to approximate that timbre. (Clients could map known instrument names to available presets or sample banks if they have them. If a name is not recognized, treat it as a hint – e.g. an unknown name might fall back to a generic timbre, or the tone might be skipped if it cannot be reproduced.)
    • Other descriptive terms: e.g. "bell", "chirp", "whistle" – anything that gives an idea of the sound’s quality.

    The timbre field is intentionally free-form to allow flexibility, but using common names is encouraged for interoperability. If the tone’s exact timbre cannot be described simply, it might either require an extended definition in the any field or be left as a general category here. (For example, a complex FM synth patch might just say "synth" in timbre, with details of operators in any.)

  • envelope: Optional. An object defining the tone’s envelope (volume over time). If omitted, the tone is assumed to have a default envelope of full volume for the entire duration (or a simple on/off). If provided, the envelope object may contain:

    • attack (number, seconds) – time for volume to rise from 0 to full at the start. If 0 or very small, the tone starts at full volume immediately.
    • decay (number, seconds) – time for volume to decrease from the peak (after attack) down to the sustain level.
    • sustain (number, level 0.0–1.0) – the relative volume level during the sustain phase (after decay). The sustain phase typically lasts until the tone is released (or until the duration if we assume the tone is held that long). If not given, a default sustain level might be 1.0 (meaning no drop after attack).
    • release (number, seconds) – time for volume to fall from sustain level to 0 at the end of the tone.

    These parameters approximate a standard ADSR envelope. For example, a piano-like tone might have attack:0, decay:0.3, sustain:0.5, release:1.0 (fast strike, then a decay to half volume and a slow fade). A sustained organ tone might have attack:0.1, decay:0, sustain:1.0, release:0.2 (some fade-in, no decay, full sustain, short release). If a parameter is omitted, clients can assume a sensible default (e.g. if decay is missing but sustain is given, it could mean an immediate drop to sustain; if sustain is missing, assume 1.0 sustain; if release is missing, assume a quick release).

  • modulation: Optional. An object describing any modulation applied to the tone. This spec does not rigidly define the format, but it suggests the following possible subfields:

    • type: a string indicating the modulation type or target. Examples: "vibrato" (modulating pitch), "tremolo" (modulating amplitude), "wow" (if modulating filter or other effect – just an example).
    • rate: number (Hz) indicating how fast the modulation oscillates. For instance, 5 for 5 Hz vibrato (5 cycles per second).
    • depth: number indicating the depth or intensity of modulation. The interpretation depends on type (for vibrato, this could be pitch range in semitones or a fraction of a tone; for tremolo, it could be amplitude variation amount 0–1).
    • Optionally, shape or waveform of the LFO (e.g. wave: "sine" as default; could allow "triangle", etc).

    The modulation field can be extended as needed. Clients that recognize standard fields should apply them; if unknown modulation info is present, a client can ignore it (the tone will just play without that mod). If no modulation object is present, assume the tone is static (no vibrato/tremolo).

  • any: Optional. A field that can hold any additional JSON data for extensions. This could be an object or any JSON value. It serves as a bucket for future expansion or custom parameters:

    • For example, an implementation might include { "filter_cutoff": 2000, "filter_resonance": 0.8 } inside "any" to specify a filter on the tone.
    • Another use might be adding metadata like "description": "This tone is the notification sound used in MyApp v2.0" or "tags": ["lofi", "experimental"] (additional descriptive tags unrelated to Nostr tag indexing).
    • If the tone is actually a sample-based sound, one might include a reference like "sample_url": "https://example.com/sounds/123.wav" or a hash of a sample file in "any".

    This NIP doesn’t standardize any specific fields inside any – it is free for experimentation. Clients MUST ignore data in any that they do not understand. Producers SHOULD use any rather than adding new top-level fields, to avoid conflicts with future official fields. (If a future NIP or version 2 adds new top-level keys, it will use v to differentiate; otherwise, custom data goes in any.)

JSON Schema Summary: In summary, a Tone event’s content JSON has the structure (in informal notation):

{
  "v": 1,
  "pitch": <number or string or null>,
  "duration": <number>,
  "amplitude": <number>,
  "timbre": <string>,
  "envelope": {             (optional)
     "attack": <number>,
     "decay": <number>,
     "sustain": <number>,
     "release": <number>
  },
  "modulation": {           (optional)
     "type": <string>,
     "rate": <number>,
     "depth": <number>,
     "...": "..."           (other modulation params)
  },
  "any": <any JSON value>   (optional extension)
}

All numeric values are in units of seconds (for times) or hertz (for frequency) or normalized 0–1 (for levels), unless otherwise specified. Strings are used for names and enumerations. This schema is designed to be easily parseable and convertible to parameters of common audio libraries (for example, Web Audio API, MIDI synthesizers, etc.). The naming aligns with musical concepts so that even if a specific synth doesn’t use ADSR, it can approximate it, etc.
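
For reference, one possible (non-normative) TypeScript rendering of the version 1 schema; the interface names here are illustrative only:

// Illustrative typing of the version 1 content schema.
interface ToneEnvelope {
  attack?: number;   // seconds
  decay?: number;    // seconds
  sustain?: number;  // level, 0.0–1.0
  release?: number;  // seconds
}

interface ToneModulation {
  type?: string;     // e.g. "vibrato", "tremolo"
  rate?: number;     // Hz
  depth?: number;    // semitones (vibrato) or 0–1 (tremolo)
  [extra: string]: unknown; // other modulation params
}

interface ToneContent {
  v: number | string;              // schema version; 1 for this NIP
  pitch?: number | string | null;  // Hz, note name, or null/omitted if unpitched
  duration: number;                // seconds
  amplitude: number;               // 0.0–1.0
  timbre: string;                  // waveform, instrument, or descriptor
  envelope?: ToneEnvelope;
  modulation?: ToneModulation;
  any?: unknown;                   // open extension field
}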

Tagging Schema and Conventions

Tone events make heavy use of the tags array to provide metadata that relays index for search and filtering. By NIP-01 convention, single-letter tag keys (like e, p, or custom ones) are indexed by relays and queryable. NIP-12 further allows generic tag filtering on any single-letter key. We define the following usage of tags in Tone events:

  • Hashtag Tags (t): Tone events SHOULD include one or more "t" tags (topic hashtags) for general classification. At minimum, a Tone event MUST include ["t","tone"] as one tag – this marks the event as a tone definition. (Having this common tag allows relays or clients to quickly filter all tone events by querying {"#t":["tone"]}.) Additional t tags can describe the context or intended usage of the tone:

    • If the tone is part of music or meant as a musical note, include ["t","music"].
    • If the tone is a sound effect (Foley, UI sound, etc.), include ["t","fx"].
    • If the tone is specifically an alert or notification sound, include ["t","alert"].
    • Other examples: ["t","sfx"] (for sound effects, similar to fx), ["t","instrument"], or any custom hashtag relevant to the tone.

    Hashtag values are lowercase by convention (like typical hashtags). Use them sparingly and meaningfully; 1–3 t tags is usually sufficient. This provides broad categories for users to find tones (e.g., someone looking for notification sounds could search #t:["alert"]). Keep in mind that the presence of these tags is user-driven classification, so consistency in tag usage (like always tagging “tone”) is important for network-wide utility.

  • Instrument Tag (i): A single-letter tag "i" is used to classify the tone’s instrument or sound source category. Each Tone event MAY include at most one i tag. The i tag’s value SHOULD be a broad instrument category or commonly recognized instrument name. Example tag entries:

    • ["i","synth"] – indicating the sound is synthesized or a synth instrument.
    • ["i","piano"] – indicating a piano sound (could be acoustic or electronic).
    • ["i","guitar"] – a guitar pluck or chord sound.
    • ["i","drums"] or ["i","percussion"] – percussive/non-melodic sound.
    • ["i","brass"] – a brass instrument (like trumpet).
    • ["i","violin"] or ["i","strings"] – string instrument.

    The value is free-form text (one word preferred). It’s recommended to choose a general category rather than very niche terms, so that filtering is effective. For example, tag as "keyboard" instead of "harpsichord" if you want the tone to be found with other keyboard instruments. (The specific "harpsichord" character can still be in the timbre content, but the tag might say "keyboard" to group it with pianos/organs for search.) On the other hand, if a specific instrument is important (say you want specifically "violin"), you can tag "violin". The key is to avoid duplicate tagging: do not include multiple i tags like ["i","violin"] and ["i","strings"] on the same event, as that is redundant. Pick one category that best represents the tone. By convention, relays index the first value of each tag; multiple i tags would just create extra index entries, and it is more efficient to have one clear instrument classification per tone.

  • Pitch Class Tag (n): The "n" tag denotes the note name (pitch class) of the tone. A Tone event SHOULD include an n tag if the sound has a definite pitch that corresponds to a musical note. Only one n tag is used per event. The value MUST be the pitch class as a capital letter A–G, optionally with a # for sharp or b for flat. Do not include the octave number here – this tag is for class only. Examples:

    • ["n","C"] (for C natural, any octave),
    • ["n","G#"] (for G sharp),
    • ["n","Fb"] (for F-flat, which is musically E but someone might denote it as such contextually).

    Use sharps or flats consistently (sharps preferred unless the musical context calls for a flat notation). The pitch class tag allows grouping all Cs together, etc. If a tone’s pitch is not on a standard chromatic scale (e.g. 440 Hz which is A, or 445 Hz which is slightly sharp of A4, etc.), you can choose the nearest pitch class or omit the n tag. If the tone is unpitched (like a drum hit or noise), do not include an n tag at all. This way, a search for a certain note (#n:["A"]) will only return events that have that note classification. Keep n tags to a single value – never something like ["n","A","C"] (if a sound somehow blends two notes, it’s better considered as two tones or just omit pitch class).

  • Scale/Key Tag (s): The "s" tag is used to label the musical scale or key context of the tone, if applicable. At most one s tag may appear per Tone event. The value is typically a short string naming the scale or key. Examples:

    • ["s","Cmaj"] – C major scale.
    • ["s","Amin"] – A minor.
    • ["s","Blues"] – a blues scale (typically assumed maybe in A or general blues).
    • ["s","Dorian"] – Dorian mode (would ideally accompany a root note but if not specified, could mean a generic Dorian scale).
    • ["s","Gmajpentatonic"] or ["s","G-major-pentatonic"] – you can encode more complex names like a key plus scale type.

    There isn’t a rigid format mandated, but it’s recommended to keep it concise: if including a root note, capitalize it (and use maj/min abbreviations for major/minor). If mode or scale type, just name it. The s tag is useful if tones are part of a collection (for instance, a set of tone events forming a whole scale could all share the same s tag). If a tone doesn’t have a particular key context (e.g., a random note or an effect sound), you should omit the s tag. Avoid using multiple s tags on one event (e.g., tagging a note as belonging to two scales – that’s typically unnecessary; choose the most relevant scale or none).

Tagging Guidelines:

  • Required tags: Every Tone event MUST include at least the ["t","tone"] tag. This differentiates it from any other event type. It is also recommended to include either a music or fx context tag via t if it clearly falls into one of these broad categories.
  • Single occurrence: The custom tags i, n, s SHOULD appear at most once each. In other words, you shouldn’t have two instrument tags or two pitch tags on the same event. One tone = one instrument, one pitch, one scale context typically. If an event accidentally had multiple of the same (e.g. ["i","piano"] and ["i","keyboard"] together), consumers SHOULD treat it as ambiguous or simply consider the first. It’s better to avoid that situation by design.
  • Avoid redundancy: Do not use tags to restate information that is already obvious from the content unless it is needed for searching. For example, if the content’s timbre is "piano", you might use ["i","keyboard"] as the tag to categorize it broadly, rather than repeating "piano" as both timbre and instrument tag. Similarly, do not tag the exact frequency or duration – those are not useful for indexing (no one is likely to search by “1.5s duration”). Tags should capture categorical info, not raw data. The JSON content is where precise data lives; tags are for classification.
  • Discovery use-case: Think about what queries you want to enable. Use tags such that someone can find the tone without scanning content. E.g., if you create a set of bird-call tones, you might tag them ["t","fx"] and ["i","animal"] or ["i","bird"]. If you make a series of scale notes, tag them with the scale (s) so they can be fetched as a group. The tags essentially serve as an index for the tone library on Nostr.
  • No large tags: Tag values should be short keywords. Don’t attempt to encode long descriptions or JSON in tags. If some extra classification doesn't fit these single-letter tags, consider using a different approach (e.g., include it in content "any" or define a new tag key via another NIP). Single-letter tags are a scarce resource and should remain human-meaningful metadata to avoid spamming relay indexes.
  • Compatibility: The i, n, s tags are not standard in older Nostr events, but as single-letter tags, relays that support NIP-12 will index them just like #e or #p. Clients can therefore filter by #i, #n, #s in their requests. It is advisable for any client implementing Tone events to include these tags when sending, and to use them when querying tones. Standard relays will treat unknown single-letter tags generically (indexing them because of the convention). This means the introduction of i, n, s does not break anything; it piggybacks on the existing tag indexing system.

Summary of Tags:

  • ["t","tone"](Topic) Identifies the event as a tone/sound. (Always present)
  • ["t","music"](Topic) Tone is musical (a note, part of music).
  • ["t","fx"](Topic) Tone is a sound effect.
  • ["t","alert"](Topic) Tone is an alert/notification sound.
  • ... other t tags as needed, one entry per tag.
  • ["i","<instr>"](Instrument) Instrument category or name (one per event). E.g. <instr> = synth, piano, guitar, drums, etc.
  • ["n","<note>"](Note/Pitch Class) Note name A–G (with #/b if needed) for the tone’s pitch class (one per event). E.g. C, F#, Bb.
  • ["s","<scale>"](Scale/Key) Scale or key context (one per event). E.g. Gmaj, Emin, Blues.

These tags collectively allow fast indexing on different facets of the tone:

  • By topic (#t): All tones vs only music vs only fx, etc.
  • By instrument (#i): find by instrument type.
  • By note (#n): find by note name.
  • By scale (#s): find tones in a certain key or scale.

Clients and relays already implement filtering on tags like #t for hashtags, and generic tag filtering for others via NIP-12. For example, a client can send a subscription filter {"kinds":[7043], "#i":["piano"]} to get all piano tones, or {"#n":["C"], "#t":["music"]} to get all musical tones that are a C note. Relays that support NIP-12 will handle these queries. (If a relay doesn’t support NIP-12, one can still fetch all Tone events and filter client-side by tags, but that’s less efficient.)
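
As an illustrative sketch (the relay URL and subscription id are placeholders), such a filter could be sent over a relay WebSocket like this:

// Sketch: subscribe to all piano Tone events on a NIP-12-capable relay.
const ws = new WebSocket("wss://relay.example.com");

ws.onopen = () => {
  const filter = { kinds: [7043], "#i": ["piano"] };
  ws.send(JSON.stringify(["REQ", "tones-sub", filter]));
};

ws.onmessage = (msg) => {
  const data = JSON.parse(msg.data as string);
  if (data[0] === "EVENT" && data[1] === "tones-sub") {
    const event = data[2];
    // event.kind === 7043; parse event.content as the tone JSON here.
  }
};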

Versioning and Compatibility

The current schema version is 1. All Tone events SHOULD include "v":1 in their content. In the future, if a new version (e.g. 2) is introduced, those events will have "v":2. Clients should use the version to decide how to parse:

  • If the client only knows version 1 schema and sees v:2, it should check if the event maybe has the same fundamental fields or if it should ignore it. Ideally, future versions will be designed to be backward compatible or at least backward ignorable (unknown fields can be skipped).
  • The presence of a version field also means we do not need to assign a new kind number for minor iterative improvements – we can extend the content and bump the version. However, major incompatible changes might still necessitate a different kind or a separate NIP. (For example, if a radically different sound representation were needed, that might be a separate kind rather than confuse versioning.)

The any field is the preferred way to extend functionality within version 1. Minor additions can be done by using any to carry extra data, which doesn’t break older clients (they’ll ignore it). Only if something fundamentally changes (like we want a completely different envelope scheme) would we consider incrementing v.

Relays do not need to be aware of the version; they treat content as opaque. It’s purely for clients and forward compatibility.
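
A minimal sketch of one possible client-side version-gating policy (the exact policy is up to the implementation):

// Sketch: accept only versions this client knows how to parse.
const SUPPORTED_VERSION = 1;

function canHandleTone(tone: { v?: number | string }): boolean {
  const v = typeof tone.v === "string" ? parseInt(tone.v, 10) : tone.v;
  if (v === undefined || Number.isNaN(v)) return false; // missing/unreadable version
  // Future versions may be partially readable; this conservative policy
  // only accepts versions up to the one it fully supports.
  return v <= SUPPORTED_VERSION;
}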

Examples

Below are examples of Tone events and how they are constructed, demonstrating the JSON content and tags for different scenarios.

Example 1: Simple Alert Beep

An example Tone event for a simple alert sound – a short, high-pitched sine wave beep, meant as a notification tone.

  • Description: A 0.3-second sine wave at 440 Hz (A4), loud volume, no fancy envelope (starts and stops abruptly or with default minimal envelope). Tagged as an alert/fx tone.

  • Tags:

    • t:tone (identifies as tone event)
    • t:alert (this is an alert sound)
    • i:synth (instrument category: synthesized tone)
    • n:A (pitch class A)
    • (No s tag, since it’s not in a musical scale context specifically)

The event might be represented as:

{
  "kind": 7043,
  "pubkey": "<author pubkey>",
  "created_at": 1690000000,
  "tags": [
    ["t", "tone"],
    ["t", "alert"],
    ["i", "synth"],
    ["n", "A"]
  ],
  "content": "{\"v\":1,\"pitch\":440.0,\"duration\":0.3,\"amplitude\":1.0,\"timbre\":\"sine\"}",
  "id": "<event id>",
  "sig": "<signature>"
}

For readability, the content JSON string is shown here unescaped:

{
  "v": 1,
  "pitch": 440.0,
  "duration": 0.3,
  "amplitude": 1.0,
  "timbre": "sine"
}

This content defines: version 1, pitch 440Hz (which is A4), 0.3s duration, full amplitude (1.0), timbre “sine” wave. We did not specify an envelope or modulation, meaning it’s an immediate on/off tone. A client synthesizing this could simply play a 440Hz sine tone at full volume for 0.3 seconds. The tags indicate this is a tone (tone), specifically an alert sound (alert) generated by a synth (synth) with pitch class A. A user searching for alert sounds could filter by #t:["alert"] and find this event. If searching by instrument, #i:["synth"] would include this among synthesized tones.
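
A non-normative sketch of how a Web Audio API client might render this beep (other synthesis backends are equally valid):

// Sketch: play Example 1 with the Web Audio API.
function playAlertBeep(ctx: AudioContext): void {
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.type = "sine";             // timbre: "sine"
  osc.frequency.value = 440.0;   // pitch: 440 Hz (A4)
  gain.gain.value = 1.0;         // amplitude: 1.0

  osc.connect(gain).connect(ctx.destination);

  const now = ctx.currentTime;
  osc.start(now);
  osc.stop(now + 0.3);           // duration: 0.3 s, no envelope/modulation
}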

Example 2: Musical Instrument Note

This example showcases a musical note tone, with a specified envelope and musical context.

  • Description: A 2-second piano note, D5 (around 587 Hz), medium loudness, with a piano-like envelope (quick attack, some decay, sustain, and release). This might represent a sample or synthesized piano tone.

  • Tags:

    • t:tone (tone event)
    • t:music (it's a musical note)
    • i:keyboard (instrument category: keyboard instrument; using a broad category for piano)
    • n:D (pitch class D)
    • s:Gmaj (this note is used in the context of the G major scale; perhaps it is the fifth of G major)

Event structure (content string pretty-printed for readability; in the actual event it is a single escaped JSON string):

{
  "kind": 7043,
  "pubkey": "<author pubkey>",
  "created_at": 1690001234,
  "tags": [
    ["t", "tone"],
    ["t", "music"],
    ["i", "keyboard"],
    ["n", "D"],
    ["s", "Gmaj"]
  ],
  "content": "{
    \"v\":1,
    \"pitch\":\"D5\",
    \"duration\":2.0,
    \"amplitude\":0.8,
    \"timbre\":\"piano\",
    \"envelope\": { \"attack\":0.0, \"decay\":0.3, \"sustain\":0.5, \"release\":1.0 },
    \"modulation\": { \"type\":\"vibrato\", \"rate\":5, \"depth\":0.2 }
  }",
  "id": "<event id>",
  "sig": "<signature>"
}

Formatted content JSON:

{
  "v": 1,
  "pitch": "D5",
  "duration": 2.0,
  "amplitude": 0.8,
  "timbre": "piano",
  "envelope": {
    "attack": 0.0,
    "decay": 0.3,
    "sustain": 0.5,
    "release": 1.0
  },
  "modulation": {
    "type": "vibrato",
    "rate": 5,
    "depth": 0.2
  }
}

This defines a version 1 tone with pitch "D5" (which the client will interpret as D in the 5th octave, presumably around 587 Hz). Duration 2.0 seconds. Amplitude 0.8 (80% volume). Timbre "piano" suggests the client should make it sound like a piano (perhaps by using a piano sample or similar harmonic content). The envelope is given: no attack delay (immediate strike), a decay of 0.3s to sustain level 0.5 (so it hits loud then drops to 50% volume), then sustains at 0.5 until release, and release is 1.0s (a fade-out of 1 second). Modulation: a subtle vibrato (type: vibrato) at 5 Hz with depth 0.2 (this depth might be interpreted as 0.2 semitones or some fraction – up to the client, but it's a gentle vibrato). The tags categorize it: it's a tone for musical use, from a keyboard family instrument, the note is D, and we tag "Gmaj" to indicate perhaps it's part of G major context.

If a user wants to find all piano/keyboard tones, they filter #i:["keyboard"]. If they want all D notes, #n:["D"]. All tones in G major scale could be fetched via #s:["Gmaj"]. Note that we used a broad instrument tag (keyboard) instead of the specific "piano". This is a design choice to group similar instruments; the specific timbre "piano" is in content. We avoided tagging both "piano" and "keyboard" to not duplicate. Another Tone event that was an organ could also be i:keyboard and timbre "organ" – a search for #i:["keyboard"] would find both piano and organ tones.
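
A non-normative Web Audio API sketch of this example, approximating the "piano" timbre with a plain oscillator (a real client might use a sample instead) and reading the vibrato depth as semitones, which is only one possible interpretation:

// Sketch: render Example 2 with an ADSR gain envelope and a vibrato LFO.
function playPianoLikeNote(ctx: AudioContext): void {
  const freq = 587.33;   // "D5" under A4 = 440 Hz tuning
  const duration = 2.0;
  const amp = 0.8;

  const osc = ctx.createOscillator();
  osc.frequency.value = freq;

  // Envelope: attack 0, decay 0.3 s to sustain 0.5, release 1.0 s.
  const gain = ctx.createGain();
  const t0 = ctx.currentTime;
  gain.gain.setValueAtTime(amp, t0);                          // instant attack
  gain.gain.linearRampToValueAtTime(amp * 0.5, t0 + 0.3);     // decay to sustain
  gain.gain.setValueAtTime(amp * 0.5, t0 + duration);         // hold sustain
  gain.gain.linearRampToValueAtTime(0, t0 + duration + 1.0);  // release

  // Vibrato: 5 Hz LFO modulating pitch by about ±0.2 semitones.
  const lfo = ctx.createOscillator();
  const lfoGain = ctx.createGain();
  lfo.frequency.value = 5;
  lfoGain.gain.value = freq * (Math.pow(2, 0.2 / 12) - 1);    // deviation in Hz
  lfo.connect(lfoGain).connect(osc.frequency);

  osc.connect(gain).connect(ctx.destination);
  lfo.start(t0);
  osc.start(t0);
  osc.stop(t0 + duration + 1.0);
  lfo.stop(t0 + duration + 1.0);
}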

Example 3: Percussive Noise Effect

As a third example, consider a tone event that doesn’t have a pitch, like a snare drum hit or a noise burst used for an effect.

  • Description: A 1-second white noise burst, medium-high volume, used as a sound effect (for example, simulating a splash or a burst sound).

  • Tags:

    • t:tone
    • t:fx (sound effect)
    • i:percussion (categorized under percussion instruments)
    • (No n tag, because noise has no specific pitch)
    • (No s tag, not in a musical scale)

Event content might be:

{
  "v": 1,
  "pitch": null,
  "duration": 1.0,
  "amplitude": 0.9,
  "timbre": "noise",
  "envelope": {
    "attack": 0.0,
    "decay": 0.0,
    "sustain": 1.0,
    "release": 0.2
  }
}

Tags:

"tags": [
  ["t", "tone"],
  ["t", "fx"],
  ["i", "percussion"]
]

Here, pitch is explicitly null to denote no definite pitch. Timbre "noise" tells the client to generate white noise. The envelope has no attack or decay (instant on, stays at full volume sustain), and a short release of 0.2s to avoid an abrupt stop (the noise will fade out for 0.2s at the end of the 1.0s duration). This simulates something like a quick burst of noise with a bit of tail. The tags label it as a tone event, specifically a sound effect, and instrument category percussion (we treat unpitched percussion as percussion category). Anyone looking for percussion or drum sounds could search #i:["percussion"]. Searching for #t:["fx"] would turn up this and other sound effects. Since there's no n tag, it won't appear in note-based searches (which is correct).
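
A non-normative sketch of realizing the "noise" timbre with the Web Audio API, using a buffer filled with white noise:

// Sketch: render Example 3 as a white-noise burst with a 0.2 s release.
function playNoiseBurst(ctx: AudioContext): void {
  const duration = 1.0;
  const amp = 0.9;

  // Fill a mono buffer with white noise.
  const length = Math.ceil(ctx.sampleRate * duration);
  const buffer = ctx.createBuffer(1, length, ctx.sampleRate);
  const samples = buffer.getChannelData(0);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = Math.random() * 2 - 1;
  }

  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // Instant attack, full sustain, then a 0.2 s fade inside the 1.0 s duration.
  const gain = ctx.createGain();
  const t0 = ctx.currentTime;
  gain.gain.setValueAtTime(amp, t0);
  gain.gain.setValueAtTime(amp, t0 + duration - 0.2);
  gain.gain.linearRampToValueAtTime(0, t0 + duration);

  source.connect(gain).connect(ctx.destination);
  source.start(t0);
}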

These examples illustrate how to use the content schema and tags in practice. They also demonstrate avoiding tag duplication: e.g., we didn't tag example 3 with both "fx" and "alert" (only fx, since it's not specifically an alert). We didn't tag example 2 with both "piano" and "keyboard" – just one instrument category. In each case, the content JSON provides full details, while tags summarize key points for indexing.

Considerations

  1. Isolation and Client Handling: By design, Tone events (kind 7043) are separate from text notes. Clients that do not recognize kind 7043 will simply ignore these events (or at most display the raw content string, which would be JSON gibberish to an end-user). This is acceptable, as older or general clients can skip unknown kinds without harm. On the other hand, clients that do support Tone events should NOT treat them like text: e.g., they shouldn’t be displayed in general feeds or allow replying as if they were messages. Instead, supporting clients might list them in a “Sounds” section or provide a play/generate button. The separation ensures no accidental interactions (like a user trying to like or zap a tone event expecting text content).

  2. Immutability and Uniqueness: Tone events are immutable. Each event is a unique representation of a sound at a point in time. Authors should refrain from reusing the same event ID for different sounds (which is not possible unless they reuse keys and deliberately craft the same content anyway). If an update is needed (say the author wants to tweak the envelope), they must publish a new Tone event. There is no replaceable kind for tones (we intentionally chose 7043 in the "regular" range, not the replaceable ranges). This approach guarantees that any reference (via an e tag or external link) to a Tone event ID will always point to the exact original tone. It also means if one is building a library of sounds, adding a new version will not overwrite the old – clients might need to handle possibly multiple versions of a tone if that occurs (for example, an app could choose the latest by an author if it knows a certain naming scheme, but that's outside this spec).

  3. Relay Storage and Performance: Tone events are small in size (the content JSON is typically a few hundred bytes at most). They should not pose any unusual burden on relays in terms of storage. Relays will store them just like any other event since they are in the regular kind range. Indexing: We heavily use tags for indexing by relays. Single-letter tags are indexed by convention, so the use of i, n, s, t fits into existing relay index structures. As with any metadata tags, if these are overused or spammed, it could bloat indexes. However, our guidelines limit the number of tags per event and encourage reuse of common values (e.g., many events will use the same tag values like "tone", "music", etc., which is efficient for indexing). Relay operators should not need special-casing for kind 7043, aside from possibly recognizing it as a known kind (for informational purposes or to advertise support).

  4. Efficient Search and Filtering: The primary way to discover Tone events is via tag filtering (per NIP-12). Users can combine filters to narrow down sounds (by kind, by tags). For example, a query to find all violin tones in D major might specify {"kinds":[7043], "#i":["violin"], "#s":["Dmaj"]}. This requires the relay to support arbitrary tag queries (which most do, as NIP-12 is widely adopted). Without tags, one would have to download all tone events and parse content to find matches, which is impractical. Thus, proper tagging is crucial. We’ve balanced including enough tags for search without mirroring every detail. In practice, broad searches (instrument, note, scale) get you a candidate set, then a client could further filter by reading content if needed (e.g., find all C notes via #n:["C"], then among those check which have amplitude>0.5 if someone needed that — though that second part would be client-side). The combination of relay-side tag filtering and client-side content parsing can yield powerful queries without putting undue load on relays.

  5. Audio Synthesis and Playback: This NIP does not mandate how a client should generate or play the tone – just how to describe it. Different clients might use different methods (one might use the Web Audio API oscillators, another might use a MIDI soundfont, etc.). Because of this, the resulting sound might not be identical across implementations, especially for complex timbres. The goal is a reasonable approximation: e.g., a "piano" timbre tag might be rendered as an actual sampled piano on one app, but maybe as a sine wave (if nothing better) on another. This is acceptable; the event defines the intention of the sound. For critical use-cases (like exact sound design), producers might include more data in any or even share a recording via other means. But for most cases (notifications, simple music), this format suffices. Clients SHOULD ensure that playing a tone from untrusted sources is done safely — for example, respect the amplitude (don’t blast at 100% volume unexpectedly) and possibly provide a user setting to limit loudness or duration. A malicious actor could create a “tone” with amplitude 1.0 and a very long sustain to annoy users; clients can mitigate by imposing sane limits (for instance, they might ignore tones longer than, say, 10 seconds or require user confirmation).

  6. Content Safety and Size: Tone events contain textual parameter data only. There’s no binary audio content directly. This means they pose minimal risk in terms of containing malware or such (unlike if we allowed binary audio files, which could be large or exploitable in audio decoders). The JSON should be small – if someone tries to stuff extremely large data into the any field (like an entire audio sample in base64), relays may reject it due to size or clients may simply not handle it. This spec assumes tone events are lightweight. For actual audio file sharing, another approach (like NIP-94 for media references or a new kind) might be more appropriate. We deliberately keep this to parametric descriptions for simplicity and low bandwidth. Also, by using JSON, the content is human-inspectable (to some degree) and easily transformable (e.g., a gateway service could convert a tone JSON to a MIDI note or a sound file).

  7. Extensibility: The inclusion of versioning and the any field is meant to future-proof the Tone event format. For example, future considerations might include:

    • Supporting polyphonic or chord events (one event containing multiple pitches or a chord). This NIP handles one tone = one pitch. A future extension might define a kind for chords or use any to list multiple notes (but that would complicate the idea of a “single sound”; more likely a separate kind or a list linking multiple tone events would be used).
    • Additional sound parameters like stereo panning, reverb or echo settings, etc. These could be added in a future version or placed in any under an agreed key.
    • Interactivity or dynamic tones: e.g., if a tone is meant to respond to some real-time input (this is out of scope for now).

    By having a clear core schema and a place for extras, we can accommodate new needs without breaking the old. Version 1 should cover most simple and moderately complex sounds. If a future change is minor (e.g., adding an optional pan field for stereo position), it could be done in a backward-compatible way (clients ignoring it would still function). If a future change is major, it might use v:2 or a separate NIP.

  8. Adoption and Usage: For this NIP to be useful, client and developer adoption is needed. We envision uses such as:

    • A “sound marketplace” or repository where creators post Tone events for various instruments or effects, which others can search and use.
    • Social applications where users share short melodies or riffs by posting sequences of Tone events (or referencing Tone events in some structured way).
    • Nostr clients allowing custom notification sounds: for example, you could set the sound your client plays for a new message by referencing a Tone event ID. Because Tone events are on Nostr, someone could publish a cool notification sound and many users could use it by its event ID.
    • Collaborative music making: although this format alone is one-note, multiple Tone events could be combined by a client to form songs. (A future NIP might define how to compose a sequence or playlist of Tone events, possibly via a kind of list event that references multiple Tone event IDs in order).

    As a standard, NIP-TN will enable these scenarios in an interoperable way. It is optional (clients don't have to implement it if it's not in their scope), but it opens a novel avenue for audio content on Nostr.

In conclusion, the Tone event kind 7043 provides a robust, structured way to represent sounds on the Nostr network. It balances detail (a rich JSON schema for content) with discoverability (concise tags for filtering). By adhering to this spec, implementers ensure that tone definitions can be shared and found across clients, without cluttering the core text note ecosystem. This enhances Nostr as a protocol not just for social text, but for multimedia and creative applications, all while playing nicely with the decentralized, user-centric nature of the network.

References:

  • NIP-01: Basic protocol – defines event format, tag indexing convention
  • NIP-12: Generic Tag Queries – allows querying any single-letter tags; outlines hashtag usage