

LLM Fine-Tuning Parameters Cheatsheet

This cheatsheet helps you understand and tune advanced parameters to control the output of a Large Language Model (LLM). The key is balancing coherence (makes sense, factual, predictable) with creativity (novel, diverse, unexpected).

Core Concepts: Sampling

Most of these parameters control sampling, which is how the model chooses the next word (token) from a list of possibilities.

  • For Creative & Diverse output: you want the model to consider a wider range of less likely options.
  • For Factual & Coherent output: you want the model to stick to the most likely, predictable options.
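
To make the mechanics concrete, here is a minimal, self-contained toy sketch (not any particular runtime's implementation) of how Temperature, Top K, and Top P combine to pick the next token from a set of raw scores (logits). The vocabulary and values are made up for illustration.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=40, top_p=0.9):
    """Toy sketch: temperature-scale the logits, keep the top-k tokens,
    keep the smallest set whose cumulative probability reaches top_p,
    then sample from what survives."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax: turn scores into probabilities.
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    # Top K: keep only the K most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top P: walk down the ranking until cumulative probability reaches P (the "nucleus").
    nucleus, cumulative = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the nucleus and sample from it.
    total = sum(p for _, p in nucleus)
    tokens = [tok for tok, _ in nucleus]
    weights = [p / total for _, p in nucleus]
    return random.choices(tokens, weights)[0]

# Made-up next-token scores (higher = more likely).
logits = {"the": 3.0, "a": 2.2, "robot": 1.1, "banana": -1.0, "xyzzy": -4.0}
print(sample_next_token(logits, temperature=0.3))  # nearly deterministic: almost always "the"
print(sample_next_token(logits, temperature=1.5))  # flatter distribution: more variety
```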

Parameter Breakdown

For each parameter below: what it does, when to tune it (with recommended values), and the prompts or outputs it suits best.
Temperature
  • What it does: Controls the randomness of the output. Higher values mean more randomness; a low temperature makes the model more confident and deterministic, while a high temperature allows it to pick less likely words.
  • When to tune it: Low (0.1 - 0.5) for predictable, stable output. Medium (0.6 - 0.9) is the default and a good balance. High (1.0 - 2.0) for highly creative, diverse, or experimental output.
  • Best for: Low temp: code generation, math problems, factual Q&A, summarizing text. High temp: brainstorming ideas, writing poetry or fiction, creating character dialogue.
Top K
  • What it does: Restricts the model to choosing its next word from only the K most likely options. Lower K means less choice.
  • When to tune it: Low (1 - 20) makes the output very safe and predictable. High (50+) gives the model more freedom. Set to 0 to disable. Often used with a low temperature.
  • Best for: Low K: generating structured text like a list or a JSON object where you don't want strange deviations. Useful to prevent the model from going off-topic.
Top P (Nucleus Sampling)
  • What it does: Restricts the model to choosing from a "nucleus" of tokens whose cumulative probability adds up to P; it's more dynamic than Top K. A Top P of 0.9 means the model considers the most likely tokens that together account for 90% of the probability.
  • When to tune it: Low (0.1 - 0.8) prevents the model from picking very weird "tail-end" tokens and promotes coherence. High (0.9 - 1.0) allows for more creativity and diversity; 1.0 disables it. Many practitioners prefer tuning Top P over Top K.
  • Best for: Low P: professional emails, technical documentation, or analysis where factual accuracy and coherence are key. High P: creative writing, roleplaying, generating multiple different marketing slogans.
Min P
  • What it does: A newer parameter that sets a minimum probability floor for a token to be included in the sampling pool. It removes the "long tail" of very unlikely, often nonsensical tokens, improving quality.
  • When to tune it: Low (0.01 - 0.1) is a good starting point to cut out gibberish without overly restricting creativity. 0 disables it. Use in combination with Top P.
  • Best for: Any prompt where quality is important. It's a subtle but effective way to prevent the model from generating nonsensical words, especially when using a high Temperature or Top P.
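
The exact Min P rule varies by runtime; one common definition (used by llama.cpp-style samplers, and assumed here) drops any token whose probability is below min_p times the probability of the single most likely token. A toy sketch with a hypothetical helper name:

```python
def min_p_filter(probs, min_p=0.05):
    """Hypothetical helper: keep only tokens whose probability is at least
    min_p times the probability of the most likely token."""
    ceiling = max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= min_p * ceiling}

probs = {"the": 0.62, "a": 0.30, "robot": 0.07, "xyzzy": 0.01}
print(min_p_filter(probs))  # drops "xyzzy" because 0.01 < 0.05 * 0.62
```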
Mirostat (Alternative Sampler)
  • What it does: An advanced sampling method that aims for a target level of "surprise" (perplexity) in the output, dynamically adjusting sampling to keep the text engaging but coherent. If you use Mirostat, disable Temperature, Top P, and Top K.
  • When to tune it: Enable (mode 1 or 2) when you want long, coherent, non-repetitive text. Mirostat Tau (target surprise, ~3 - 6): higher tau = more creative/surprising. Mirostat Eta (learning rate, ~0.1): how quickly it adapts; the default is usually fine.
  • Best for: Long-form storytelling and creative writing. It excels at helping the model stay "in character" or on-topic for thousands of words without becoming boring or repetitive. Great for roleplaying.
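
If your runtime exposes Mirostat, the knobs above typically map to option names like the following. This is a hedged sketch using Ollama/llama.cpp-style names, so check your runtime's documentation before relying on them.

```python
# Mirostat options as commonly named in Ollama/llama.cpp-style runtimes (assumption).
mirostat_options = {
    "mirostat": 2,        # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 5.0,  # target "surprise"; higher = more creative/surprising
    "mirostat_eta": 0.1,  # learning rate; the default is usually fine
    "top_k": 0,           # neutralize the other samplers rather than stacking them
    "top_p": 1.0,
}
```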
Frequency Penalty
  • What it does: Penalizes words based on how many times they have already appeared in the text. Higher values discourage repetition.
  • When to tune it: Low (0.1 - 0.5) gently discourages repeating the same words. High (1.0 - 2.0) strongly penalizes repetition and can make text feel forced if set too high.
  • Best for: Almost all long-form generation. Essential for writing articles, stories, or summaries with varied vocabulary. Less needed for short Q&A or code.
Repeat Last N (with Repetition Penalty)
  • What it does: Penalizes tokens that already appeared within the last N tokens of output. It's a direct tool to stop the model from getting stuck in a loop (e.g., "I am a robot. I am a robot. I am a robot.").
  • When to tune it: N (64 - 512) is the size of the history window to check for repeats; 64 is a common default. The penalty strength itself is often a separate setting (not shown here, but typically ~1.1).
  • Best for: When the model gets stuck in short-term loops. Very useful for chat and roleplaying, where conversational flow can easily become repetitive.
Stop Sequence
  • What it does: A specific string of text that immediately stops generation. The model will not output the stop sequence itself.
  • When to tune it: Define one or more sequences (e.g., \n\n, ###, User:).
  • Best for: Structured data: stop after a specific field. Character roleplay: stop when the model tries to generate the user's line (e.g., stop at User:). Code generation: stop at the end of a function block (}).
Seed
  • What it does: The starting number for the random number generator. Setting a specific seed ensures you get the exact same output for the same prompt and parameters.
  • When to tune it: -1 or blank gives a random seed and a different result each time; any specific integer (e.g., 42, 1337) gives reproducible results.
  • Best for: Reproducibility. If you found a perfect output and want to be able to generate it again, or if you are testing the effect of a single parameter change.
Max Tokens (num_predict)
  • What it does: The maximum number of tokens (words/sub-words) the model is allowed to generate in a single response.
  • When to tune it: Low (50 - 200) for short, concise answers; high (1000 - 4000+) for long-form content like an article or story chapter.
  • Best for: Set it based on your expected output length. If you ask for a "one-sentence summary," set it low. If you ask for a "blog post," set it high so the response isn't cut off.
Context Length
  • What it does: The total number of tokens the model can "remember" from the current conversation/document.
  • When to tune it: Lower values save memory/VRAM; higher values give the model more context to work with, leading to better continuity in long conversations. The maximum depends on the model (e.g., 2048, 4096, 32k).
  • Best for: Long conversations or document analysis. If you are asking follow-up questions about a long text you pasted, you need a high context length.
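
Putting the remaining output-control settings (repetition penalties, stop sequences, seed, response length, and context length) together: below is a hedged sketch of a single request to a local Ollama server, assuming the default port, a locally pulled model named llama3, and Ollama's documented option names (num_predict, num_ctx, repeat_last_n, and so on).

```python
import json
import urllib.request

# Assumptions: local Ollama server on the default port, model "llama3" already pulled.
options = {
    "temperature": 0.7,        # balanced default
    "frequency_penalty": 0.2,  # gently discourage reusing the same words
    "repeat_penalty": 1.1,     # penalty strength for recently repeated tokens
    "repeat_last_n": 64,       # history window checked for repeats
    "stop": ["User:", "###"],  # cut generation as soon as either string appears
    "seed": 42,                # fixed seed -> reproducible output for the same prompt/options
    "num_predict": 300,        # max tokens to generate (Max Tokens)
    "num_ctx": 4096,           # context window the model can "remember" (Context Length)
}
payload = {
    "model": "llama3",
    "prompt": "Summarize the benefits of unit testing in three sentences.",
    "stream": False,
    "options": options,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```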

Practical Recipes: Quick Presets

  • Creative Writing / Brainstorming: Temperature 1.0 - 1.3, Top P 0.9, Frequency Penalty 0.5. High creativity, but the penalty prevents it from getting too repetitive. Could also use Mirostat instead of Temperature/Top P.
  • Factual Q&A / Code Generation: Temperature 0.2 - 0.4, Top P 0.95, Frequency Penalty 0.2. Low temperature for deterministic, factual output; Top P is kept high so the model doesn't get "stuck" if the answer is complex.
  • Balanced Chat / General Use: Temperature 0.7 - 0.8, Top P 0.9, Frequency Penalty 0.2. The standard "best of both worlds" setting: coherent but not robotic.
  • Character Roleplaying: Temperature 0.85, Top P 1.0, Frequency Penalty 0.4. A slightly higher temperature for character flavor. Use a Stop Sequence like "User:" to prevent the AI from speaking for you.
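
The same presets can be expressed as option dictionaries and dropped into an API call. This is a hedged sketch: the key names follow OpenAI-style request fields (temperature, top_p, frequency_penalty, stop), so rename them if your runtime uses different keys (e.g., repeat_penalty in Ollama/llama.cpp).

```python
# Cheatsheet presets as reusable option dictionaries (OpenAI-style key names assumed).
PRESETS = {
    "creative_writing": {"temperature": 1.2, "top_p": 0.9, "frequency_penalty": 0.5},
    "factual_qa_code":  {"temperature": 0.3, "top_p": 0.95, "frequency_penalty": 0.2},
    "balanced_chat":    {"temperature": 0.75, "top_p": 0.9, "frequency_penalty": 0.2},
    "roleplay":         {"temperature": 0.85, "top_p": 1.0, "frequency_penalty": 0.4,
                         "stop": ["User:"]},  # keep the AI from speaking for the user
}

# Example: merge a preset into whatever request body your client library expects.
request_body = {"model": "your-model-name", "prompt": "...", **PRESETS["balanced_chat"]}
```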