jpalala/3cd1076a2879a6ed9851e0e3956c330b

Limitations of using LLMs to code

Large language models (LLMs) are very good at generating code that matches patterns they’ve seen during training. That strength also creates several important limitations.

1. Pattern matching ≠ true software understanding

LLMs generate code by predicting likely token sequences. They do not check it the way a compiler or type checker does, and they do not reason about its correctness the way an experienced engineer would.

This means they can:

  • Produce code that looks correct but fails edge cases
  • Misunderstand implicit requirements
  • Combine incompatible APIs or frameworks
  • Generate logically inconsistent implementations

For example:

  • Correct syntax but wrong algorithm
  • Correct algorithm but broken concurrency handling
  • Correct output for examples but failure under production conditions
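Here is a hypothetical illustration of the last item. It is not taken from any real model output. The leap-year check below gives the right answer for the years most people would try, but it breaks on the century rule. Python's standard `calendar.isleap` serves as the reference.

```python
import calendar

def is_leap_naive(year: int) -> bool:
    # Plausible-looking rule that passes the "obvious" examples.
    return year % 4 == 0

def is_leap(year: int) -> bool:
    # Full Gregorian rule: century years are leap years only if divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Both versions agree on typical example years...
for y in (2020, 2023, 2024):
    assert is_leap_naive(y) == is_leap(y) == calendar.isleap(y)

# ...but the naive version fails the century edge case.
print(is_leap_naive(1900), is_leap(1900), calendar.isleap(1900))  # True False False
```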

2. Limited novelty beyond training patterns

LLMs work best when:

  • The task resembles common open-source examples
  • The framework/library is well represented online
  • The architecture follows familiar conventions

They struggle more with:

  • Novel algorithms
  • Proprietary systems
  • Unusual architectures
  • Emerging libraries with little training data
  • Deep domain-specific business logic

If a problem has few examples online, model quality usually drops sharply.


3. Weak long-range consistency

Large codebases require:

  • Architectural coherence
  • Shared abstractions
  • Stable interfaces
  • Dependency management
  • Multi-file reasoning

LLMs often lose consistency across:

  • Many files
  • Long conversations
  • Large refactors
  • Evolving requirements

Typical failures:

  • Renaming functions inconsistently
  • Breaking hidden dependencies
  • Reintroducing removed logic
  • Diverging coding styles
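A static check for calls to names that no longer exist is one cheap guard against the first failure in that list. The snippet below is a toy sketch built on Python's standard `ast` module. Both the `undefined_calls` helper and the sample module are hypothetical. In practice, a linter such as pyflakes (which reports undefined names) or a type checker does this job far more thoroughly.

```python
import ast
import builtins

# Hypothetical module after a partial rename: get_user became fetch_user,
# but one call site was missed.
SOURCE = '''
def fetch_user(user_id):
    return {"id": user_id}

def handler(user_id):
    return get_user(user_id)
'''

def undefined_calls(source: str) -> set[str]:
    """Names called as plain functions that are never defined, imported, or bound.

    Deliberately crude: it ignores scoping, so it can miss real problems
    and can flag names that are actually fine (e.g. from star imports).
    """
    tree = ast.parse(source)
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # function parameters
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)  # assignment targets
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return called - defined

print(undefined_calls(SOURCE))  # {'get_user'}
```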

4. Hallucinated APIs and libraries

A common limitation is inventing:

  • Functions that do not exist
  • Incorrect method signatures
  • Fake configuration options
  • Nonexistent packages

This happens because the model predicts “plausible-looking” code patterns rather than verifying against live documentation.

Example:

```python
db.connect_async(timeout=30)
```

The syntax may look realistic even if the actual SDK has no such method.
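A hallucinated call like this can often be caught before anything runs. The idea is to check that the module, the attribute, and the keyword parameters actually exist. The sketch below uses only the standard `importlib` and `inspect` modules, and `api_exists` is a hypothetical helper. A passing check only proves that a name exists. It says nothing about whether the function behaves as the generated code assumes.

```python
import importlib
import inspect

def api_exists(module_name: str, attr_path: str, params: tuple[str, ...] = ()) -> bool:
    """Return True if module_name.attr_path exists and appears to accept the given keyword names."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # the package itself may be invented
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False  # invented function, method, or attribute
        obj = getattr(obj, part)
    if not params:
        return True
    try:
        sig = inspect.signature(obj)
    except (TypeError, ValueError):
        return False  # no introspectable signature (common for C builtins); check the docs by hand
    names = sig.parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in names.values()):
        return True  # **kwargs accepts any keyword, so this check cannot rule anything out
    return all(name in names for name in params)

print(api_exists("sqlite3", "connect"))                        # True
print(api_exists("sqlite3", "connect_async"))                  # False
print(api_exists("shutil", "copyfile", ("follow_symlinks",)))  # True
print(api_exists("shutil", "copyfile", ("timeout",)))          # False
```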


5. Poor guarantees of correctness

Generated code is probabilistic, not verified.

LLMs generally do not guarantee:

  • Formal correctness
  • Memory safety
  • Race-condition safety
  • Security robustness
  • Performance constraints

Especially risky areas:

  • Cryptography
  • Authentication
  • Financial systems
  • Distributed systems
  • Embedded/real-time systems
  • Infrastructure automation

Human review and testing remain essential.
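Differential (randomized) testing is one practical form of that testing. You run the generated function and a slow but obviously correct reference on many random inputs, then compare the results. The example is hypothetical. `max_subarray_generated` stands in for plausible model output with a planted bug: it starts the running best at 0. Hand-picked examples that contain a non-negative number would never expose this. Property-based testing libraries such as Hypothesis automate the same idea.

```python
import random

def max_subarray_generated(xs: list[int]) -> int:
    # Plausible Kadane-style implementation with a subtle bug:
    # starting best at 0 is wrong when every element is negative.
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def max_subarray_oracle(xs: list[int]) -> int:
    # Slow but obviously correct: try every non-empty contiguous slice.
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))

def find_counterexample(trials: int = 1000, seed: int = 0):
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if max_subarray_generated(xs) != max_subarray_oracle(xs):
            return xs
    return None

# Any counterexample this search finds is a list of only negative numbers.
print(find_counterexample())
```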


6. Difficulty with hidden context

Real-world software development depends heavily on:

  • Team conventions
  • Business rules
  • Legacy system behavior
  • Operational constraints
  • Organizational priorities

Apart from general knowledge absorbed during training, LLMs only know what appears in the prompt or context window.

Missing context often causes:

  • Wrong assumptions
  • Overengineering
  • Underengineering
  • Incompatible design choices

7. Limited debugging depth

LLMs can help debug common issues, but they often:

  • Misdiagnose root causes
  • Suggest generic fixes
  • Chase symptoms instead of the underlying system behavior
  • Fail on nondeterministic bugs

Hard problems include:

  • Timing bugs
  • Failures that span multiple services and need distributed tracing to pin down
  • Production-only failures
  • Resource exhaustion
  • Kernel/runtime interactions
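Timing bugs are hard precisely because the code looks correct and the failure is intermittent. The hypothetical counter below has a classic read-modify-write race. The `time.sleep(0)` call is there only to make a bad interleaving more likely, so the bug shows up in a short run. The unlocked total usually comes up short, by a different amount on each run. That kind of symptom is difficult to diagnose from the source alone.

```python
import threading
import time

def run(increments: int, threads: int, use_lock: bool) -> int:
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            if use_lock:
                with lock:  # read and write happen atomically with respect to other workers
                    counter += 1
            else:
                current = counter      # read
                time.sleep(0)          # yield to other threads: widens the race window
                counter = current + 1  # write back a possibly stale value

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter

print(run(1000, 4, use_lock=False))  # typically below 4000, varies run to run
print(run(1000, 4, use_lock=True))   # 4000
```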

8. Security weaknesses

Generated code may introduce:

  • SQL injection vulnerabilities
  • Unsafe deserialization
  • XSS vulnerabilities
  • Hardcoded secrets
  • Broken auth flows
  • Insecure defaults

Because insecure examples exist in training data, the model may reproduce them unless explicitly guided.
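SQL injection is the textbook case. The sketch below uses Python's built-in `sqlite3` module with a throwaway in-memory table, and the table and function names are illustrative. It contrasts the string-built query that insecure examples often show with a parameterized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

def find_user_unsafe(name: str):
    # Vulnerable: user input is spliced directly into the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

payload = "nobody' OR '1'='1"
print(find_user_unsafe(payload))  # every row comes back: alice and bob
print(find_user_safe(payload))    # []
```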


9. Context window limits

Even advanced LLMs cannot fully “hold” massive systems in working memory.

This limits:

  • Whole-repo reasoning
  • Large-scale migrations
  • Deep dependency analysis
  • Cross-service architecture understanding

Tooling like retrieval systems, agents, and repository indexing helps, but doesn’t completely solve this.
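Retrieval works by choosing which parts of a repository to show the model for a given question, within a fixed budget. The toy sketch below scores files by keyword overlap and greedily fills the budget. The file contents are hypothetical, and word counts stand in for real model tokens. Production systems typically use embeddings, symbol indexes, or both. The sketch also shows why the problem is not fully solved: anything the ranking misses, or anything that does not fit the budget, is simply invisible to the model.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase alphabetic runs (so invoice_total -> invoice, total).
    return re.findall(r"[a-z]+", text.lower())

def select_context(files: dict[str, str], query: str, budget: int) -> list[str]:
    """Pick the files with the most keyword overlap with the query that fit in `budget` words."""
    q = Counter(tokenize(query))
    ranked = []
    for path, text in files.items():
        words = Counter(tokenize(text))
        score = sum(min(count, words[w]) for w, count in q.items())
        ranked.append((score, path, sum(words.values())))
    chosen, used = [], 0
    # Highest score first; ties broken by path for deterministic output.
    for score, path, size in sorted(ranked, key=lambda r: (-r[0], r[1])):
        if score == 0 or used + size > budget:
            continue  # irrelevant, or would overflow the budget
        chosen.append(path)
        used += size
    return chosen

files = {
    "billing/invoice.py": "def invoice_total(items, tax): return sum(items) * (1 + tax)",
    "billing/tax.py": "def tax_for(region): return RATES[region]",
    "auth/login.py": "def login(user, password): return check(user, password)",
}
query = "why is the invoice total wrong when tax changes"
print(select_context(files, query, budget=20))  # ['billing/invoice.py', 'billing/tax.py']
print(select_context(files, query, budget=10))  # ['billing/invoice.py']
```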


10. No real execution awareness by default

Without external tools, LLMs do not:

  • Run the code
  • Observe runtime behavior
  • Profile memory
  • Execute tests
  • Verify outputs

So they may confidently produce code that:

  • Does not compile
  • Fails tests
  • Deadlocks
  • Leaks memory
  • Performs poorly

Agentic systems with execution environments reduce this problem significantly.
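At its simplest, the execution environment an agent relies on is a loop that runs candidate code and feeds the outcome back to the model. The sketch below uses a hypothetical `run_candidate` helper. It runs a snippet in a separate Python interpreter via the standard `subprocess` module. A timeout ensures that a hang is reported rather than blocking forever. This is not a sandbox: the snippet runs with your user's full permissions, which is why real agent setups usually isolate it in a container or VM.

```python
import subprocess
import sys

def run_candidate(source: str, timeout_s: float = 5.0) -> tuple[str, str]:
    """Execute candidate code in a separate interpreter and classify the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""  # e.g. an infinite loop or a deadlock
    if proc.returncode != 0:
        err = proc.stderr.strip()
        return "error", err.splitlines()[-1] if err else ""  # last traceback line
    return "ok", proc.stdout.strip()

print(run_candidate("print(sum(range(10)))"))         # ('ok', '45')
print(run_candidate("print(undefined_name)"))          # ('error', 'NameError: ...')
print(run_candidate("while True: pass", timeout_s=1))  # ('timeout', '')
```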


11. Overfitting to “average” solutions

LLMs tend toward statistically common implementations.

That can lead to:

  • Boilerplate-heavy designs
  • Mediocre abstractions
  • Generic architectures
  • Conventional but suboptimal solutions

Exceptional engineering often requires:

  • Tradeoff analysis
  • Domain insight
  • Performance intuition
  • Creative simplification

These are still areas where strong human engineers outperform models.


12. Maintenance and evolution problems

Generated code may work initially but age poorly because:

  • It lacks clear rationale
  • Architectural decisions are inconsistent
  • Hidden assumptions are undocumented
  • Refactoring discipline is weak

This can create long-term technical debt if teams accept generated code without strong review standards.


Where LLMs work best

LLMs are strongest at:

  • Boilerplate generation
  • CRUD applications
  • Test scaffolding
  • Documentation
  • Refactoring assistance
  • API integrations
  • Repetitive transformations
  • Learning unfamiliar frameworks
  • Generating examples

They are less reliable for:

  • Mission-critical infrastructure
  • Safety-critical systems
  • Novel algorithmic work
  • Deep systems engineering
  • Security-sensitive code without expert review

The practical reality

The current best workflow is usually:

  1. Human defines architecture and constraints
  2. LLM accelerates implementation
  3. Automated tests validate behavior
  4. Human engineers review critical paths
  5. Tooling verifies correctness/security/performance

So the limitation is not simply “LLMs can’t code.” It’s that they generate code from statistical patterns rather than grounded semantic understanding of the entire software system.
