@shawngmc
Last active September 5, 2025 22:22
Optimizing C/C++ compilation

I'm getting interested in optimized compiling.

C/C++

  • Flags
    • Gentoo's Safe CFLAGS might be a great starting point!
    • march:
      • -march=znver4 and -march=x86-64-v4 look interesting.
      • -march=znver4 also sets tuning (-mtune=znver4); -march=x86-64-v4 does not, so tuning stays at generic.
      • GCC can try to detect the CPU via -march=native. The result can be checked with gcc -v -E -x c /dev/null -o /dev/null -march=native 2>&1 | grep /cc1 | grep mtune, but detection sometimes downgrades, occasionally all the way to generic!
      • glibc's /lib/ld-linux-x86-64.so.2 --help lists the x86-64-vN levels it considers supported, but that doesn't map well to a specific -march value.
    • -O3 vs -O2
      • On unoptimized code, -O3 can help a lot. Otherwise, results are largely the same as -O2.
      • -O3 can cause issues when used systemwide, largely because of bad code that depends on undefined behavior staying consistent.
    • -flto: 2-3x compile time, sometimes a minor regression, sometimes a double-digit improvement
    • -pipe: uses more memory, but speeds up compilation (not binary execution)
    • CachyOS has other optimizations (LTO, etc.): https://wiki.cachyos.org/features/kernel
  • Compilers
    • GCC, Clang and LLVM-GCC all trade blows.
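
The -march=native check from the notes above can be wrapped in a small helper so the resolved -march and -mtune values are easy to compare. This is a minimal sketch: the function names extract_march_mtune and show_native_flags are made up for illustration, and it assumes gcc prints its cc1 command line the way the command in the notes expects.

```shell
# Hypothetical helper: print the -march= and -mtune= options found in a
# cc1 command line read from stdin, one option per line.
extract_march_mtune() {
  tr ' ' '\n' | grep -E '^-m(arch|tune)='
}

# Hypothetical helper: show what -march=native resolves to on this machine.
# Returns non-zero if gcc is not installed.
show_native_flags() {
  command -v gcc > /dev/null 2>&1 || { echo "gcc not found" >&2; return 1; }
  gcc -v -E -x c /dev/null -o /dev/null -march=native 2>&1 | grep /cc1 | extract_march_mtune
}

# Usage:
#   show_native_flags
```

If the -mtune value comes back as generic on a CPU that should have a dedicated tuning target, that is the downgrade mentioned above, and setting -march explicitly may be the safer choice.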

Rust

  • Flags
    • The equivalent of -O3 in CFLAGS (opt-level=3) is the default for Cargo release builds (cargo build --release)
    • target-cpu
      • Auto-detection is via target-cpu=native, and can be checked with rustc -C target-cpu=help
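
A rough sketch of how the target-cpu setting might be applied and inspected from a shell. The helper names rustflags_for_cpu and list_target_features are hypothetical; the sketch assumes rustc and cargo are on PATH and uses rustc --print cfg to show which target features a given CPU name enables.

```shell
# Hypothetical helper: build a RUSTFLAGS value for a given target CPU.
rustflags_for_cpu() {
  printf '%s\n' "-C target-cpu=$1"
}

# Hypothetical helper: list the target features rustc enables for a CPU name.
# Requires rustc on PATH.
list_target_features() {
  rustc --print cfg -C target-cpu="$1" | grep target_feature
}

# Usage:
#   RUSTFLAGS="$(rustflags_for_cpu native)" cargo build --release
#   list_target_features native
```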

Sources:

# Get CPU flags as lscpu reports them (kernel names, as in /proc/cpuinfo).
# The recursive search (..) also finds "Flags:" when lscpu nests it under another field.
readarray -td ' ' CPU_FLAG_ARRAY < <(lscpu -J | jq -r '.. | objects | select(.field == "Flags:") | .data' | tr -d '\n')
# has_flags LIST: succeeds only if every flag in the space-separated LIST is present (exact match, not substring).
# jq -e maps the boolean result to the exit status; testing jq's printed "true"/"false" with [ ... ] always succeeds.
has_flags() {
  jq -e -n --arg need "$1" '(($need | split(" ")) - $ARGS.positional) == []' --args "${CPU_FLAG_ARRAY[@]}" > /dev/null
}
cpu="x86_64v0"
# x86_64_v1 - lm cmov cx8 fpu fxsr mmx syscall sse2
has_flags "lm cmov cx8 fpu fxsr mmx syscall sse2" && cpu="x86_64v1"
# x86_64_v2 - cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3
[[ "$cpu" == "x86_64v1" ]] && has_flags "cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3" && cpu="x86_64v2"
# x86_64_v3 - avx avx2 bmi1 bmi2 f16c fma abm movbe xsave
[[ "$cpu" == "x86_64v2" ]] && has_flags "avx avx2 bmi1 bmi2 f16c fma abm movbe xsave" && cpu="x86_64v3"
# x86_64_v4 - avx512f avx512bw avx512cd avx512dq avx512vl
[[ "$cpu" == "x86_64v3" ]] && has_flags "avx512f avx512bw avx512cd avx512dq avx512vl" && cpu="x86_64v4"
# Strict superset of x86_64v4 - https://github.com/CachyOS/CachyOS-PKGBUILDS/issues/359
# znver4 (GCC feature names) - sse3 sse4a aes pclmul prfchw fxsr xsaveopt fsgsbase rdrnd mwaitx adx rdseed clzero clflushopt xsavec xsaves sha lzcnt clwb rdpid wbnoinvd vaes vpclmulqdq pku avx512dq avx512ifma avx512bf16 avx512vbmi avx512vbmi2 gfni avx512vnni avx512bitalg avx512vpopcntdq
# The check uses lscpu's kernel names instead (sse3 = pni, pclmul = pclmulqdq, prfchw = 3dnowprefetch, rdrnd = rdrand, sha = sha_ni, lzcnt = abm).
# znver3 and evex512 from the source list are GCC-side names, not CPU flags, so they are not checked.
[[ "$cpu" == "x86_64v4" ]] && has_flags "pni sse4a aes pclmulqdq 3dnowprefetch fxsr xsaveopt fsgsbase rdrand mwaitx adx rdseed clzero clflushopt xsavec xsaves sha_ni abm clwb rdpid wbnoinvd vaes vpclmulqdq pku avx512dq avx512ifma avx512_bf16 avx512vbmi avx512_vbmi2 gfni avx512_vnni avx512_bitalg avx512_vpopcntdq" && cpu="znver4"
# Strict superset of znver4 - https://www.phoronix.com/news/AMD-Zen-5-Znver-5-GCC
# znver5 (GCC feature names) - avxvnni movdiri movdir64b avx512vp2intersect prefetchi
# prefetchi is left out of the check on the assumption that lscpu does not list it.
[[ "$cpu" == "znver4" ]] && has_flags "avx_vnni movdiri movdir64b avx512_vp2intersect" && cpu="znver5"
echo "CPU Arch: ${cpu}"
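
The labels printed by the script are not spelled the way GCC expects them, so a small mapping step may be useful before passing the result to -march. This is a sketch only: march_for_level is a hypothetical helper, and it assumes a compiler recent enough to know these names (the x86-64-v2 to x86-64-v4 levels need GCC 11 or newer, and the znver targets need newer releases still).

```shell
# Hypothetical helper: translate a label from the detection script into
# the value GCC accepts for -march. Returns non-zero for unknown labels.
march_for_level() {
  case "$1" in
    x86_64v1)      echo "x86-64" ;;
    x86_64v[234])  echo "x86-64-v${1#x86_64v}" ;;
    znver[45])     echo "$1" ;;
    *)             return 1 ;;
  esac
}

# Usage (assuming $cpu was set by the detection script):
#   march="$(march_for_level "$cpu")" && export CFLAGS="-march=${march} -O2 -pipe"
```

Unknown labels, including x86_64v0, produce no output and a non-zero status, so CFLAGS is left untouched in that case.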