Harshavardhana harshavardhana

@harshavardhana
harshavardhana / README.md
Last active March 8, 2026 06:20
PySpark parquet overwrite pattern - tests partition prefix visibility after overwrite on S3/MinIO

PySpark Parquet Overwrite - Partition Prefix Visibility Test

Tests that after Spark overwrites partitioned Parquet files on S3/MinIO, the date-level partition prefixes remain visible in delimited ListObjectsV2 listings, so that Spark's partition discovery still works correctly.
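
To make "visible in a delimited listing" concrete, here is a minimal pure-Python sketch of how a delimited listing groups object keys into common prefixes. The function name `list_common_prefixes`, the sample keys, and the `date=` partition layout are assumptions made for illustration. It only models the listing semantics in memory and does not talk to S3 or MinIO.

```python
def list_common_prefixes(keys, prefix="", delimiter="/"):
    """Model a delimited listing: return (common_prefixes, direct_keys)."""
    common, direct = set(), []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            # No delimiter after the prefix: the key is listed directly.
            direct.append(key)
        else:
            # Roll everything up to the first delimiter into one prefix.
            common.add(prefix + rest[: idx + len(delimiter)])
    return sorted(common), direct


# Hypothetical object keys, as they might look after an overwrite.
keys = [
    "table/_SUCCESS",
    "table/date=2024-01-01/part-0000.parquet",
    "table/date=2024-01-02/part-0000.parquet",
]
prefixes, files = list_common_prefixes(keys, prefix="table/")
print(prefixes)  # the date-level partition prefixes
print(files)     # keys directly under table/
```

Against a real endpoint, the equivalent check would likely use boto3's `list_objects_v2` with `Prefix` and `Delimiter="/"` and inspect `CommonPrefixes` in the response. The bucket and prefix names would depend on the test setup.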

What the test does

  1. Generates two sample CSV files (batch1.csv, batch2.csv) with the same schema and same date partitions but different values.
@harshavardhana
harshavardhana / minio-aistor-vs-oss-final.md
Last active January 27, 2026 07:22
MinIO AIStor vs MinIO OSS - Complete Technical Comparison (13,061 commits analyzed)

MinIO AIStor vs MinIO OSS - Complete Technical Comparison

Analysis based on full commit history review of 13,061 commits


Table of Contents

@harshavardhana
harshavardhana / minio-aistor-vs-oss-comprehensive.md
Created January 24, 2026 08:30
MinIO AIStor vs MinIO OSS - Comprehensive Technical Comparison (2800+ commits analyzed)

MinIO AIStor vs MinIO OSS - Comprehensive Technical Comparison

Analysis based on full commit history review of 2800+ commits


Table of Contents

  1. Executive Summary
  2. Codebase Statistics
@harshavardhana
harshavardhana / replication-dataflow.md
Last active December 23, 2025 19:58
MinIO AIStor: Synchronous vs Asynchronous Replication Dataflow

MinIO: Synchronous vs Asynchronous Replication

Synchronous Replication

Client waits for replication to complete before receiving response.

┌────────┐         ┌─────────────────────────────┐         ┌────────────┐
│        │  PUT    │      SOURCE CLUSTER         │         │   REMOTE   │
│ Client ├────────►│                             │         │   TARGET   │
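
The synchronous behaviour described above (the client waits for replication before it gets a response) can be contrasted with an asynchronous variant in a small toy model. This is a sketch under stated assumptions, not MinIO's implementation: the `Cluster` and `Replicator` classes, the in-memory dictionaries, and the single background worker are all hypothetical.

```python
import queue
import threading


class Cluster:
    """Toy object store: a dict of key -> bytes (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


class Replicator:
    """Hypothetical front end that writes locally and replicates remotely."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Background worker: copies queued writes to the remote target.
        while True:
            key, data = self.pending.get()
            self.target.put(key, data)
            self.pending.task_done()

    def put_sync(self, key, data):
        # Respond only after the remote target also holds the object.
        self.source.put(key, data)
        self.target.put(key, data)
        return "OK"

    def put_async(self, key, data):
        # Respond right after the local write; replication happens later.
        self.source.put(key, data)
        self.pending.put((key, data))
        return "OK"


source, target = Cluster(), Cluster()
rep = Replicator(source, target)

rep.put_sync("a", b"1")
sync_visible = "a" in target.objects    # already replicated on return

rep.put_async("b", b"2")
rep.pending.join()                      # wait for the background copy
async_visible = "b" in target.objects
print(sync_visible, async_visible)
```

The trade-off the model shows is latency versus visibility: `put_sync` pays for the remote write before responding, while `put_async` returns early and the object only appears on the target once the queue has drained.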
## template:jinja
{#
This file (/etc/cloud/templates/hosts.debian.tmpl) is only utilized
if enabled in cloud-config. Specifically, in order to enable it
you need to add the following to config:
manage_etc_hosts: True
-#}
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
#!/bin/bash
docker system prune -af --filter "until=8h"

Previous

    Outbound
    ┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
    │                   │   Parr.     │                │  (http body)  │               │                      │                │
    │ Bitrot Hash       │     Write   │      Pipe      │      Read     │  HTTP buffer  │    Write (syscall)   │  TCP Buffer    │
    │ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │    (4MB)       │
    │                   │             │                │               │  (io.Copy)    │                      │                │
version: '3.7'
# Settings and configurations that are common for all containers
x-minio-common: &minio-common
  image: quay.io/minio/minio:${RELEASE}
  command: server http://site1-minio{1...4}/data{1...2}
  environment:
    - MINIO_PROMETHEUS_AUTH_TYPE=public
    - CI=true
"""
FTP benchmark.
Usage:
ftpbench --help
ftpbench -h <host> -u <user> -p <password> [options] login
ftpbench -h <host> -u <user> -p <password> [options] upload <workdir> [-s <size>]
ftpbench -h <host> -u <user> -p <password> [options] download <workdir> [-s <size>] [--files <count>]
Connection options:
#!/bin/bash
i=1
sudo rm -f /tmp/fstab
for disk in $(lsblk -i -p -n -o NAME | grep -v 'loop\|nvme0\|nvme1\|nvme2\|nvme3\|ubuntu\|md\|sda' | sort); do
  sudo mkdir -p /disk${i}
  sudo mkfs.xfs -f -L disk${i} $disk
  echo "LABEL=disk${i} /disk${i} xfs defaults,noatime 0 0" | sudo tee -a /tmp/fstab
  i=$(( $i + 1 ))