
@cprima
Created January 2, 2025 11:50
Sort CSV Places Along a GPX Route to Generate a Sorted Roadbook (Python Script)

This Python script orders the places in an unsorted CSV file by their position along a predefined GPX track. It produces a sorted "roadbook" that lists the places in the sequence you reach them along the route. It is intended for cyclists, hikers, and geospatial analysts working with Overpass API results or similar unsorted data sources.
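Overpass results usually arrive as JSON (queries starting with `[out:json]`), not as the `Y`/`X` CSV the script reads. The following is a minimal sketch of that conversion step using only the standard library. It assumes nodes carry `lat`/`lon` directly, while ways and relations carry a `center` object only when queried with `out center`. The helper names `overpass_to_rows` and `write_places_csv` are hypothetical and are not part of the script.

```python
import csv
import json


def overpass_to_rows(overpass_json):
    """Convert an Overpass API JSON response into Name/Y/X rows.

    Nodes carry "lat"/"lon" directly; ways and relations only carry
    coordinates when the query used `out center` (a "center" object).
    Elements without usable coordinates are skipped.
    """
    rows = []
    for element in json.loads(overpass_json).get("elements", []):
        if "lat" in element and "lon" in element:
            lat, lon = element["lat"], element["lon"]
        elif "center" in element:
            lat, lon = element["center"]["lat"], element["center"]["lon"]
        else:
            continue  # e.g. a way queried without `out center`
        name = element.get("tags", {}).get("name", "")
        rows.append({"Name": name, "Y": lat, "X": lon})  # Y = latitude, X = longitude
    return rows


def write_places_csv(rows, path):
    """Write the rows to a CSV file with the columns the roadbook script expects."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Y", "X"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_places_csv(overpass_to_rows(response_text), "places.csv")` writes an input file in the layout described under Usage.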


Features

  • Align Places to GPX Track: Matches CSV places (e.g., towns, landmarks) to the nearest points on a GPX track.
  • Unsorted Input Handling: Works with raw, unsorted input CSV data.
  • Smart Sorting: Orders places by their position along the GPX track.
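  • Logging and Progress Bar: Logs each processing step and shows a tqdm progress bar while places are matched to the track.
  • GPX Simplification: Reduces the number of track points with the Ramer-Douglas-Peucker algorithm before matching, which speeds up the nearest-point search.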

Prerequisites

  • Python 3.8 or higher (the pinned pandas 1.5.3 does not support Python 3.7)
  • Required Python libraries:
    • gpxpy
    • pandas
    • geopy
    • rdp
    • tqdm

Install the dependencies using:

pip install gpxpy pandas geopy rdp tqdm

Usage

1. Input Files

  • A GPX file defining the route (example_route.gpx).
  • A CSV file containing places (places.csv) with the following required columns:
    • Y: Latitude of the place.
    • X: Longitude of the place.
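The script collects points only from track segments (`gpx.tracks`). A GPX file that stores the route as `<rte>`/`<rtept>` elements, as some route planners export, would therefore yield no points. The following is a minimal sketch of a loader that falls back to route points. It swaps gpxpy for the standard library's `xml.etree.ElementTree`, and `load_gpx_points` is a hypothetical helper, not part of the script.

```python
import xml.etree.ElementTree as ET


def load_gpx_points(gpx_text):
    """Return (lat, lon) tuples from <trkpt> elements, falling back to <rtept>.

    Tags are matched without their namespace, so GPX 1.0 and 1.1 both work.
    Accepts the file content as str or bytes.
    """
    root = ET.fromstring(gpx_text)
    points = {"trkpt": [], "rtept": []}
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}" prefix
        if local_name in points:
            points[local_name].append(
                (float(element.get("lat")), float(element.get("lon")))
            )
    # Prefer track points; use route points only if the file has no track.
    return points["trkpt"] or points["rtept"]
```

Reading the file in binary mode (`open(path, "rb").read()`) lets the XML parser honor the file's declared encoding. With gpxpy, the equivalent fallback would read `route.points` from `gpx.routes`.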

2. Run the Script

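Set your file paths in the example-usage section at the end of the script, then run it. That section loads the GPX track points and passes them to the processing function:

# Example paths
gpx_file = r'example_route.gpx'
csv_file = r'places.csv'
output_file = r'sorted_places.csv'

# Load the (latitude, longitude) track points from the GPX file
with open(gpx_file, 'r') as f:
    gpx = gpxpy.parse(f)
gpx_points = [(p.latitude, p.longitude) for track in gpx.tracks
              for segment in track.segments for p in segment.points]

# Process the CSV and generate the sorted roadbook
process_csv_with_smart_sorting(csv_file, gpx_points, output_file)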
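The script hardcodes the three paths in its example-usage section. If you prefer to pass them on the command line, a hypothetical argparse front end could look like the following. `parse_args` and the `-o/--output` option are assumptions, not part of the script.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical command-line interface for the roadbook script."""
    parser = argparse.ArgumentParser(
        description="Sort CSV places along a GPX track into a roadbook."
    )
    parser.add_argument("gpx_file", help="GPX file containing the route")
    parser.add_argument(
        "csv_file", help="CSV file with Y (latitude) and X (longitude) columns"
    )
    parser.add_argument(
        "-o", "--output", default="sorted_places.csv",
        help="path of the sorted output CSV (default: %(default)s)",
    )
    return parser.parse_args(argv)  # argv=None reads sys.argv[1:]
```

The example-usage section would then load `args.gpx_file` and call `process_csv_with_smart_sorting(args.csv_file, gpx_points, args.output)`.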

3. Output

The output is a CSV file (sorted_places.csv) with places sorted in the order they occur along the GPX track.


Input Example

Unsorted Input CSV (places.csv), where Y is latitude and X is longitude:

Name,Y,X
Gunnison,38.5458,-106.9287
San Francisco,37.7749,-122.4194
Blanding,37.6240,-109.4780

Output Example

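Sorted Output CSV (sorted_places.csv). The script adds a gpx_index column that holds the index of each place's nearest point in the simplified track. The index values below are illustrative; on a real track they are generally not consecutive.

Name,Y,X,gpx_index
San Francisco,37.7749,-122.4194,1
Blanding,37.6240,-109.4780,2
Gunnison,38.5458,-106.9287,3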

How It Works

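  1. GPX Simplification: Reduces the number of track points with the Ramer-Douglas-Peucker (RDP) algorithm for better performance. RDP runs directly on latitude/longitude pairs, so the tolerance epsilon is in degrees (the default of 0.001° is roughly 110 m north-south).
  2. Nearest GPX Point: For each place in the CSV file, finds the closest simplified track point by geodesic distance (geopy) and records its index.
  3. Sorting by Index: Orders the places by that index, which reflects their position along the GPX track.
  4. Output Roadbook: Writes the sorted places, including the added gpx_index column, to the output CSV file.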
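The four steps can be sketched in plain Python without the third-party packages. This is an illustration under stated assumptions, not the script's code. The names `simplify_rdp`, `haversine_m`, `nearest_index` and `sort_along_track` are hypothetical. Distances use the haversine formula on a spherical Earth instead of geopy's ellipsoidal `geodesic`. As in the script, RDP runs on raw latitude/longitude, so `epsilon` is in degrees.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end, in degrees."""
    (y, x), (y1, x1), (y2, x2) = point, start, end
    dy, dx = y2 - y1, x2 - x1
    if dx == 0 and dy == 0:
        return math.hypot(y - y1, x - x1)
    return abs(dx * (y1 - y) - dy * (x1 - x)) / math.hypot(dx, dy)


def simplify_rdp(points, epsilon):
    """Step 1: recursive Ramer-Douglas-Peucker on (lat, lon) pairs."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the chord start-end.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > epsilon:
        # Keep that point and simplify both halves; drop the duplicated split point.
        left = simplify_rdp(points[: index + 1], epsilon)
        right = simplify_rdp(points[index:], epsilon)
        return left[:-1] + right
    return [start, end]


def haversine_m(a, b, radius_m=6_371_000.0):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


def nearest_index(place, track):
    """Step 2: index of the closest track point (first one on ties)."""
    return min(range(len(track)), key=lambda i: haversine_m(place, track[i]))


def sort_along_track(places, track, epsilon=0.001):
    """Steps 3-4: (name, lat, lon, track_index) rows ordered along the track."""
    simplified = simplify_rdp(track, epsilon)
    indexed = [(name, lat, lon, nearest_index((lat, lon), simplified))
               for name, lat, lon in places]
    # sorted() is stable: places sharing a track index keep their input order.
    return sorted(indexed, key=lambda row: row[3])
```

Places that snap to the same simplified point keep their CSV order here because `sorted()` is stable. In pandas, `sort_values(..., kind='mergesort')` gives the same guarantee, while the default quicksort does not. The recursive RDP is fine for a sketch; for very long tracks, an iterative version avoids Python's recursion limit.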

Logging and Progress Updates

The script provides:

  • Logs: Detailed messages for each processing step (e.g., file loading, GPX simplification, sorting).
  • Progress Bar: A real-time progress bar for processing places.
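Python's default `%(asctime)s` includes milliseconds (for example `2025-01-02 12:00:00,123`). The timestamps in the example log show whole seconds, which a `datefmt` produces. The following is a minimal sketch using the standard `logging.Formatter`; `make_formatter` is a hypothetical helper.

```python
import logging


def make_formatter():
    """Formatter matching the example log layout, without milliseconds."""
    return logging.Formatter(
        fmt="%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )


# The equivalent basicConfig call would be:
# logging.basicConfig(format="%(asctime)s - %(levelname)s - %(message)s",
#                     datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
```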

Example log output:

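2025-01-02 12:00:00 - INFO - Loading GPX file: example_route.gpx
2025-01-02 12:00:01 - INFO - GPX file loaded with 10000 points.
2025-01-02 12:00:01 - INFO - Reading CSV file: places.csv
2025-01-02 12:00:01 - INFO - CSV file loaded with 100 rows.
2025-01-02 12:00:01 - INFO - Starting GPX simplification with epsilon=0.001.
2025-01-02 12:00:02 - INFO - GPX simplification complete: 500 points (from 10000).
2025-01-02 12:00:02 - INFO - Finding nearest GPX point for each place...
Processing places: 100%|████████████████████| 100/100 [00:02<00:00, 50.00it/s]
2025-01-02 12:00:04 - INFO - Nearest GPX point calculation complete.
2025-01-02 12:00:04 - INFO - Sorting places based on GPX track order...
2025-01-02 12:00:04 - INFO - Saving sorted roadbook to: sorted_places.csv
2025-01-02 12:00:04 - INFO - Roadbook saved successfully.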

License

This script is licensed under the Creative Commons Attribution (CC-BY) license. You are free to use, share, and adapt it, provided that appropriate credit is given.


Author

Created by Christian Prior-Mamulyan.
For questions or feedback, email me at [email protected].

import logging

import gpxpy
import pandas as pd
from geopy.distance import geodesic
from rdp import rdp
from tqdm import tqdm

# Configure logging to display progress and results
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    level=logging.INFO
)


def simplify_gpx(gpx_points, epsilon=0.001):
    """
    Simplify GPX track points using the Ramer-Douglas-Peucker algorithm.

    Args:
        gpx_points (list): List of (latitude, longitude) tuples from the GPX track.
        epsilon (float): Tolerance for simplification, in degrees because the
            points are raw latitude/longitude; lower values retain more detail.

    Returns:
        list: Simplified list of [latitude, longitude] points (rdp returns
            lists, not tuples, when given a Python list).
    """
    logging.info(f"Starting GPX simplification with epsilon={epsilon}.")
    simplified_points = rdp(gpx_points, epsilon=epsilon)
    logging.info(f"GPX simplification complete: {len(simplified_points)} points (from {len(gpx_points)}).")
    return simplified_points


def find_nearest_gpx_index(lat, lon, gpx_points):
    """
    Find the index of the nearest GPX point for a given location.

    Args:
        lat (float): Latitude of the location.
        lon (float): Longitude of the location.
        gpx_points (list): List of (latitude, longitude) tuples from the GPX track.

    Returns:
        int: Index of the nearest GPX point.
    """
    distances = [geodesic((lat, lon), gpx_point).meters for gpx_point in gpx_points]
    return distances.index(min(distances))


def process_csv_with_smart_sorting(csv_file, gpx_points, output_file):
    """
    Process a CSV file of places and sort them along the GPX track.

    Args:
        csv_file (str): Path to the input CSV file containing places.
        gpx_points (list): List of (latitude, longitude) tuples from the GPX track.
        output_file (str): Path to save the sorted output CSV file.
    """
    logging.info(f"Reading CSV file: {csv_file}")
    df = pd.read_csv(csv_file)

    # Ensure the required columns are present
    if 'Y' not in df.columns or 'X' not in df.columns:
        logging.error("CSV file must contain 'Y' (latitude) and 'X' (longitude) columns.")
        raise ValueError("Missing required columns in CSV file.")
    logging.info(f"CSV file loaded with {len(df)} rows.")

    # Simplify GPX points for faster processing
    gpx_points = simplify_gpx(gpx_points)

    # Find the nearest GPX point for each place in the CSV
    gpx_indices = []
    logging.info("Finding nearest GPX point for each place...")
    for _, row in tqdm(df.iterrows(), total=len(df), desc="Processing places"):
        lat, lon = row['Y'], row['X']
        gpx_index = find_nearest_gpx_index(lat, lon, gpx_points)
        gpx_indices.append(gpx_index)
    logging.info("Nearest GPX point calculation complete.")

    # Add the GPX indices to the DataFrame
    df['gpx_index'] = gpx_indices

    # Sort by GPX index; a stable sort keeps places that share an index in input order
    logging.info("Sorting places based on GPX track order...")
    df = df.sort_values(by='gpx_index', kind='mergesort').reset_index(drop=True)

    # Save the sorted DataFrame
    logging.info(f"Saving sorted roadbook to: {output_file}")
    df.to_csv(output_file, index=False)
    logging.info("Roadbook saved successfully.")


if __name__ == "__main__":
    # Example Usage
    # File paths for input GPX and CSV, and output CSV
    gpx_file = 'example_route.gpx'
    csv_file = 'places.csv'
    output_file = 'sorted_places.csv'

    # Load GPX points from the input GPX file
    logging.info(f"Loading GPX file: {gpx_file}")
    with open(gpx_file, 'r') as f:
        gpx = gpxpy.parse(f)
    gpx_points = [
        (point.latitude, point.longitude)
        for track in gpx.tracks
        for segment in track.segments
        for point in segment.points
    ]
    logging.info(f"GPX file loaded with {len(gpx_points)} points.")

    # Process the CSV file and generate the sorted roadbook
    process_csv_with_smart_sorting(csv_file, gpx_points, output_file)
Pinned dependency versions, in pip requirements-file format:

gpxpy==1.5.0
pandas==1.5.3
geopy==2.3.0
rdp==0.8
tqdm==4.64.1
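Most of the runtime goes into the Python loop that calls geopy's `geodesic` once for every place/track-point pair. The following is a minimal sketch of a vectorized alternative using NumPy broadcasting. It is an assumption, not part of the script: it uses the haversine formula on a spherical Earth rather than the ellipsoidal geodesic, and `nearest_indices` is a hypothetical name. Memory grows with the number of places times the number of track points.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def nearest_indices(places, track):
    """For each (lat, lon) in places, return the index of the nearest track point."""
    p = np.radians(np.asarray(places, dtype=float))[:, None, :]  # shape (P, 1, 2)
    t = np.radians(np.asarray(track, dtype=float))[None, :, :]   # shape (1, T, 2)
    dlat = t[..., 0] - p[..., 0]                                 # shape (P, T)
    dlon = t[..., 1] - p[..., 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(p[..., 0]) * np.cos(t[..., 0]) * np.sin(dlon / 2) ** 2)
    h = np.clip(h, 0.0, 1.0)  # guard arcsin against rounding just above 1
    distances_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(h))
    # argmin returns the first minimum on ties, like distances.index(min(distances))
    return distances_m.argmin(axis=1)
```

Inside `process_csv_with_smart_sorting`, the loop could then become `df['gpx_index'] = nearest_indices(df[['Y', 'X']].to_numpy(), gpx_points)`.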