Created
December 1, 2020 21:58
-
-
Save vaclavdekanovsky/821eea8585c6c231983a8fcf6bada59f to your computer and use it in GitHub Desktop.
Calculating distance between two places on the surface of 3D sphere or ellipsoid object (like Earth).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Distance between two points on the surface \n", | |
"Two places located on the surface of the 3D object can be calculated using mathematical equation. For sphere-like objects like Earth these calculation are prepared in the [geopy](https://geopy.readthedocs.io/en/stable/#) library. You can calculate distance on the [great-circle](https://en.wikipedia.org/wiki/Great-circle_distance) or [geodesics](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid) on the ellipsoid. \n", | |
"\n", | |
"To install geopy use `pip install geopy` or in the anaconda `conda install -c conda-forge geopy`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd # to load cities dataset\n", | |
"from geopy import distance # to calculate distance on the surface" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll load the list of capital cities with `geolocations` - `latitude` and `longitude`. It's available on [kaggle world capitals](https://www.kaggle.com/nikitagrec/world-capitals-gps) under public license." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Country</th>\n", | |
" <th>capital</th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>code</th>\n", | |
" <th>continent</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>221</th>\n", | |
" <td>Turks and Caicos Islands</td>\n", | |
" <td>Grand Turk</td>\n", | |
" <td>21.466667</td>\n", | |
" <td>-71.133333</td>\n", | |
" <td>TC</td>\n", | |
" <td>North America</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>127</th>\n", | |
" <td>Liberia</td>\n", | |
" <td>Monrovia</td>\n", | |
" <td>6.300000</td>\n", | |
" <td>-10.800000</td>\n", | |
" <td>LR</td>\n", | |
" <td>Africa</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>95</th>\n", | |
" <td>Guinea</td>\n", | |
" <td>Conakry</td>\n", | |
" <td>9.500000</td>\n", | |
" <td>-13.700000</td>\n", | |
" <td>GN</td>\n", | |
" <td>Africa</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Country capital lat lon code \\\n", | |
"221 Turks and Caicos Islands Grand Turk 21.466667 -71.133333 TC \n", | |
"127 Liberia Monrovia 6.300000 -10.800000 LR \n", | |
"95 Guinea Conakry 9.500000 -13.700000 GN \n", | |
"\n", | |
" continent \n", | |
"221 North America \n", | |
"127 Africa \n", | |
"95 Africa " | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# load the dataframe with capitals\n", | |
"df = pd.read_csv(\"concap.csv\")\n", | |
"\n", | |
"# rename so that the column names are shorter and comply with PEP-8\n", | |
"df.rename(columns={\"CountryName\": \"Country\", \"CapitalName\": \"capital\", \"CapitalLatitude\": \"lat\", \"CapitalLongitude\": \"lon\", \"CountryCode\": \"code\", \"ContinentName\": \"continent\"}, inplace=True)\n", | |
"df.sample(3)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's get only two cities - Rome and Paris. So that we can find out their distance." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>index</th>\n", | |
" <th>Country</th>\n", | |
" <th>capital</th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>code</th>\n", | |
" <th>continent</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>81</td>\n", | |
" <td>France</td>\n", | |
" <td>Paris</td>\n", | |
" <td>48.866667</td>\n", | |
" <td>2.333333</td>\n", | |
" <td>FR</td>\n", | |
" <td>Europe</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>110</td>\n", | |
" <td>Italy</td>\n", | |
" <td>Rome</td>\n", | |
" <td>41.900000</td>\n", | |
" <td>12.483333</td>\n", | |
" <td>IT</td>\n", | |
" <td>Europe</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" index Country capital lat lon code continent\n", | |
"0 81 France Paris 48.866667 2.333333 FR Europe\n", | |
"1 110 Italy Rome 41.900000 12.483333 IT Europe" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# to start with let's filter only 2 capitals. Rome and Paris.\n", | |
"ropa = df[df[\"capital\"].isin([\"Rome\",\"Paris\"])].reset_index()\n", | |
"ropa" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Calculating the distnace\n", | |
"The first obvious method is to use the shortest distnace on the surface of Earth. You can use various approximations:\n", | |
"\n", | |
"* [Great-circle](https://en.wikipedia.org/wiki/Great-circle_distance) distnace on the surface of sphere - \n", | |
"* Distances from geodesics since Earth is approximated as [oblate ellipsoid](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid) \n", | |
"* Haversine formula - https://en.wikipedia.org/wiki/Haversine_formula, https://towardsdatascience.com/calculating-distance-between-two-geolocations-in-python-26ad3afe287b\n", | |
"\n", | |
"You don't have to invent or even reproduce this math. The geopy.distance module already implemented all of these distnance calculation, it returns the values in kilometers (km), miles (mi), nautical miles (nm) or feet (ft). All these methods are part of `distance` class we have already imported from geopy.\n", | |
"\n", | |
"* `distance((latitude_point_1, longitude_point_1), (lat_2, lon_2))` - using geodesic on `WGS-84` ellipsoid\n", | |
"* `geodesic((latitude_point_1, longitude_point_1), (lat_2, lon_2))`\n", | |
"* `great_circle((latitude_point_1, longitude_point_1), (lat_2, lon_2))`\n", | |
"\n", | |
"More info about geopy.distance https://geopy.readthedocs.io/en/stable/#module-geopy.distance" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(Distance(1107.8818760940028), 1107.8818760940028, 688.4058822066647)" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"d = distance.distance((ropa.loc[0, \"lat\"], ropa.loc[0, \"lon\"]), (ropa.loc[1, \"lat\"], ropa.loc[1, \"lon\"]))\n", | |
"d, d.km, d.miles" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that getting the distance in `km` or `miles` is simple. Let's try to call all the three methods `distance`, `geodesic` and `great_circle`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th>measurement</th>\n", | |
" <th>feet</th>\n", | |
" <th>ft</th>\n", | |
" <th>kilometers</th>\n", | |
" <th>km</th>\n", | |
" <th>mi</th>\n", | |
" <th>miles</th>\n", | |
" <th>nautical</th>\n", | |
" <th>nm</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>method</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>geodesic</th>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>598.208356</td>\n", | |
" <td>598.208356</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>great_circle</th>\n", | |
" <td>3.630457e+06</td>\n", | |
" <td>3.630457e+06</td>\n", | |
" <td>1106.563205</td>\n", | |
" <td>1106.563205</td>\n", | |
" <td>687.586498</td>\n", | |
" <td>687.586498</td>\n", | |
" <td>597.496331</td>\n", | |
" <td>597.496331</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"measurement feet ft kilometers km \\\n", | |
"method \n", | |
"geodesic 3.634783e+06 3.634783e+06 1107.881876 1107.881876 \n", | |
"great_circle 3.630457e+06 3.630457e+06 1106.563205 1106.563205 \n", | |
"\n", | |
"measurement mi miles nautical nm \n", | |
"method \n", | |
"geodesic 688.405882 688.405882 598.208356 598.208356 \n", | |
"great_circle 687.586498 687.586498 597.496331 597.496331 " | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"results = []\n", | |
"for f in [distance.distance, distance.great_circle, distance.geodesic]:\n", | |
" for mes in [\"kilometers\",\"km\",\"miles\",\"mi\",\"nautical\",\"nm\",\"feet\",\"ft\"]:\n", | |
" d = f((ropa.loc[0, \"lat\"], ropa.loc[0, \"lon\"]), (ropa.loc[1, \"lat\"], ropa.loc[1, \"lon\"]))\n", | |
" results.append({\"method\": f.__name__, \"measurement\": mes, \"value\": getattr(d, mes)})\n", | |
"\n", | |
"# show as dataframe\n", | |
"results_df = pd.DataFrame(results)\n", | |
"results_df.pivot_table(index=\"method\", columns=\"measurement\", values=\"value\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`distance.distance` nativelly calls `distance.geodesic` that's why these two calues collapse into one row in the results. We can also check various Earth approximations - more in [geopy documentation](https://geopy.readthedocs.io/en/stable/#module-geopy.distance)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th>measurement</th>\n", | |
" <th>feet</th>\n", | |
" <th>ft</th>\n", | |
" <th>kilometers</th>\n", | |
" <th>km</th>\n", | |
" <th>mi</th>\n", | |
" <th>miles</th>\n", | |
" <th>nautical</th>\n", | |
" <th>nm</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>method</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>geodesic</th>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>598.208356</td>\n", | |
" <td>598.208356</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: Airy (1830)</th>\n", | |
" <td>3.634455e+06</td>\n", | |
" <td>3.634455e+06</td>\n", | |
" <td>1107.781964</td>\n", | |
" <td>1107.781964</td>\n", | |
" <td>688.343800</td>\n", | |
" <td>688.343800</td>\n", | |
" <td>598.154408</td>\n", | |
" <td>598.154408</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: Clarke (1880)</th>\n", | |
" <td>3.634851e+06</td>\n", | |
" <td>3.634851e+06</td>\n", | |
" <td>1107.902624</td>\n", | |
" <td>1107.902624</td>\n", | |
" <td>688.418774</td>\n", | |
" <td>688.418774</td>\n", | |
" <td>598.219559</td>\n", | |
" <td>598.219559</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: GRS-67</th>\n", | |
" <td>3.634796e+06</td>\n", | |
" <td>3.634796e+06</td>\n", | |
" <td>1107.885873</td>\n", | |
" <td>1107.885873</td>\n", | |
" <td>688.408366</td>\n", | |
" <td>688.408366</td>\n", | |
" <td>598.210515</td>\n", | |
" <td>598.210515</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: GRS-80</th>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>598.208356</td>\n", | |
" <td>598.208356</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: Intl 1924</th>\n", | |
" <td>3.634927e+06</td>\n", | |
" <td>3.634927e+06</td>\n", | |
" <td>1107.925804</td>\n", | |
" <td>1107.925804</td>\n", | |
" <td>688.433178</td>\n", | |
" <td>688.433178</td>\n", | |
" <td>598.232075</td>\n", | |
" <td>598.232075</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>geodesic: WGS-84</th>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>3.634783e+06</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>1107.881876</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>688.405882</td>\n", | |
" <td>598.208356</td>\n", | |
" <td>598.208356</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>great_circle</th>\n", | |
" <td>3.630457e+06</td>\n", | |
" <td>3.630457e+06</td>\n", | |
" <td>1106.563205</td>\n", | |
" <td>1106.563205</td>\n", | |
" <td>687.586498</td>\n", | |
" <td>687.586498</td>\n", | |
" <td>597.496331</td>\n", | |
" <td>597.496331</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"measurement feet ft kilometers km \\\n", | |
"method \n", | |
"geodesic 3.634783e+06 3.634783e+06 1107.881876 1107.881876 \n", | |
"geodesic: Airy (1830) 3.634455e+06 3.634455e+06 1107.781964 1107.781964 \n", | |
"geodesic: Clarke (1880) 3.634851e+06 3.634851e+06 1107.902624 1107.902624 \n", | |
"geodesic: GRS-67 3.634796e+06 3.634796e+06 1107.885873 1107.885873 \n", | |
"geodesic: GRS-80 3.634783e+06 3.634783e+06 1107.881876 1107.881876 \n", | |
"geodesic: Intl 1924 3.634927e+06 3.634927e+06 1107.925804 1107.925804 \n", | |
"geodesic: WGS-84 3.634783e+06 3.634783e+06 1107.881876 1107.881876 \n", | |
"great_circle 3.630457e+06 3.630457e+06 1106.563205 1106.563205 \n", | |
"\n", | |
"measurement mi miles nautical nm \n", | |
"method \n", | |
"geodesic 688.405882 688.405882 598.208356 598.208356 \n", | |
"geodesic: Airy (1830) 688.343800 688.343800 598.154408 598.154408 \n", | |
"geodesic: Clarke (1880) 688.418774 688.418774 598.219559 598.219559 \n", | |
"geodesic: GRS-67 688.408366 688.408366 598.210515 598.210515 \n", | |
"geodesic: GRS-80 688.405882 688.405882 598.208356 598.208356 \n", | |
"geodesic: Intl 1924 688.433178 688.433178 598.232075 598.232075 \n", | |
"geodesic: WGS-84 688.405882 688.405882 598.208356 598.208356 \n", | |
"great_circle 687.586498 687.586498 597.496331 597.496331 " | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# the distnace for various ellipsiods\n", | |
"for ellipsoid in distance.ELLIPSOIDS:\n", | |
" for mes in [\"kilometers\",\"km\",\"miles\",\"mi\",\"nautical\",\"nm\",\"feet\",\"ft\"]:\n", | |
" d = distance.geodesic((ropa.loc[0, \"lat\"], ropa.loc[0, \"lon\"]), \n", | |
" (ropa.loc[1, \"lat\"], ropa.loc[1, \"lon\"]), ellipsoid=ellipsoid)\n", | |
" results.append({\"method\": f\"geodesic: {ellipsoid}\", \"measurement\": mes, \"value\": getattr(d, mes)})\n", | |
"\n", | |
"# show as dataframe\n", | |
"results_df = pd.DataFrame(results)\n", | |
"results_df.pivot_table(index=\"method\", columns=\"measurement\", values=\"value\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In conclusion the shortest distance between Paris and Rome is around 1007 km or 688 miles. " | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "mapsenv", | |
"language": "python", | |
"name": "mapsenv" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment