Last active
February 1, 2024 17:24
reconstruction-quality-experiment.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"provenance": [], | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/mmore500/a2e88e7c239935c362ec59c6b5a3f7b5/reconstruction-quality-experiment.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Procedure:\n", | |
"\n", | |
"For each experimental replicate per treatment,\n", | |
"- Navigate to <https://colab.research.google.com/gist/mmore500/a2e88e7c239935c362ec59c6b5a3f7b5> to open a fresh copy of the experiment notebook. **Open a fresh notebook copy for each treatment.**\n", | |
"- Click on filename on the top left of the Colab page(`a2e88e7c239935c362ec59c6b5a3f7b5`) and rename according to template\n", | |
" - `evo=island{num_islands}-niche{num_niches}-ngen{num_generations}-popsize{population_size}-tournsize{tournament_size}+instrument={\"steady\"|\"tilted\"}-{\"old\"|\"new\"}-bits{annotation_size_bits}-diff{differentia_width}+replicate={replicate}+ext=.ipynb`.\n", | |
" - For example, `evo=island1-niche1-ngen10000-popsize1024-tournsize2+instrument=steady-old-bits64-diff1+replicate=0+ext=.ipynb`.\n", | |
"- Configure variables in \"Configure Experment\" section.\n", | |
"- On the top menu, click `Runtime > Restart sesson and run all` if available, otherwise `Runtime > Run all`.\n", | |
"- Wait for final cell's execution to complete.\n", | |
"- Record configured variables and results from \"Evaluate Reconstruction\" section in [results spreadsheet](https://docs.google.com/spreadsheets/d/1ZhS4NDTDyBiwmwtWrZO5L06MGB3lhmp2-5ZzClhEwPU/edit?usp=sharing).\n", | |
"- On the top menu, click `File > Download > Download .ipynb`.\n", | |
"- Upload ipynb file to treatment directory at <https://osf.io/n4b2g/>, named same as notebook, except excluding `+replicate={replicate}+ext=.ipynb`.\n", | |
" - Treatment directory should contain notebooks for each replicate of notebook.\n" | |
], | |
"metadata": { | |
"id": "lhNERG3shuFb" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Set Up Environment" | |
], | |
"metadata": { | |
"id": "pzVsZRrQEgVa" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!python3 -m pip install \\\n", | |
" \"alifedata_phyloinformatics_convert==0.15.1\" \\\n", | |
" \"biopython==1.83\" \\\n", | |
" \"dendropy==4.6.1\" \\\n", | |
" \"git+https://github.com/mmore500/[email protected]#egg=hsurf\" \\\n", | |
" \"hstrat==1.9.1\" \\\n", | |
" \"matplotlib==3.8.2\" \\\n", | |
" \"pandas==1.5.3\" \\\n", | |
" \"tqdist==1.0\" \\\n", | |
" \"tqdm==4.66.1\" \\\n", | |
" \"typing_extensions>=4.9.0\" \\\n", | |
" \"watermark==2.4.3\"" | |
], | |
"metadata": { | |
"id": "yGzTyOj4EfuD" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"from collections import Counter\n", | |
"import typing\n", | |
"\n", | |
"import alifedata_phyloinformatics_convert as apc\n", | |
"from Bio import Phylo\n", | |
"import dendropy as dp\n", | |
"from hstrat import hstrat\n", | |
"from hstrat import _auxiliary_lib as hstrat_aux\n", | |
"from hsurf import hsurf\n", | |
"from matplotlib import pyplot as plt\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"import tqdist\n", | |
"from tqdm.notebook import tqdm" | |
], | |
"metadata": { | |
"id": "cIrNBKme_8Ey" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Configure Experiment" | |
], | |
"metadata": { | |
"id": "H0US4pJ6Cpz4" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Configure instrumentation. **Edit me**" | |
], | |
"metadata": { | |
"id": "hgSx_K65JBqH" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# TODO Uncomment one...\n", | |
"# annotation_size_bits = 64\n", | |
"# annotation_size_bits = 256\n", | |
"# annotation_size_bits = 1024\n", | |
"assert annotation_size_bits.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"\n", | |
"# TODO Uncomment one...\n", | |
"# differentia_width_bits = 1\n", | |
"# differentia_width_bits = 8\n", | |
"assert differentia_width_bits.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"\n", | |
"# TODO Uncomment one...\n", | |
"# stratum_retention_algo = hstrat.depth_proportional_resolution_tapered_algo # old impl/steady behavior\n", | |
"# stratum_retention_algo = hstrat.recency_proportional_resolution_curbed_algo # old impl/tilted behavior\n", | |
"# stratum_retention_algo = hsurf.stratum_retention_interop_hybrid_algo # new impl/hybrid behavior\n", | |
"# stratum_retention_algo = hsurf.stratum_retention_interop_steady_algo # new impl/steady behavior\n", | |
"# stratum_retention_algo = hsurf.stratum_retention_interop_tilted_sticky_algo # new impl/tilted behavior" | |
], | |
"metadata": { | |
"id": "mWGLaLxyJEf8" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Configure evolutionary scale. **Edit me**" | |
], | |
"metadata": { | |
"id": "VhnKN1ktJHYp" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# TODO Uncomment one...\n", | |
"# population_size = 1024 # default condition\n", | |
"# population_size = 65536 # alternate condition\n", | |
"assert population_size.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"\n", | |
"# TODO Uncomment one...\n", | |
"# num_generations = 10000 # default condition\n", | |
"# num_generations = 100000 # alternate condition\n" | |
], | |
"metadata": { | |
"id": "xQ21R33LCplC" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Configure evolutionary conditions. **Edit me**" | |
], | |
"metadata": { | |
"id": "vtNMCtNcJPLw" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# TODO Uncomment one...\n", | |
"# num_islands=1 # default condition\n", | |
"# num_islands=64 # alternate condition\n", | |
"assert num_islands.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"\n", | |
"# TODO Uncomment one...\n", | |
"# num_niches=1 # default condition\n", | |
"# num_niches=8 # alternate condition\n", | |
"assert num_niches.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"\n", | |
"# TODO Uncomment one...\n", | |
"# tournament_size=2 # default condition\n", | |
"# tournament_size=1 # alternate condition\n", | |
"# tournament_size=8 # alternate condition\n" | |
], | |
"metadata": { | |
"id": "v5nvGaCJJNaI" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Configure experimental replicate. **Edit me**" | |
], | |
"metadata": { | |
"id": "cIrbQgJ_cLhf" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"replicate = # TODO set to a number, 0 through 19" | |
], | |
"metadata": { | |
"id": "g8vC-VBLcKt1" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Set up random number generator. (Do not edit.)" | |
], | |
"metadata": { | |
"id": "pqY6ejDEJgZA" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"seed = hash(\n", | |
" (\n", | |
" replicate,\n", | |
" population_size,\n", | |
" num_generations,\n", | |
" num_islands,\n", | |
" num_niches,\n", | |
" tournament_size,\n", | |
" )\n", | |
") % 2 ** 32\n", | |
"\n", | |
"seed" | |
], | |
"metadata": { | |
"id": "naUfe9mNCcsV" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"from hstrat._auxiliary_lib import seed_random\n", | |
"\n", | |
"seed_random(seed)\n" | |
], | |
"metadata": { | |
"id": "em_mdO9-ENsr" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Parametrize instrumentation. (Do not edit.)" | |
], | |
"metadata": { | |
"id": "nzxql5RHXlrU" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"annotation_capacity_strata = annotation_size_bits // differentia_width_bits\n", | |
"assert annotation_capacity_strata.bit_count() == 1, \"must be power of 2 (1, 2, 4, 8, etc.)\"\n", | |
"print(f\"{annotation_capacity_strata=}\")\n", | |
"\n", | |
"parametrized_policy = stratum_retention_algo.Policy(\n", | |
" parameterizer=hstrat.PropertyAtMostParameterizer(\n", | |
" target_value=annotation_capacity_strata,\n", | |
" policy_evaluator=hstrat.NumStrataRetainedUpperBoundEvaluator(\n", | |
" at_num_strata_deposited=num_generations,\n", | |
" ),\n", | |
" param_lower_bound=2,\n", | |
" param_upper_bound=1024,\n", | |
" ),\n", | |
")\n", | |
"\n", | |
"print(f\"{parametrized_policy=}\")\n", | |
"print(f\"num strata retained upper bound {parametrized_policy.CalcNumStrataRetainedUpperBound(num_generations)}\")\n" | |
], | |
"metadata": { | |
"id": "tUkf_biLXsrm" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Setup" | |
], | |
"metadata": { | |
"id": "rGLUHFxxARm4" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Helper functions." | |
], | |
"metadata": { | |
"id": "PA-DX12UHqqN" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"def calc_tqdist_distance(\n", | |
" x: pd.DataFrame,\n", | |
" y: pd.DataFrame,\n", | |
" progress_wrap: typing.Callable = lambda x: x,\n", | |
" ) -> float:\n", | |
" \"\"\"Calculate dissimilarity between two trees. Used to measure how accurate\n", | |
" tree reconstructions are.\"\"\"\n", | |
" tree_a = apc.RosettaTree(x).as_dendropy\n", | |
" tree_b = apc.RosettaTree(y).as_dendropy\n", | |
"\n", | |
" # must suppress root unifurcations or tqdist barfs\n", | |
" # see https://github.com/uym2/tripVote/issues/15\n", | |
" tree_a.unassign_taxa(exclude_leaves=True)\n", | |
" tree_a.suppress_unifurcations()\n", | |
" tree_b.unassign_taxa(exclude_leaves=True)\n", | |
" tree_b.suppress_unifurcations()\n", | |
"\n", | |
" tree_a_taxon_labels = [\n", | |
" leaf.taxon.label for leaf in progress_wrap(tree_a.leaf_node_iter())\n", | |
" ]\n", | |
" tree_b_taxon_labels = [\n", | |
" leaf.taxon.label for leaf in progress_wrap(tree_b.leaf_node_iter())\n", | |
" ]\n", | |
" all(\n", | |
" progress_wrap(\n", | |
" zip(tree_a.leaf_node_iter(), tree_b.leaf_node_iter(), strict=True),\n", | |
" ),\n", | |
" )\n", | |
" assert sorted(tree_a_taxon_labels) == sorted(tree_b_taxon_labels)\n", | |
" assert sorted(tree_a_taxon_labels) == sorted(\n", | |
" x.loc[hstrat_aux.alifestd_find_leaf_ids(x), \"taxon_label\"],\n", | |
" )\n", | |
" assert sorted(tree_a_taxon_labels) == sorted(\n", | |
" y.loc[hstrat_aux.alifestd_find_leaf_ids(y), \"taxon_label\"],\n", | |
" )\n", | |
" for taxon_label in progress_wrap(tree_a_taxon_labels):\n", | |
" assert taxon_label\n", | |
" assert taxon_label.strip()\n", | |
"\n", | |
" newick_a = tree_a.as_string(schema=\"newick\").strip()\n", | |
" newick_b = tree_b.as_string(schema=\"newick\").strip()\n", | |
"\n", | |
" return {\n", | |
" \"quartet_distance\": tqdist.quartet_distance(newick_a, newick_b),\n", | |
" \"quartet_distanc_raw\": tqdist.quartet_distance_raw(newick_a, newick_b),\n", | |
" \"triplet_distance\": tqdist.triplet_distance(newick_a, newick_b),\n", | |
" \"triplet_distance_raw\": tqdist.triplet_distance_raw(newick_a, newick_b),\n", | |
" }\n" | |
], | |
"metadata": { | |
"id": "es5aIEGi9-UG" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Generate Phylogeny\n", | |
"\n", | |
"Use simple evolutionary simulation to generate a phylogenetic history to test reconstruction process on." | |
], | |
"metadata": { | |
"id": "X9hDnZb4BvX1" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"true_phylogeny_df = hstrat.evolve_fitness_trait_population(\n", | |
" num_islands=num_islands,\n", | |
" num_niches=num_niches,\n", | |
" num_generations=num_generations,\n", | |
" population_size=population_size,\n", | |
" tournament_size=tournament_size,\n", | |
" progress_wrap=tqdm,\n", | |
")" | |
], | |
"metadata": { | |
"id": "JNhgAvB8Bu5p" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"true_phylogeny_df[\"taxon_label\"] = true_phylogeny_df[\"loc\"].astype(str)\n", | |
"true_phylogeny_df = hstrat_aux.alifestd_mark_leaves(true_phylogeny_df, mutate=True)\n", | |
"true_phylogeny_df.loc[\n", | |
" ~true_phylogeny_df[\"is_leaf\"], \"taxon_label\"\n", | |
"] = \"\"\n", | |
"true_phylogeny_df" | |
], | |
"metadata": { | |
"id": "C4GRtk0Y1tBX" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"true_phylogeny_df = hstrat_aux.alifestd_to_working_format(\n", | |
" hstrat_aux.alifestd_collapse_unifurcations(true_phylogeny_df, mutate=True),\n", | |
" mutate=True,\n", | |
").reset_index(drop=True)\n", | |
"true_phylogeny_df" | |
], | |
"metadata": { | |
"id": "cOdQyGeGAe3P" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Generate Reconstruction\n", | |
"\n", | |
"Generate genome annotations as if tracking phylogeny in distributed environment.\n", | |
"Then run reconstruction proess to estimate true phylogeny from generated annotations." | |
], | |
"metadata": { | |
"id": "kscgdxYbB19z" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"extant_annotations = hstrat.descend_template_phylogeny_alifestd(\n", | |
" true_phylogeny_df,\n", | |
" seed_column=hstrat.HereditaryStratigraphicColumn(parametrized_policy),\n", | |
" extant_ids=hstrat_aux.alifestd_find_leaf_ids(true_phylogeny_df),\n", | |
" progress_wrap=tqdm,\n", | |
")\n", | |
"\n", | |
"len(extant_annotations)" | |
], | |
"metadata": { | |
"id": "zIDeuWorB1W8" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"reconstructed_phylogeny_df = hstrat.build_tree(\n", | |
" extant_annotations,\n", | |
" progress_wrap=tqdm,\n", | |
" version_pin=hstrat.__version__,\n", | |
" taxon_labels=true_phylogeny_df.loc[\n", | |
" hstrat_aux.alifestd_find_leaf_ids(true_phylogeny_df),\n", | |
" \"taxon_label\",\n", | |
" ],\n", | |
")\n", | |
"reconstructed_phylogeny_df" | |
], | |
"metadata": { | |
"id": "QFCunjmBj9EH" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"reconstructed_phylogeny_df = hstrat_aux.alifestd_collapse_unifurcations(reconstructed_phylogeny_df, mutate=True)\n", | |
"reconstructed_phylogeny_df" | |
], | |
"metadata": { | |
"id": "kDxYkpLvA5SO" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Evaluate Reconstruction\n", | |
"\n", | |
"Reconstruction quality data --- collect into spreadsheet." | |
], | |
"metadata": { | |
"id": "HyyTqjlXB5uu" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"estimation_intervals = [\n", | |
" hstrat.calc_ranks_since_mrca_bounds_with(\n", | |
" *np.random.choice(extant_annotations, size=2, replace=False),\n", | |
" prior=\"arbitrary\",\n", | |
" )\n", | |
"\n", | |
" for __ in tqdm(range(200))\n", | |
"]\n" | |
], | |
"metadata": { | |
"id": "CN3aqzDS-aT3" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"median_abs_uncertainty = np.median([*map(np.ptp, estimation_intervals)])\n", | |
"mean_abs_uncertainty = np.mean([*map(np.ptp, estimation_intervals)])\n", | |
"f\"{median_abs_uncertainty=} {mean_abs_uncertainty=}\"" | |
], | |
"metadata": { | |
"id": "93JmY11g-eVp" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"rel_uncertainties = (\n", | |
" np.array([*map(np.ptp, estimation_intervals)])\n", | |
" / (np.array([*map(np.mean, estimation_intervals)]) + 1)\n", | |
")\n", | |
"median_rel_uncertainty = np.median(rel_uncertainties)\n", | |
"mean_rel_uncertainty = np.mean(rel_uncertainties)\n", | |
"f\"{median_rel_uncertainty=} {mean_rel_uncertainty=}\"" | |
], | |
"metadata": { | |
"id": "5kgf112d-4Bq" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"num_true_inner_nodes = hstrat_aux.alifestd_count_inner_nodes(true_phylogeny_df)\n", | |
"num_reconstructed_inner_nodes = hstrat_aux.alifestd_count_inner_nodes(reconstructed_phylogeny_df)\n", | |
"f\"{num_true_inner_nodes=} {num_reconstructed_inner_nodes=}\"" | |
], | |
"metadata": { | |
"id": "staoF6vMDGW3" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"num_true_polytomies = hstrat_aux.alifestd_count_polytomies(true_phylogeny_df)\n", | |
"num_reconstructed_polytomies = hstrat_aux.alifestd_count_polytomies(reconstructed_phylogeny_df)\n", | |
"f\"{num_true_polytomies=} {num_reconstructed_polytomies=}\"" | |
], | |
"metadata": { | |
"id": "5tiqawvDgwqS" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"true_polytomic_index = hstrat_aux.alifestd_calc_polytomic_index(true_phylogeny_df)\n", | |
"reconstructed_polytomic_index = hstrat_aux.alifestd_calc_polytomic_index(reconstructed_phylogeny_df)\n", | |
"f\"{true_polytomic_index=} {reconstructed_polytomic_index=}\"" | |
], | |
"metadata": { | |
"id": "S1XOZAtJAizJ" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"distances = calc_tqdist_distance(\n", | |
" true_phylogeny_df,\n", | |
" reconstructed_phylogeny_df,\n", | |
" progress_wrap=tqdm,\n", | |
")\n", | |
"f\"{distances=}\"" | |
], | |
"metadata": { | |
"id": "ZWJpZlpMB5N0" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"sampled_triplet_distance_strict = hstrat_aux.alifestd_estimate_triplet_distance_asexual(\n", | |
" true_phylogeny_df,\n", | |
" reconstructed_phylogeny_df,\n", | |
" taxon_label_key=\"taxon_label\",\n", | |
" confidence=0.8,\n", | |
" precision=0.05,\n", | |
" strict=True,\n", | |
" progress_wrap=tqdm,\n", | |
" mutate=True,\n", | |
")\n", | |
"f\"{sampled_triplet_distance_strict=}\"" | |
], | |
"metadata": { | |
"id": "s7XAQgYwBeTh" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"sampled_triplet_distance_lax = hstrat_aux.alifestd_estimate_triplet_distance_asexual(\n", | |
" true_phylogeny_df,\n", | |
" reconstructed_phylogeny_df,\n", | |
" taxon_label_key=\"taxon_label\",\n", | |
" confidence=0.8,\n", | |
" precision=0.05,\n", | |
" strict=False,\n", | |
" progress_wrap=tqdm,\n", | |
" mutate=True,\n", | |
")\n", | |
"f\"{sampled_triplet_distance_lax=}\"" | |
], | |
"metadata": { | |
"id": "2DTLCS4T3yJJ" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Visualize Phylogeny & Reconstruction\n", | |
"\n", | |
"For validating results." | |
], | |
"metadata": { | |
"id": "6J_PwYFnB-Zi" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Topology only (no time)." | |
], | |
"metadata": { | |
"id": "TZSzgxPWzMfE" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"if population_size <= 1024:\n", | |
" true_phylogeny_tree = apc.alife_dataframe_to_biopython_tree(\n", | |
" hstrat_aux.alifestd_collapse_unifurcations(true_phylogeny_df),\n", | |
" setup_branch_lengths=False,\n", | |
" )\n", | |
" reconstructed_phylogeny_tree = apc.alife_dataframe_to_biopython_tree(\n", | |
" hstrat_aux.alifestd_collapse_unifurcations(reconstructed_phylogeny_df),\n", | |
" setup_branch_lengths=False,\n", | |
" )\n", | |
"\n", | |
" fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))\n", | |
"\n", | |
" ax1.set_title(\"True Tree\")\n", | |
" Phylo.draw(true_phylogeny_tree, do_show=False, axes=ax1)\n", | |
"\n", | |
" ax2.set_title(\"Reconstructed Tree\")\n", | |
" Phylo.draw(reconstructed_phylogeny_tree, do_show=False, axes=ax2)\n", | |
"\n", | |
" plt.tight_layout()\n", | |
" plt.show()" | |
], | |
"metadata": { | |
"id": "dhSB7eqTy86F" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Scaled by time." | |
], | |
"metadata": { | |
"id": "zxvNxM9NzVY6" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"if population_size <= 1024:\n", | |
" true_phylogeny_tree = apc.alife_dataframe_to_biopython_tree(\n", | |
" hstrat_aux.alifestd_collapse_unifurcations(true_phylogeny_df),\n", | |
" setup_branch_lengths=True,\n", | |
" )\n", | |
" reconstructed_phylogeny_tree = apc.alife_dataframe_to_biopython_tree(\n", | |
" hstrat_aux.alifestd_collapse_unifurcations(reconstructed_phylogeny_df),\n", | |
" setup_branch_lengths=True,\n", | |
" )\n", | |
"\n", | |
" fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))\n", | |
"\n", | |
" ax1.set_title(\"True Tree\")\n", | |
" ax1.set_xscale(\"log\")\n", | |
" Phylo.draw(true_phylogeny_tree, do_show=False, axes=ax1)\n", | |
"\n", | |
" ax2.set_title(\"Reconstructed Tree\")\n", | |
" ax2.set_xscale(\"log\")\n", | |
" Phylo.draw(reconstructed_phylogeny_tree, do_show=False, axes=ax2)\n", | |
"\n", | |
" plt.tight_layout()\n", | |
" plt.show()" | |
], | |
"metadata": { | |
"id": "SKd7tpe5-2e5" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Reproducibility Information\n", | |
"\n", | |
"For future reference if reproducing experiments." | |
], | |
"metadata": { | |
"id": "1vWYtYFg_39e" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"print(\n", | |
" f\"\"\"# instrumentation\n", | |
"{annotation_size_bits=}\n", | |
"{differentia_width_bits=}\n", | |
"{stratum_retention_algo.PolicySpec.GetAlgoTitle()=}\n", | |
"\n", | |
"# evolutionary scale\n", | |
"{population_size=}\n", | |
"{num_generations=}\n", | |
"\n", | |
"# evolutionary conditions\n", | |
"{num_islands=}\n", | |
"{num_niches=}\n", | |
"{tournament_size=}\n", | |
"\"\"\"\n", | |
")" | |
], | |
"metadata": { | |
"id": "q0N8A_8JDRXr" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"import datetime\n", | |
"datetime.datetime.now().isoformat()" | |
], | |
"metadata": { | |
"id": "9rKQfXdBAxxJ" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"%load_ext watermark\n", | |
"%watermark" | |
], | |
"metadata": { | |
"id": "qO0LRJeL_3lB" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!pip freeze" | |
], | |
"metadata": { | |
"id": "UfHdK7OK-afO" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"locals()" | |
], | |
"metadata": { | |
"id": "fLVm-ivpDjR6" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
} | |
] | |
} |
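The naming template in the Procedure section is easy to mistype by hand, so it may help to generate the notebook name from the configured values. The following is a minimal sketch, not part of the notebook: the helper names `make_notebook_filename` and `treatment_name` and the `behavior` and `implementation` arguments are hypothetical. It covers only the `steady`/`tilted` and `old`/`new` choices listed in the template; the template does not say how the hybrid algorithm option should be named, so the sketch rejects it.

```python
def make_notebook_filename(
    *,
    num_islands: int,
    num_niches: int,
    num_generations: int,
    population_size: int,
    tournament_size: int,
    behavior: str,  # hypothetical argument: "steady" or "tilted"
    implementation: str,  # hypothetical argument: "old" or "new"
    annotation_size_bits: int,
    differentia_width_bits: int,
    replicate: int,
) -> str:
    """Build a notebook filename following the Procedure template."""
    if behavior not in ("steady", "tilted"):
        raise ValueError(f"unsupported behavior {behavior!r}")
    if implementation not in ("old", "new"):
        raise ValueError(f"unsupported implementation {implementation!r}")

    # evolutionary scale and conditions
    evo = (
        f"island{num_islands}-niche{num_niches}-ngen{num_generations}"
        f"-popsize{population_size}-tournsize{tournament_size}"
    )
    # instrumentation settings
    instrument = (
        f"{behavior}-{implementation}"
        f"-bits{annotation_size_bits}-diff{differentia_width_bits}"
    )
    return f"evo={evo}+instrument={instrument}+replicate={replicate}+ext=.ipynb"


def treatment_name(notebook_filename: str) -> str:
    """Drop the `+replicate=...+ext=.ipynb` suffix to get the directory name."""
    head, sep, __ = notebook_filename.partition("+replicate=")
    if not sep:
        raise ValueError("filename has no '+replicate=' segment")
    return head


name = make_notebook_filename(
    num_islands=1,
    num_niches=1,
    num_generations=10000,
    population_size=1024,
    tournament_size=2,
    behavior="steady",
    implementation="old",
    annotation_size_bits=64,
    differentia_width_bits=1,
    replicate=0,
)
print(name)
print(treatment_name(name))
```

With the default conditions shown, the first printed name matches the worked example in the Procedure section, and the second is the corresponding treatment directory name.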