liorkesos · September 8, 2024 09:43
diff --git a/GraphBuilderContibutionSegmentation.ipynb b/GraphBuilderContibutionSegmentation.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyMQSJo0o0FAJKAugLWUBaTR",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "77102d5b05a7485981c20ddc2eb00606": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "IntSliderModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "IntSliderModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "IntSliderView",
            "continuous_update": false,
            "description": "Slider:",
            "description_tooltip": null,
            "disabled": false,
            "layout": "IPY_MODEL_fcbc7a1d98bf45f2bf52350bc771d2b2",
            "max": 100,
            "min": 0,
            "orientation": "horizontal",
            "readout": true,
            "readout_format": "d",
            "step": 1,
            "style": "IPY_MODEL_7663355f7b6c485294554156ca81759d",
            "value": 16
          }
        },
        "fcbc7a1d98bf45f2bf52350bc771d2b2": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "7663355f7b6c485294554156ca81759d": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "SliderStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "SliderStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": "",
            "handle_color": null
          }
        }
      }
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/liorkesos/52035c7308dafb220bee4244107a9a08/untitled0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0M3YoPr738h-"
      },
      "outputs": [],
      "source": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# GraphBuilder Contribution Segmentation Algorithm\n",
        "\n",
        "## Introduction\n",
        "\n",
        "This notebook presents the mathematical formulation of the GraphBuilder Contribution Segmentation algorithm. This algorithm processes definition files (like Terraform, Pulumi, or Backstage) to create a weighted graph describing and visualizing the application and its subsystems.\n",
        "\n",
        "## Input and Output\n",
        "\n",
        "### Input\n",
        "The input is a singular definition files:\n",
        "\n",
        "D\n",
        "\n",
        "### Output\n",
        "The output is a weighted graph:\n",
        "\n",
        "$G = (V, E, W)$\n",
        "\n",
        "Where:\n",
        "- $V$ is the set of vertices (subsystems or components)\n",
        "- $E \\subseteq V \\times V$ is the set of edges (dependencies)\n",
        "- $W: E \\rightarrow \\mathbb{R}^+$ is a weight function assigning a positive real number to each edge\n",
        "\n",
        "## Algorithm Steps\n",
        "\n",
        "### 1. Parsing and Tokenization\n",
        "\n",
        "For each $d_i \\in D$:\n",
        "- Parse $d_i$ into a set of tokens $T_i = \\{t_1, t_2, ..., t_m\\}$\n",
        "- Define a function $f: T_i \\rightarrow C$ that maps tokens to components\n",
        "\n",
        "### 2. Component Extraction\n",
        "\n",
        "$C = \\bigcup_{i=1}^n f(T_i)$\n",
        "\n",
        "This represents the set of all unique components.\n",
        "\n",
        "### 3. Dependency Identification\n",
        "\n",
        "Define a binary relation $R \\subseteq C \\times C$ where $(c_i, c_j) \\in R$ if $c_i$ depends on $c_j$\n",
        "\n",
        "### 4. Graph Construction\n",
        "\n",
        "- Set $V = C$\n",
        "- For each $(c_i, c_j) \\in R$, add an edge $e_{ij} = (c_i, c_j)$ to $E$\n",
        "\n",
        "### 5. Weight Assignment\n",
        "\n",
        "Define a weight function $W: E \\rightarrow \\mathbb{R}^+$ as:\n",
        "\n",
        "$W(e_{ij}) = \\alpha \\cdot f_{dep}(c_i, c_j) + \\beta \\cdot f_{com}(c_i, c_j) + \\gamma \\cdot f_{imp}(c_i, c_j)$\n",
        "\n",
        "Where:\n",
        "- $f_{dep}(c_i, c_j)$ quantifies the strength of dependency\n",
        "- $f_{com}(c_i, c_j)$ measures the communication frequency\n",
        "- $f_{imp}(c_i, c_j)$ assesses the impact of the dependency\n",
        "- $\\alpha, \\beta, \\gamma$ are hyperparameters such that $\\alpha + \\beta + \\gamma = 1$\n",
        "\n",
        "### 6. Subsystem Clustering\n",
        "\n",
        "Apply a graph clustering algorithm $\\mathcal{A}$ to $G$:\n",
        "\n",
        "$S = \\mathcal{A}(G)$, where $S = \\{S_1, S_2, ..., S_k\\}$ is a set of subsystems\n",
        "\n",
        "### 7. Contribution Segmentation\n",
        "\n",
        "For each subsystem $S_i \\in S$, compute its contribution score:\n",
        "\n",
        "$CS(S_i) = \\frac{\\sum_{v \\in S_i} \\sum_{u \\in V} W((v,u))}{\\sum_{e \\in E} W(e)}$\n",
        "\n",
        "## Complexity Analysis\n",
        "\n",
        "- Time Complexity: $O(|D| \\cdot L + |V|^2 + |E| \\log |E|)$, where $L$ is the average length of a definition file\n",
        "- Space Complexity: $O(|V| + |E|)$\n",
        "\n",
        "## Notes\n",
        "\n",
        "- The choice of clustering algorithm $\\mathcal{A}$ (e.g., Louvain, Spectral Clustering) affects the final subsystem structure\n",
        "- The weight function $W$ can be adjusted based on domain-specific knowledge or empirical studies\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "[You can add implementation details, code snippets, or further explanations here]"
      ],
      "metadata": {
        "id": "XifFjqEU4KSP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# prompt: a slider using jupyter widgets\n",
        "\n",
        "import ipywidgets as widgets\n",
        "\n",
        "slider = widgets.IntSlider(\n",
        "    value=50,\n",
        "    min=0,\n",
        "    max=100,\n",
        "    step=1,\n",
        "    description='Slider:',\n",
        "    disabled=False,\n",
        "    continuous_update=False,\n",
        "    orientation='horizontal',\n",
        "    readout=True,\n",
        "    readout_format='d'\n",
        ")\n",
        "\n",
        "display(slider)\n"
      ],
      "metadata": {
        "id": "EAm1BWJgi07d",
        "outputId": "ec4c0b71-ac09-4c66-dbe4-b431799bb949",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "77102d5b05a7485981c20ddc2eb00606",
            "fcbc7a1d98bf45f2bf52350bc771d2b2",
            "7663355f7b6c485294554156ca81759d"
          ]
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "IntSlider(value=50, continuous_update=False, description='Slider:')"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "77102d5b05a7485981c20ddc2eb00606"
            }
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "ZWd7H6GP9PpQ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [],
      "metadata": {
        "id": "cBiueTo89QQ3"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "amL1suJf9QpA"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
 }
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"provenance": [],
	"authorship_tag": "ABX9TyMQSJo0o0FAJKAugLWUBaTR",
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	},
	"language_info": {
	"name": "python"
	},
	"widgets": {
	"application/vnd.jupyter.widget-state+json": {
	"77102d5b05a7485981c20ddc2eb00606": {
	"model_module": "@jupyter-widgets/controls",
	"model_name": "IntSliderModel",
	"model_module_version": "1.5.0",
	"state": {
	"_dom_classes": [],
	"_model_module": "@jupyter-widgets/controls",
	"_model_module_version": "1.5.0",
	"_model_name": "IntSliderModel",
	"_view_count": null,
	"_view_module": "@jupyter-widgets/controls",
	"_view_module_version": "1.5.0",
	"_view_name": "IntSliderView",
	"continuous_update": false,
	"description": "Slider:",
	"description_tooltip": null,
	"disabled": false,
	"layout": "IPY_MODEL_fcbc7a1d98bf45f2bf52350bc771d2b2",
	"max": 100,
	"min": 0,
	"orientation": "horizontal",
	"readout": true,
	"readout_format": "d",
	"step": 1,
	"style": "IPY_MODEL_7663355f7b6c485294554156ca81759d",
	"value": 16
	}
	},
	"fcbc7a1d98bf45f2bf52350bc771d2b2": {
	"model_module": "@jupyter-widgets/base",
	"model_name": "LayoutModel",
	"model_module_version": "1.2.0",
	"state": {
	"_model_module": "@jupyter-widgets/base",
	"_model_module_version": "1.2.0",
	"_model_name": "LayoutModel",
	"_view_count": null,
	"_view_module": "@jupyter-widgets/base",
	"_view_module_version": "1.2.0",
	"_view_name": "LayoutView",
	"align_content": null,
	"align_items": null,
	"align_self": null,
	"border": null,
	"bottom": null,
	"display": null,
	"flex": null,
	"flex_flow": null,
	"grid_area": null,
	"grid_auto_columns": null,
	"grid_auto_flow": null,
	"grid_auto_rows": null,
	"grid_column": null,
	"grid_gap": null,
	"grid_row": null,
	"grid_template_areas": null,
	"grid_template_columns": null,
	"grid_template_rows": null,
	"height": null,
	"justify_content": null,
	"justify_items": null,
	"left": null,
	"margin": null,
	"max_height": null,
	"max_width": null,
	"min_height": null,
	"min_width": null,
	"object_fit": null,
	"object_position": null,
	"order": null,
	"overflow": null,
	"overflow_x": null,
	"overflow_y": null,
	"padding": null,
	"right": null,
	"top": null,
	"visibility": null,
	"width": null
	}
	},
	"7663355f7b6c485294554156ca81759d": {
	"model_module": "@jupyter-widgets/controls",
	"model_name": "SliderStyleModel",
	"model_module_version": "1.5.0",
	"state": {
	"_model_module": "@jupyter-widgets/controls",
	"_model_module_version": "1.5.0",
	"_model_name": "SliderStyleModel",
	"_view_count": null,
	"_view_module": "@jupyter-widgets/base",
	"_view_module_version": "1.2.0",
	"_view_name": "StyleView",
	"description_width": "",
	"handle_color": null
	}
	}
	}
	}
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/liorkesos/52035c7308dafb220bee4244107a9a08/untitled0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"id": "0M3YoPr738h-"
	},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"source": [
	"# GraphBuilder Contribution Segmentation Algorithm\n",
	"\n",
	"## Introduction\n",
	"\n",
	"This notebook presents the mathematical formulation of the GraphBuilder Contribution Segmentation algorithm. This algorithm processes definition files (like Terraform, Pulumi, or Backstage) to create a weighted graph describing and visualizing the application and its subsystems.\n",
	"\n",
	"## Input and Output\n",
	"\n",
	"### Input\n",
	"The input is a singular definition files:\n",
	"\n",
	"D\n",
	"\n",
	"### Output\n",
	"The output is a weighted graph:\n",
	"\n",
	"$G = (V, E, W)$\n",
	"\n",
	"Where:\n",
	"- $V$ is the set of vertices (subsystems or components)\n",
	"- $E \\subseteq V \\times V$ is the set of edges (dependencies)\n",
	"- $W: E \\rightarrow \\mathbb{R}^+$ is a weight function assigning a positive real number to each edge\n",
	"\n",
	"## Algorithm Steps\n",
	"\n",
	"### 1. Parsing and Tokenization\n",
	"\n",
	"For each $d_i \\in D$:\n",
	"- Parse $d_i$ into a set of tokens $T_i = \\{t_1, t_2, ..., t_m\\}$\n",
	"- Define a function $f: T_i \\rightarrow C$ that maps tokens to components\n",
	"\n",
	"### 2. Component Extraction\n",
	"\n",
	"$C = \\bigcup_{i=1}^n f(T_i)$\n",
	"\n",
	"This represents the set of all unique components.\n",
	"\n",
	"### 3. Dependency Identification\n",
	"\n",
	"Define a binary relation $R \\subseteq C \\times C$ where $(c_i, c_j) \\in R$ if $c_i$ depends on $c_j$\n",
	"\n",
	"### 4. Graph Construction\n",
	"\n",
	"- Set $V = C$\n",
	"- For each $(c_i, c_j) \\in R$, add an edge $e_{ij} = (c_i, c_j)$ to $E$\n",
	"\n",
	"### 5. Weight Assignment\n",
	"\n",
	"Define a weight function $W: E \\rightarrow \\mathbb{R}^+$ as:\n",
	"\n",
	"$W(e_{ij}) = \\alpha \\cdot f_{dep}(c_i, c_j) + \\beta \\cdot f_{com}(c_i, c_j) + \\gamma \\cdot f_{imp}(c_i, c_j)$\n",
	"\n",
	"Where:\n",
	"- $f_{dep}(c_i, c_j)$ quantifies the strength of dependency\n",
	"- $f_{com}(c_i, c_j)$ measures the communication frequency\n",
	"- $f_{imp}(c_i, c_j)$ assesses the impact of the dependency\n",
	"- $\\alpha, \\beta, \\gamma$ are hyperparameters such that $\\alpha + \\beta + \\gamma = 1$\n",
	"\n",
	"### 6. Subsystem Clustering\n",
	"\n",
	"Apply a graph clustering algorithm $\\mathcal{A}$ to $G$:\n",
	"\n",
	"$S = \\mathcal{A}(G)$, where $S = \\{S_1, S_2, ..., S_k\\}$ is a set of subsystems\n",
	"\n",
	"### 7. Contribution Segmentation\n",
	"\n",
	"For each subsystem $S_i \\in S$, compute its contribution score:\n",
	"\n",
	"$CS(S_i) = \\frac{\\sum_{v \\in S_i} \\sum_{u \\in V} W((v,u))}{\\sum_{e \\in E} W(e)}$\n",
	"\n",
	"## Complexity Analysis\n",
	"\n",
	"- Time Complexity: $O(\|D\| \\cdot L + \|V\|^2 + \|E\| \\log \|E\|)$, where $L$ is the average length of a definition file\n",
	"- Space Complexity: $O(\|V\| + \|E\|)$\n",
	"\n",
	"## Notes\n",
	"\n",
	"- The choice of clustering algorithm $\\mathcal{A}$ (e.g., Louvain, Spectral Clustering) affects the final subsystem structure\n",
	"- The weight function $W$ can be adjusted based on domain-specific knowledge or empirical studies\n",
	"\n",
	"## Next Steps\n",
	"\n",
	"[You can add implementation details, code snippets, or further explanations here]"
	],
	"metadata": {
	"id": "XifFjqEU4KSP"
	}
	},
	{
	"cell_type": "code",
	"source": [
	"# prompt: a slider using jupyter widgets\n",
	"\n",
	"import ipywidgets as widgets\n",
	"\n",
	"slider = widgets.IntSlider(\n",
	" value=50,\n",
	" min=0,\n",
	" max=100,\n",
	" step=1,\n",
	" description='Slider:',\n",
	" disabled=False,\n",
	" continuous_update=False,\n",
	" orientation='horizontal',\n",
	" readout=True,\n",
	" readout_format='d'\n",
	")\n",
	"\n",
	"display(slider)\n"
	],
	"metadata": {
	"id": "EAm1BWJgi07d",
	"outputId": "ec4c0b71-ac09-4c66-dbe4-b431799bb949",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 49,
	"referenced_widgets": [
	"77102d5b05a7485981c20ddc2eb00606",
	"fcbc7a1d98bf45f2bf52350bc771d2b2",
	"7663355f7b6c485294554156ca81759d"
	]
	}
	},
	"execution_count": null,
	"outputs": [
	{
	"output_type": "display_data",
	"data": {
	"text/plain": [
	"IntSlider(value=50, continuous_update=False, description='Slider:')"
	],
	"application/vnd.jupyter.widget-view+json": {
	"version_major": 2,
	"version_minor": 0,
	"model_id": "77102d5b05a7485981c20ddc2eb00606"
	}
	},
	"metadata": {}
	}
	]
	},
	{
	"cell_type": "code",
	"source": [],
	"metadata": {
	"id": "ZWd7H6GP9PpQ"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"source": [],
	"metadata": {
	"id": "cBiueTo89QQ3"
	}
	},
	{
	"cell_type": "code",
	"source": [],
	"metadata": {
	"id": "amL1suJf9QpA"
	},
	"execution_count": null,
	"outputs": []
	}
	]
	}