Skip to content

Instantly share code, notes, and snippets.

@FridayCandour
Last active January 15, 2025 06:24
Show Gist options
  • Save FridayCandour/2a976e9ba8031586fe062124ad2d9fca to your computer and use it in GitHub Desktop.
Save FridayCandour/2a976e9ba8031586fe062124ad2d9fca to your computer and use it in GitHub Desktop.
Xtree is an efficient indexing algorithm for distributed data.

Scientific Description of the Indexing Algorithm

Abstract

This paper introduces a novel, highly efficient indexing algorithm implemented in TypeScript. The algorithm is designed to manage large-scale datasets by leveraging a tree-based structure optimized for rapid indexing, search, and deletion operations. It accommodates unique string identifiers, akin to UUIDs, ensuring clarity and performance. The algorithm is particularly well-suited for applications requiring dynamic attribute-based querying with consistent low-latency performance.


Introduction

Efficient indexing algorithms are pivotal for modern data management systems. Traditional indexing structures often struggle to balance speed and memory efficiency while accommodating dynamic schemas. This work presents a groundbreaking approach that combines:

  1. Attribute-Specific Nodes: Each attribute maintains its own node in the tree structure, enabling targeted and efficient queries.
  2. ID-Centric Design: All data operations are indexed via unique string-based identifiers.
  3. Optimized Multi-Attribute Search: A strategy to minimize set intersections during complex queries.
  4. Optimized Relational Operator Multi-Attribute Search: A strategy to seelct based on relational operators.

Design and Implementation

Data Structures

  1. Index Tree:

    • Composed of a Map for the base storage (base) and a Map of attribute nodes (nodes).
    • Each node tracks mappings between attribute values and sets of IDs.
  2. Index Node:

    • Attributes are stored as keys, with their values mapping to sets of IDs.

Algorithm Details

Indexing Operation

  • Objective: Insert or update a record in the index.
  • Steps:
    1. Store the data against the provided ID in base.
    2. For each attribute-value pair in the data:
      • Retrieve or create the corresponding attribute node.
      • Add the ID to the set associated with the value.

Search Operation

  • Objective: Retrieve IDs matching a specific attribute and value.
  • Steps:
    1. Locate the node for the attribute.
    2. Retrieve the set of IDs for the specified value.

Multi-Attribute Search Operation

  • Objective: Retrieve IDs matching multiple attribute-value pairs.
  • Steps:
    1. Sort query attributes by the size of their value sets.
    2. Iteratively intersect the sets, starting with the smallest.

Deletion Operation

  • Objective: Remove all mappings associated with a specific ID.
  • Steps:
    1. Retrieve the data using the ID from base.
    2. For each attribute-value pair in the data:
      • Remove the ID from the corresponding value set.
      • Clean up empty sets or nodes as needed.
    3. Remove the ID from base.

Optimization Strategies

  1. Attribute-Specific Nodes:

    • Isolating attributes reduces unnecessary operations during insertion and querying.
  2. Set-Based Intersection for Multi-Attribute Search:

    • Sorting attributes by set size minimizes computational overhead.
  3. String-Based ID Optimization:

    • Native JavaScript Map and Set are leveraged for efficient string key handling.

Complexity Analysis

Time Complexity

  • Indexing: O(n), where n is the number of attributes in the data.
  • Search: O(1) for single-attribute queries (average case).
  • Multi-Attribute Search: O(m * k), where m is the number of attributes in the query and k is the average size of their value sets.
  • Relational Operatior Search: O(n⋅m⋅s+t⋅(n−1)) Where n = number of query attributes, m = number of keys in node.valueMap, s = average size of Sets in node.valueMap and t = size of the smallest Set.
  • Deletion: O(n), proportional to the number of attributes in the data.

Space Complexity

  • Scales linearly with the number of indexed records and attributes.
  • Efficient use of shared sets for duplicate values across records.

Experimental Results

Pseudocode Examples

Indexing Operation

function indexRecord(id, record):
    base[id] = record
    for (attribute, value) in record:
        if nodes[attribute] does not exist:
            nodes[attribute] = new Map()
        if value not in nodes[attribute]:
            nodes[attribute][value] = new Set()
        nodes[attribute][value].add(id)

Search Operation

function search(attribute, value):
    if attribute not in nodes:
        return empty set
    return nodes[attribute].get(value, empty set)

Multi-Attribute Search Operation

function multiAttributeSearch(query):
    sortedQuery = sort query attributes by size of their value sets in nodes
    result = full set of IDs (initial state)
    for (attribute, value) in sortedQuery:
        result = result intersect search(attribute, value)
        if result is empty:
            break
    return result

Deletion Operation

function deleteRecord(id):
    if id not in base:
        return
    record = base[id]
    for (attribute, value) in record:
        if attribute in nodes and value in nodes[attribute]:
            nodes[attribute][value].delete(id)
            if nodes[attribute][value] is empty:
                remove nodes[attribute][value]
        if nodes[attribute] is empty:
            remove nodes[attribute]
    delete base[id]

(TBD: Benchmark results comparing this algorithm against existing approaches, showcasing improvements in latency and memory efficiency.)


Conclusion

The proposed indexing algorithm offers a robust and efficient solution for dynamic, attribute-based data queries. Its innovative design addresses common challenges in indexing large, schema-less datasets, making it a valuable tool for modern data-intensive applications. Further research could explore its integration with distributed systems and advanced caching mechanisms.


Future Work

  • Extending the algorithm for distributed systems with consistency guarantees.
  • Investigating adaptive strategies for skewed attribute distributions.
  • Enhancing batch indexing throughput using parallelism.

Authors:

  • Friday (Fullstack Engineer)
class Xtree {
private base: Map<string, any>; // Base storage mapping IDs to data
private nodes: Map<string, Map<any, Set<string>>>; // Nodes for different attributes
constructor() {
this.base = new Map<string, any>(); // ID is now always a string
this.nodes = new Map<string, any>();
}
// Add or update data in the tree
index(id: string, data: Record<string, any>): void {
this.base.set(id, data);
for (const [attribute, value] of Object.entries(data)) {
let node = this.nodes.get(attribute);
if (!node) {
node = new Map<any, Set<string>>();
this.nodes.set(attribute, node);
}
if (!node.has(value)) {
node.set(value, new Set<string>());
}
node.get(value)!.add(id);
}
}
// Drop all mappings for an ID
drop(id: string): void {
const data = this.base.get(id);
if (!data) return;
for (const [attribute, value] of Object.entries(data)) {
const node = this.nodes.get(attribute);
if (!node || !node.has(value)) continue;
const idSet = node.get(value)!;
idSet.delete(id);
if (idSet.size === 0) {
node.delete(value); // Deferred cleanup
}
if (node.size === 0) {
this.nodes.delete(attribute);
}
}
this.base.delete(id);
}
// Multi-attribute search
search(query: Record<string, any>): Record<string, any>[] {
const entries = Object.entries(query);
if (entries.length === 0) {
return Array.from(this.base.values());
}
// Start with the smallest set for optimal intersection
let smallestSet: Set<string> | null = null;
let smallestSize = Infinity;
const results: Set<string>[] = [];
for (const [key, value] of entries) {
const node = this.nodes.get(key);
const values = node?.get(value);
if (!node || !values) return [];
results.push(values);
if (values.size < smallestSize) {
smallestSize = values.size;
smallestSet = values;
}
}
if (results.length === 0) return [];
// Intersect using Sets
main: for (const values of results) {
for (const id of smallestSet!) {
if (!values?.has(id)) {
smallestSet!.delete(id);
if (smallestSet!.size === 0) break main;
break;
}
}
}
return Array.from(smallestSet!).map((id) => this.base.get(id));
}
// Multi-attribute relational operations
operator<T extends Record<string, any>>(
query: T,
operator: Record<keyof T, "eq" | "lt" | "gt" | "lte" | "gte" | "like">
): Record<string, any>[] {
const entries = Object.entries(query);
if (entries.length === 0) {
return Array.from(this.base.values());
}
const results: Set<string>[] = [];
// Start with the smallest set for optimal intersection
let smallestSet: Set<string> | null = null;
let smallestSize = Infinity;
for (const [key, value] of entries) {
const node = this.nodes.get(key);
if (!node) return []; // Key not found, return empty results
const combinedSet = new Set<string>();
const op = operator[key]; // get the operator
for (const [k, valSet] of node) {
let match = false;
switch (op) {
case "like":
match = k.includes(value);
break;
case "gt":
match = k > value;
break;
case "lt":
match = k < value;
break;
case "gte":
match = k >= value;
break;
case "lte":
match = k <= value;
break;
case "eq":
default: // defaults to eq
match = k === value;
break;
}
if (match) {
for (const item of valSet || []) {
combinedSet.add(item);
}
}
}
if (combinedSet.size > 0) {
results.push(combinedSet);
// Finding the smallest Set
if (combinedSet.size < smallestSize) {
smallestSize = combinedSet.size;
smallestSet = combinedSet;
}
}
}
// return if no results
if (results.length === 0) return [];
// Intersect using Sets
main: for (const values of results) {
for (const id of smallestSet!) {
if (!values?.has(id)) {
smallestSet!.delete(id);
if (smallestSet!.size === 0) break main;
break;
}
}
}
// map data to values
return Array.from(smallestSet!).map((id) => this.base.get(id));
}
// Multi-attribute count
count(query: Record<string, any> | true): number {
if (query === true) {
return this.base.size;
}
const entries = Object.entries(query);
// Start with the smallest set for optimal intersection
let smallestSet: Set<string> | null = null;
let smallestSize = Infinity;
for (const [key, value] of entries) {
const node = this.nodes.get(key);
const values = node?.get(value);
if (!node || !values) return 0;
if (values.size < smallestSize) {
smallestSize = values.size;
smallestSet = values;
}
}
// Intersect using Sets
const result = new Set(smallestSet);
for (const [key, value] of entries) {
const node = this.nodes.get(key);
const values = node?.get(value);
for (const id of result) {
if (!values?.has(id)) {
result.delete(id);
}
}
}
return result.size;
}
}
// Example Usage
const indexer = new Xtree();
// Index data
indexer.index("2", { name: "john doe", age: 1 });
indexer.index("1", { name: "john paul", age: 2 });
indexer.index("3", { name: "john friday", age: 3 });
indexer.index("4", { name: "Jude francis", age: 4 });
indexer.index("5", { name: "john thomas", age: 5 });
const a = indexer.search({ name: "john doe" });
const b = indexer.operator({ age: 2 }, { age: "lt" });
console.log({ a, b });
console.assert(a.length === b.length, "Length check failed");
console.assert(a[0].age === b[0].age, "Length age failed");
console.assert(a[0].name === b[0].name, "Length name failed");
@FridayCandour
Copy link
Author

FridayCandour commented Dec 13, 2023

😄 Hello share your Contributions, suggestions, and improvements on telegram Uiedbook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment