This paper introduces a novel indexing algorithm implemented in TypeScript. The algorithm manages large datasets with a tree-based structure optimized for fast indexing, search, and deletion. Every record is keyed by a unique string identifier, akin to a UUID, so that lookups and removals always start from a single unambiguous key. The algorithm is particularly well suited to applications that need dynamic attribute-based querying with consistently low latency.
Efficient indexing algorithms are pivotal for modern data management systems. Traditional indexing structures often struggle to balance speed and memory efficiency while accommodating dynamic schemas. This work presents an approach that combines:
- Attribute-Specific Nodes: Each attribute maintains its own node in the tree structure, enabling targeted and efficient queries.
- ID-Centric Design: All data operations are indexed via unique string-based identifiers.
- Optimized Multi-Attribute Search: A strategy to minimize set intersections during complex queries.
- Optimized Relational Operator Multi-Attribute Search: A strategy to select records using relational operators (for example, less-than or greater-than comparisons).
- Index Tree:
  - Composed of a `Map` for the base storage (`base`) and a `Map` of attribute nodes (`nodes`).
  - Each node tracks mappings between attribute values and sets of IDs.
- Index Node:
  - Attributes are stored as keys, with their values mapping to sets of IDs.
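The layout described above can be written down as TypeScript types. This is a minimal sketch, not the author's implementation: the names `Id`, `AttrValue`, `DataRecord`, and `IndexTree` are assumptions made for illustration, and each attribute node is modelled as a plain `Map` from value to ID set.

```typescript
// Hypothetical type names; the source only names `base` and `nodes`.
type Id = string;
type AttrValue = string | number | boolean;
type DataRecord = { [attribute: string]: AttrValue };

// Index Node: for one attribute, maps each value to the set of IDs holding it.
type IndexNode = Map<AttrValue, Set<Id>>;

// Index Tree: base storage plus one node per attribute.
interface IndexTree {
  base: Map<Id, DataRecord>;
  nodes: Map<string, IndexNode>;
}

const tree: IndexTree = {
  base: new Map<Id, DataRecord>(),
  nodes: new Map<string, IndexNode>(),
};

// Hand-populate one record to show the shape of the data.
tree.base.set("u1", { role: "admin" });
const roleNode: IndexNode = new Map<AttrValue, Set<Id>>();
roleNode.set("admin", new Set<Id>(["u1"]));
tree.nodes.set("role", roleNode);
```

Keeping the full record in `base` is what later allows deletion by ID alone: the record tells the index which value sets to visit.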
- Objective: Insert or update a record in the index.
- Steps:
  - Store the data against the provided ID in `base`.
  - For each attribute-value pair in the data:
    - Retrieve or create the corresponding attribute node.
    - Add the ID to the set associated with the value.
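The indexing steps above can be sketched as follows. The type names are illustrative assumptions, and the handling of updates (removing the ID from the value sets of the previous version before re-indexing) is also an assumption: the steps say "insert or update" but do not spell out how stale mappings are treated.

```typescript
// Hypothetical type names, not taken from the source.
type Id = string;
type AttrValue = string | number | boolean;
type DataRecord = { [attribute: string]: AttrValue };

const base = new Map<Id, DataRecord>();
const nodes = new Map<string, Map<AttrValue, Set<Id>>>();

function indexRecord(id: Id, record: DataRecord): void {
  // Assumption: on update, remove the ID from the previous version's value sets.
  // Emptied sets are left in place here to keep the sketch short.
  const previous = base.get(id);
  if (previous !== undefined) {
    for (const attribute of Object.keys(previous)) {
      const stale = nodes.get(attribute)?.get(previous[attribute]);
      if (stale !== undefined) stale.delete(id);
    }
  }

  base.set(id, record);
  for (const attribute of Object.keys(record)) {
    // Retrieve or create the attribute node.
    let node = nodes.get(attribute);
    if (node === undefined) {
      node = new Map<AttrValue, Set<Id>>();
      nodes.set(attribute, node);
    }
    // Retrieve or create the ID set for this value, then add the ID.
    const value = record[attribute];
    let ids = node.get(value);
    if (ids === undefined) {
      ids = new Set<Id>();
      node.set(value, ids);
    }
    ids.add(id);
  }
}

indexRecord("u1", { role: "admin", city: "Lagos" });
indexRecord("u2", { role: "admin", city: "Accra" });
indexRecord("u1", { role: "editor", city: "Lagos" }); // update of u1
```

After these three calls, `role=admin` maps only to `u2`, `role=editor` maps to `u1`, and `city=Lagos` still maps to `u1` exactly once.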
- Objective: Retrieve IDs matching a specific attribute and value.
- Steps:
  - Locate the node for the attribute.
  - Retrieve the set of IDs for the specified value.
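A single-attribute lookup is two `Map` reads. The sketch below assumes a hand-built `nodes` map and illustrative type names; returning a shared read-only empty set for misses is a design choice of this sketch, not something the source specifies.

```typescript
// Hypothetical type names, not taken from the source.
type Id = string;
type AttrValue = string | number | boolean;

// Hand-built index state for illustration.
const nodes = new Map<string, Map<AttrValue, Set<Id>>>([
  ["role", new Map<AttrValue, Set<Id>>([
    ["admin", new Set<Id>(["u1", "u2"])],
    ["editor", new Set<Id>(["u3"])],
  ])],
]);

// Shared empty result so misses do not allocate.
const EMPTY: ReadonlySet<Id> = new Set<Id>();

function search(attribute: string, value: AttrValue): ReadonlySet<Id> {
  const node = nodes.get(attribute); // locate the node for the attribute
  if (node === undefined) return EMPTY;
  return node.get(value) ?? EMPTY;   // retrieve the ID set for the value
}

const admins = search("role", "admin");  // contains u1 and u2
const guests = search("role", "guest");  // empty: value never indexed
const nobody = search("age", 30);        // empty: attribute never indexed
```

The `ReadonlySet` return type signals that callers receive the index's own set and should copy it before mutating.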
- Objective: Retrieve IDs matching multiple attribute-value pairs.
- Steps:
  - Sort query attributes by the size of their value sets.
  - Iteratively intersect the sets, starting with the smallest.
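One way to realise the smallest-first intersection is sketched below. The `Query` shape and type names are assumptions, as is the choice to return an empty set for an empty query. Seeding the result from a copy of the smallest set means later passes never examine more IDs than that set contains.

```typescript
// Hypothetical type names, not taken from the source.
type Id = string;
type AttrValue = string | number | boolean;
type Query = { [attribute: string]: AttrValue };

// Hand-built index state for illustration.
const nodes = new Map<string, Map<AttrValue, Set<Id>>>([
  ["role", new Map<AttrValue, Set<Id>>([["admin", new Set<Id>(["u1", "u2", "u3"])]])],
  ["city", new Map<AttrValue, Set<Id>>([["Lagos", new Set<Id>(["u2", "u3", "u4"])]])],
  ["plan", new Map<AttrValue, Set<Id>>([["pro", new Set<Id>(["u3"])]])],
]);

function multiAttributeSearch(query: Query): Set<Id> {
  const sets: Set<Id>[] = [];
  for (const attribute of Object.keys(query)) {
    const ids = nodes.get(attribute)?.get(query[attribute]);
    // A single miss makes the whole intersection empty.
    if (ids === undefined || ids.size === 0) return new Set<Id>();
    sets.push(ids);
  }
  // Assumption: an empty query matches nothing.
  if (sets.length === 0) return new Set<Id>();

  // Smallest set first, so every later pass works on as few IDs as possible.
  sets.sort((a, b) => a.size - b.size);

  // Copy the smallest set so the index's own sets are never mutated.
  const result = new Set<Id>(sets[0]);
  for (const other of sets.slice(1)) {
    result.forEach((id) => {
      if (!other.has(id)) result.delete(id);
    });
    if (result.size === 0) break; // nothing left to intersect
  }
  return result;
}

const proAdminsInLagos = multiAttributeSearch({ role: "admin", city: "Lagos", plan: "pro" });
const adminsInLagos = multiAttributeSearch({ role: "admin", city: "Lagos" });
const adminsInParis = multiAttributeSearch({ role: "admin", city: "Paris" });
```

Here `proAdminsInLagos` contains only `u3`, `adminsInLagos` contains `u2` and `u3`, and `adminsInParis` is empty because `city=Paris` was never indexed.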
- Objective: Remove all mappings associated with a specific ID.
- Steps:
  - Retrieve the data using the ID from `base`.
  - For each attribute-value pair in the data:
    - Remove the ID from the corresponding value set.
    - Clean up empty sets or nodes as needed.
  - Remove the ID from `base`.
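The deletion steps above can be sketched as follows, again with illustrative type names and a hand-built index. Returning a boolean to report whether the ID existed is an assumption of this sketch.

```typescript
// Hypothetical type names, not taken from the source.
type Id = string;
type AttrValue = string | number | boolean;
type DataRecord = { [attribute: string]: AttrValue };

// Hand-built index state for illustration.
const base = new Map<Id, DataRecord>([
  ["u1", { role: "admin", city: "Lagos" }],
  ["u2", { role: "admin" }],
]);
const nodes = new Map<string, Map<AttrValue, Set<Id>>>([
  ["role", new Map<AttrValue, Set<Id>>([["admin", new Set<Id>(["u1", "u2"])]])],
  ["city", new Map<AttrValue, Set<Id>>([["Lagos", new Set<Id>(["u1"])]])],
]);

function deleteRecord(id: Id): boolean {
  // The stored record tells us exactly which value sets contain the ID.
  const record = base.get(id);
  if (record === undefined) return false;

  for (const attribute of Object.keys(record)) {
    const value = record[attribute];
    const node = nodes.get(attribute);
    const ids = node?.get(value);
    if (node === undefined || ids === undefined) continue;

    ids.delete(id);
    if (ids.size === 0) node.delete(value);       // clean up the empty set
    if (node.size === 0) nodes.delete(attribute); // clean up the empty node
  }
  base.delete(id);
  return true;
}

const removed = deleteRecord("u1");      // true
const missing = deleteRecord("unknown"); // false: nothing to remove
```

After `deleteRecord("u1")`, the `city` node disappears entirely because `u1` was its only entry, while `role=admin` shrinks to just `u2`.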
- Attribute-Specific Nodes:
  - Isolating attributes reduces unnecessary operations during insertion and querying.
- Set-Based Intersection for Multi-Attribute Search:
  - Sorting attributes by set size minimizes computational overhead.
- String-Based ID Optimization:
  - Native JavaScript `Map` and `Set` are leveraged for efficient string key handling.
- Indexing: O(n), where n is the number of attributes in the data.
- Search: O(1) for single-attribute queries (average case).
- Multi-Attribute Search: O(m * k), where m is the number of attributes in the query and k is the average size of their value sets.
- Relational Operator Search: O(n⋅m⋅s + t⋅(n−1)), where n is the number of query attributes, m is the number of keys in node.valueMap, s is the average size of the sets in node.valueMap, and t is the size of the smallest set.
- Deletion: O(n), proportional to the number of attributes in the data.
- Scales linearly with the number of indexed records and attributes.
- Efficient use of shared sets for duplicate values across records.
function indexRecord(id, record):
    if id in base:
        deleteRecord(id)    // update: drop stale mappings of the previous version
    base[id] = record
    for (attribute, value) in record:
        if nodes[attribute] does not exist:
            nodes[attribute] = new Map()
        if value not in nodes[attribute]:
            nodes[attribute][value] = new Set()
        nodes[attribute][value].add(id)

function search(attribute, value):
    if attribute not in nodes:
        return empty set
    return nodes[attribute].get(value, empty set)

function multiAttributeSearch(query):
    sortedQuery = sort query attributes by size of their value sets in nodes
    result = full set of IDs (initial state)
    for (attribute, value) in sortedQuery:
        result = result intersect search(attribute, value)
        if result is empty:
            break
    return result

function deleteRecord(id):
    if id not in base:
        return
    record = base[id]
    for (attribute, value) in record:
        if attribute in nodes and value in nodes[attribute]:
            nodes[attribute][value].delete(id)
            if nodes[attribute][value] is empty:
                remove nodes[attribute][value]
            if nodes[attribute] is empty:
                remove nodes[attribute]
    delete base[id]
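The pseudocode above does not cover the relational operator search listed among the design goals. The sketch below is one reading of it, reconstructed only from the stated cost O(n⋅m⋅s + t⋅(n−1)): for each condition, scan every key of the attribute node and union the matching ID sets, then intersect the per-condition results starting with the smallest. The operator names, the `Condition` shape, and the restriction to numeric values are all assumptions of this sketch, and the inner `Map` plays the role of `node.valueMap`.

```typescript
// Hypothetical names and shapes, not taken from the source.
type Id = string;
type AttrValue = string | number | boolean;
type Operator = "lt" | "lte" | "gt" | "gte";
type Condition = { attribute: string; op: Operator; bound: number };

// Hand-built index state for illustration.
const nodes = new Map<string, Map<AttrValue, Set<Id>>>([
  ["age", new Map<AttrValue, Set<Id>>([
    [25, new Set<Id>(["u1"])],
    [31, new Set<Id>(["u2", "u3"])],
    [40, new Set<Id>(["u4"])],
  ])],
  ["score", new Map<AttrValue, Set<Id>>([
    [50, new Set<Id>(["u1", "u2"])],
    [90, new Set<Id>(["u3", "u4"])],
  ])],
]);

// Assumption: only numeric values take part in relational comparisons.
function satisfies(candidate: AttrValue, op: Operator, bound: number): boolean {
  if (typeof candidate !== "number") return false;
  if (op === "lt") return candidate < bound;
  if (op === "lte") return candidate <= bound;
  if (op === "gt") return candidate > bound;
  return candidate >= bound; // remaining case: "gte"
}

function relationalSearch(conditions: Condition[]): Set<Id> {
  const perCondition: Set<Id>[] = [];
  for (const { attribute, op, bound } of conditions) {
    const matched = new Set<Id>();
    const node = nodes.get(attribute);
    if (node !== undefined) {
      // Scan all keys of the node; union the ID sets whose key satisfies the operator.
      node.forEach((ids, value) => {
        if (satisfies(value, op, bound)) ids.forEach((id) => matched.add(id));
      });
    }
    if (matched.size === 0) return matched; // one empty condition empties the result
    perCondition.push(matched);
  }
  if (perCondition.length === 0) return new Set<Id>();

  // Intersect starting with the smallest set; these sets are fresh copies, so mutation is safe.
  perCondition.sort((a, b) => a.size - b.size);
  const result = perCondition[0];
  for (const other of perCondition.slice(1)) {
    result.forEach((id) => {
      if (!other.has(id)) result.delete(id);
    });
    if (result.size === 0) break;
  }
  return result;
}

const seniorHighScorers = relationalSearch([
  { attribute: "age", op: "gte", bound: 30 },
  { attribute: "score", op: "gt", bound: 60 },
]);
const minors = relationalSearch([{ attribute: "age", op: "lt", bound: 18 }]);
```

With the sample data, `seniorHighScorers` contains `u3` and `u4`, and `minors` is empty. The full key scan per condition is what makes this search more expensive than the exact-match lookups; a sorted key structure could reduce it, but that would be a different design from the one described here.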
(TBD: Benchmark results comparing this algorithm against existing approaches, showcasing improvements in latency and memory efficiency.)
The proposed indexing algorithm offers a robust and efficient solution for dynamic, attribute-based data queries. Its innovative design addresses common challenges in indexing large, schema-less datasets, making it a valuable tool for modern data-intensive applications. Further research could explore its integration with distributed systems and advanced caching mechanisms.
- Extending the algorithm for distributed systems with consistency guarantees.
- Investigating adaptive strategies for skewed attribute distributions.
- Enhancing batch indexing throughput using parallelism.
Authors:
- Friday (Fullstack Engineer)
😄 Contributions, suggestions, and improvements are welcome on Telegram (Uiedbook).