Ivan surister
@surister
surister / uuid_overhaul.md
Last active March 17, 2025 18:23
Crate UUID overhaul.

Right now CrateDB has ..

What other databases are doing:

  • postgres: in PostgreSQL 18, uuidv4() (RFC), uuidv7() (RFC), gen_random_uuid() (UUIDv4, RFC)[^1]
  • tinybird and clickhouse: generateUUIDv4() (RFC)[^2]
  • cockroachdb: gen_random_uuid, uuid_v4 (RFC) and unique_rowid (timebased sortable)[^3]
  • singlestore: UUID4 (RFC)[^4]
  • firebase: UUID4, custom token (timebased sortable)[^5]
  • elasticsearch: elasticflake (like us), k-ordered (timebased sortable)
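To make the difference between the two families in the list concrete, here is a minimal Python sketch. `uuid.uuid4()` is the standard library's RFC version 4 generator; `time_sortable_id` is a hypothetical helper written for illustration only (a millisecond timestamp followed by random bytes) and is not the scheme used by any of the databases above.

```python
import os
import time
import uuid
from typing import Optional


def time_sortable_id(timestamp_ms: Optional[int] = None) -> str:
    """Hypothetical helper: 48-bit millisecond timestamp + 80 random bits, hex-encoded."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    # Big-endian timestamp first, so lexicographic order follows creation time.
    prefix = timestamp_ms.to_bytes(6, "big")
    return (prefix + os.urandom(10)).hex()


# RFC UUID version 4: fully random, so values carry no ordering information.
random_id = uuid.uuid4()

# Time-based IDs: a later timestamp always sorts after an earlier one.
earlier = time_sortable_id(1_700_000_000_000)
later = time_sortable_id(1_700_000_000_001)
```

Random IDs spread writes evenly but cannot be sorted by creation time, while time-prefixed IDs trade some of that randomness for ordering.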
surister / docker-compose.yml
Created February 20, 2025 21:08
CrateDB docker compose cluster - one command
services:
  cratedb01:
    image: crate/crate:5.10.1
    ports:
      - "4200:4200"
      - "5432:5432"
    volumes:
      - cratedb1:/data
    command: ["crate",
              "-Chttp.cors.enabled=true",
surister / abstract.md
Created January 29, 2025 13:18
europython_Abstract.md

How to build a hybrid search service in Python with CrateDB.

In this talk, Ivan, a Database Ecosystem Engineer at @CrateDB, will show you how to build a hybrid search (keyword search and vector search) service from scratch in Python with CrateDB.

He will cover:

  • what hybrid search is and why we want it
  • a quick look at CrateDB
  • how to create a Python client library to do hybrid search
  • how to use FastAPI to build a web service
  • how to use everything we created to build a documentation search service for our technical documentation at CrateDB.
Query ran at 2025-01-27T21:35:03.043Z on CrateDB 6.0.0
SELECT
  table_name,
  SUM(num_docs) as records,
  (SUM(size) / (1024 * 1024)) as total_size_mib,
  (SUM(size) / count(*)) / (1024 * 1024) as avg_size_per_shard_in_mib,
  (SUM(size) / SUM(num_docs) :: DOUBLE) as avg_size_in_bytes_per_record
FROM
  sys.shards
surister / Math.vue
Last active December 7, 2024 10:59
Katex on Vue3 - Math render in Vue3 - Vue - no plugins.
<!--
To display math equations in Vue 3, you need something like MathJax or KaTeX. I prefer
KaTeX since it seems to be the most powerful solution.
Vue 3 KaTeX libraries are mostly unmaintained or don't work properly on my setup; as of 2024-12-07 they bug out
on my latest Vue 3 + Nuxt projects. This is the simplest way I got it to work.
You only need to run
surister / abs.md
Last active December 3, 2024 11:09
CrateDB - Storage usage on disk

CrateDB - Storage usage on disk

CrateDB stores data in both a row store and a column store and, on top of that, automatically creates an index. On reads the index is leveraged and, depending on the query, CrateDB uses the most efficient store.

This is one of the many features that make CrateDB very fast when reading and aggregating data, but it has an impact on storage.

We are going to use the Yellow Taxi trip dataset for January 2024, which has 2_964_624 rows.

surister / abstract.md
Last active November 21, 2024 11:16
FOSDEM 2025

Hybrid Index: The secret to blazingly fast queries on any data structure @ CrateDB

One of the most effective ways to improve query performance is through indexing. At CrateDB, we asked: what's faster than one index? Everything indexed! We took the bold approach of indexing every column by default. But we didn't stop there: we leverage multiple data structures for every indexed column. At query time, CrateDB selects the optimal index based on the query type, enabling faster and more efficient results.

But you probably have many questions. Does this actually work? How did you do it? Isn't there a performance penalty on write speed? And updates? How about storage size?

In this talk we will tell you all about Hybrid Indexes, one of the fundamental aspects of CrateDB: an open-source distributed SQL database for real-time analytics and hybrid search.


surister / intohybridcte.md
Last active August 28, 2024 10:24
Dissecting hybrid search SQL queries (CTEs)

In https://cratedb.com/blog/hybrid-search-explained we learned about Hybrid Search and how to do it in pure SQL. The resulting query can be hard to understand if you don't have much experience with Common Table Expressions (CTEs). In this piece we will dive deeper into CTEs and the smaller details of the query.

Recap

In the last chapter, we learned that Hybrid Search essentially runs several queries that each capture a different meaning and then combines their results. Keep this in mind, as we will see that CTEs work in a very similar way.
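As a quick illustration of "run several queries and combine them", here is a minimal Python sketch that merges two ranked result lists with Reciprocal Rank Fusion (RRF). RRF is only one common way to combine rankings and is an assumption here, as are the document IDs, the function name, and the constant `k=60`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Combine ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword query and a vector query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists (`doc_b`, `doc_a`) end up ahead of documents that appear in only one.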

## Common Table Expressions

CTEs are named subqueries that can be referenced from a main query. They are temporary, meaning that they do not exist outside of that main query.
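A minimal sketch of that behaviour, using Python's built-in `sqlite3` module rather than CrateDB so that it runs anywhere. The `trips` table and the `long_trips` CTE are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, distance REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, 0.5), (2, 3.0), (3, 12.5)],
)

# The CTE `long_trips` is defined and referenced within a single statement.
rows = conn.execute(
    """
    WITH long_trips AS (
        SELECT id, distance FROM trips WHERE distance > 1.0
    )
    SELECT id FROM long_trips ORDER BY id
    """
).fetchall()

# Outside of that statement the CTE no longer exists.
try:
    conn.execute("SELECT * FROM long_trips")
    cte_visible_outside = True
except sqlite3.OperationalError:
    cte_visible_outside = False
```

The main query can use `long_trips` like a table, but a second statement that refers to it fails because the name was only defined for the first one.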

There are already a number of hybrid search talks from past Haystack conferences:
EU 2023: [Mastering Hybrid Search: Blending Classic Ranking Functions with Vector Search for Superior Search Relevance](https://haystackconf.com/eu2023/talk-10/)
EU 2023: [Reciprocal Rank Fusion (RRF) or How to Stop Worrying about Boosting](https://haystackconf.com/eu2023/talk-2/)
US 2024: [All Vector Search is Hybrid Search](https://haystackconf.com/us2024/talk-1/)
US 2024: Better Semantic Search with Hybrid (Sparse-Dense) Search
# Doing hybrid search on your real-time data in pure SQL with CrateDB's index-all strategy.
Points to highlight:
surister / hybrid_index.md
Last active October 6, 2024 10:41
# Hybrid index: The magic behind the extremely fast queries in CrateDB

The magic behind the extremely fast queries in CrateDB: Hybrid index.

It's no secret that CrateDB has very fast query times: milliseconds on very large datasets. Many factors help accomplish this feat. Some of them, like compute distribution and join optimization, also exist in other databases, but one trait is unique to CrateDB among SQL databases: we call it the Hybrid Index.

The distributed nature of CrateDB.

CrateDB was conceived from the very beginning to be distributed. A cluster is usually deployed with several nodes; tables are split into shards, and the shards and their replicas live on the nodes.
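The idea can be sketched in a few lines of Python. This is a toy model and not CrateDB's actual routing or allocation logic: the CRC32 hash, the round-robin placement, and the node names are all assumptions chosen to keep the example small.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Toy routing: hash the routing value and take it modulo the shard count."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical three-node cluster holding a table with six shards.
nodes = ["node-1", "node-2", "node-3"]
num_shards = 6

# Toy allocation: spread the shards over the nodes round-robin.
shard_to_node = {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def node_for(routing_value: str) -> str:
    """Return the node that holds the shard a given row is routed to."""
    return shard_to_node[shard_for(routing_value, num_shards)]
```

Because the same routing value always hashes to the same shard, a row can be located without asking every node, while different rows spread across the cluster.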