MeMartijn / jina_text_segmenter.py
Created October 29, 2024 15:01
Jina AI's Segmenter ported to Python
import regex
from typing import List

# Define constants
MAX_HEADING_LENGTH = 7
MAX_HEADING_CONTENT_LENGTH = 200
MAX_HEADING_UNDERLINE_LENGTH = 200
MAX_HTML_HEADING_ATTRIBUTES_LENGTH = 100
MAX_LIST_ITEM_LENGTH = 200
MAX_NESTED_LIST_ITEMS = 6