@zoltanctoth
Created March 12, 2025 07:04
mlflow-llm-as-a-judge-example
import mlflow.metrics.genai

# professionalism_example_score_2 and professionalism_example_score_4 are assumed to be
# mlflow.metrics.genai.EvaluationExample instances defined earlier; they are not part of this snippet.
professionalism = mlflow.metrics.genai.make_genai_metric(
    name="professionalism",
    definition=(
        "Professionalism refers to the use of a formal, respectful, and appropriate style of communication that is "
        "tailored to the context and audience. It often involves avoiding overly casual language, slang, or "
        "colloquialisms, and instead using clear, concise, and respectful language."
    ),
    # Each rubric entry ends with "\n" so that adjacent string literals do not run together.
    grading_prompt=(
        "Professionalism: If the answer is written using a professional tone, below are the details for different scores:\n"
        "- Score 0: Language is extremely casual, informal, and may include slang or colloquialisms. Not suitable for "
        "professional contexts.\n"
        "- Score 1: Language is casual but generally respectful and avoids strong informality or slang. Acceptable in "
        "some informal professional settings.\n"
        "- Score 2: Language is overall formal but still has casual words/phrases. Borderline for professional contexts.\n"
        "- Score 3: Language is balanced and avoids extreme informality or formality. Suitable for most professional contexts.\n"
        "- Score 4: Language is noticeably formal, respectful, and avoids casual elements. Appropriate for formal "
        "business or academic settings."
    ),
    examples=[professionalism_example_score_2, professionalism_example_score_4],
    model="openai:/gpt-4o-mini",  # URI of the judge model
    parameters={"temperature": 0.0},  # temperature 0 for more repeatable scoring
    aggregations=["mean", "variance"],  # summary statistics reported across all rows
    greater_is_better=True,
)
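
Python joins adjacent string literals with no separator, so a rubric written as a run of quoted fragments only reads correctly if every fragment ends in a space or newline. As a minimal sketch, the same prompt could be generated from a score-to-description mapping instead. The helper `build_grading_prompt` and the constant `PROFESSIONALISM_RUBRIC` are hypothetical names introduced here for illustration; they are not part of MLflow or of the original gist.

```python
from typing import Mapping


def build_grading_prompt(metric_name: str, intro: str, rubric: Mapping[int, str]) -> str:
    """Join a score -> description rubric into one prompt string, one score per line."""
    lines = [f"{metric_name}: {intro}"]
    for score in sorted(rubric):
        lines.append(f"- Score {score}: {rubric[score]}")
    return "\n".join(lines)


# Hypothetical constant holding the rubric text from the metric definition above.
PROFESSIONALISM_RUBRIC = {
    0: "Language is extremely casual, informal, and may include slang or colloquialisms. "
       "Not suitable for professional contexts.",
    1: "Language is casual but generally respectful and avoids strong informality or slang. "
       "Acceptable in some informal professional settings.",
    2: "Language is overall formal but still has casual words/phrases. "
       "Borderline for professional contexts.",
    3: "Language is balanced and avoids extreme informality or formality. "
       "Suitable for most professional contexts.",
    4: "Language is noticeably formal, respectful, and avoids casual elements. "
       "Appropriate for formal business or academic settings.",
}

grading_prompt = build_grading_prompt(
    "Professionalism",
    "If the answer is written using a professional tone, below are the details for different scores:",
    PROFESSIONALISM_RUBRIC,
)
print(grading_prompt)
```

The resulting string could then be passed as `grading_prompt=grading_prompt` in the `make_genai_metric` call. Keeping the rubric in a mapping also makes it easier to check that every score level has a description.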