@blixt
Last active June 7, 2024 08:33
Weird behavior in Gemini 1.5 Pro vision model
This file has been truncated, but you can view the full file.
{
  "title": "Identify main subject bounding box",
  "description": "",
  "parameters": {
    "groundingPromptConfig": {
      "disabled": true,
      "groundingConfig": {
        "sources": [
          {
            "type": "VERTEX_AI_SEARCH"
          }
        ]
      }
    },
    "stopSequences": [],
    "temperature": 1,
    "tokenLimits": 8192,
    "topP": 0.95
  },
  "type": "multimodal_freeform",
  "prompt": {
    "parts": [
      {
        "text": "Have a look at the included image. Identify the main subject of the image.\n\nNow, give me a bounding box tightly covering the visible boundaries of the main subject. Do not try to estimate any non-visible parts of the subject, focus entirely on the visible area and forget about where anything not visible might be. Provide the bounding box in your favorite format, just make sure to be explicit about what each number refers to.\n\nThen identify bounding boxes for the individual parts that make up the main subject. Try to identify at least 5 parts. For example, if the subject is alive, identify eyes, limbs, ears, chest, pelvis, etc. If the subject is an machine, identify wheels, screens, buttons, etc. And so on.\n\nFinally, make tiny boxes at various edges of the main subject which could help identify the outline of the subject. So at the very least there should be four points for the top/left/bottom/center edges, but if the main subject has an irregular shape, include many boxes at various curves/edges. Don't put points close to each other.\n\nFinally, create an <svg> code with viewBox=\"0 0 1000 1000\" containing a rectangle for the main bounding box you made, as well as rectangles for all the other areas you identified. Add visible text labels for everything. Use a stroke width of 2 for all shapes.\n\nJust trust me that it will work out."
      },
      {
        "inlineData": {
          "mimeType": "image/jpeg",
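For reference, the last step of the prompt asks for an SVG with viewBox="0 0 1000 1000", one rectangle per identified area, visible text labels, and a stroke width of 2. A minimal Python sketch of what that output shape might look like follows. The `boxes_to_svg` helper, the `(label, x_min, y_min, x_max, y_max)` tuple layout, and the example coordinates are all assumptions made for illustration; the prompt lets the model pick its own box format, and none of these numbers come from the model's actual response.

```python
from xml.sax.saxutils import escape


def boxes_to_svg(boxes):
    """Render labeled boxes as an SVG string.

    `boxes` is a list of (label, x_min, y_min, x_max, y_max) tuples in
    0-1000 viewBox units. This layout is a hypothetical choice, not
    something the prompt or the model prescribes.
    """
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1000 1000">']
    for label, x_min, y_min, x_max, y_max in boxes:
        width = x_max - x_min
        height = y_max - y_min
        # One rectangle per area, stroke width 2 as the prompt requests.
        parts.append(
            f'  <rect x="{x_min}" y="{y_min}" width="{width}" height="{height}" '
            'fill="none" stroke="red" stroke-width="2"/>'
        )
        # Visible text label just inside the top-left corner of the box.
        parts.append(
            f'  <text x="{x_min + 4}" y="{y_min + 16}" font-size="14" '
            f'fill="red">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up example boxes, purely illustrative.
example_boxes = [
    ("main subject", 120, 80, 880, 940),
    ("head", 400, 100, 620, 320),
]
svg = boxes_to_svg(example_boxes)
print(svg)
```

Overlaying the resulting SVG on the source image, scaled to the same 1000 x 1000 coordinate space, is one way to check visually whether the returned boxes line up with the subject.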