@gabrielStanovsky
Last active May 17, 2023 06:27

User Persona 1: The Social Scientist

  • Typical Background: An academic researcher from a social science faculty with a large corpus of texts in their field (e.g., legal studies, sociology, history).

  • Wants: To explore their texts and look for recurring patterns in them. This is an interactive process: once they find a pattern in the data, they start to form a quantitative question about the data that they want to answer. Typically interested in good coverage; they would like to extract all occurrences of the pattern they're interested in, since the absolute number of occurrences has implications for answering their research questions.

  • Technical background: No coding or linguistic background. Knows how to conduct statistical analyses (statistical significance, etc.).

  • UI needs: As simple as possible, inviting open-ended exploration.

  • Example: Dr. Renana Keydar, a legal scholar who studies judicial attitudes towards defendants in trials. In this work, we used Spike to determine whether the judge found the testimony credible or not, and how that affected the trial's outcome. Since this research aimed to observe trends over time, coverage was indeed crucial.

User Persona 2: The Lab Scientist

  • Typical Background: An academic researcher from a natural science faculty with a large corpus of texts describing lab results, experiment protocols, and scientific findings in fields such as biology or chemistry.

  • Wants: To find specific physical values to inform their ongoing research. Unlike Persona 1, they typically have a well-formed question in mind that they want to answer. This is similar to populating a knowledge base whose column headers are known in advance and filling it in from a large, evolving body of scientific literature. Here, precision matters more than coverage: they would like to get the precise units of the values they're extracting, along with all the physical context required for that measurement to hold.

  • Technical background: Varying degrees of coding skill; knows how to work with spreadsheets and databases (Excel, SQL, etc.) and often writes scripts to manipulate data formats. No linguistic background.

  • UI needs: Can manage a somewhat technical UI for configuring their queries, e.g., to adjust the specific units of the extracted values.

  • Example: Dr. Barak Raveh, a bioinformatics researcher who develops simulators for biological processes and would like to extract values from the scientific literature (e.g., the number of red blood cells in certain animals under certain conditions) to plug into his models. Since these values affect the behavior of the simulation, it is important to him to extract precise values along with the context in which each value was obtained. He is willing to miss some values to ensure this precision.

User Persona 3: The Computational Linguist

  • Typical Background: An academic researcher in natural language processing.

  • Wants: To create a corpus for human annotation, for training new models, or for testing existing ones on predefined linguistic phenomena. Would like to be able to map the extractions from Spike back to the original corpus in order to enrich them with relevant context or to analyze the surrounding context in various ways.

  • Technical background: Knows how to code and is familiar with the linguistic concepts employed in Spike, such as dependency trees, part of speech, regular expressions, etc.

  • UI needs: Needs access to the full spectrum of Spike internals in order to understand the parsing process, tokenization, lemmatization, etc.

  • Example: Gabi Stanovsky. In this work, we used Spike to create a corpus for training and testing machine translation on sentences that are specifically hard for gender-biased models, as they present non-stereotypical gender role assignments (e.g., female doctors and male nurses).
