
Portfolio

Linguist (PhD, Boston University) with expertise in sociolinguistic research, dataset curation, and applied data science.

My portfolio highlights experience creating and curating linguistic datasets, designing annotation schemes, and supporting machine learning workflows. I specialize in bridging research and application—translating complex language data into insights and tools that advance both academic projects and industry technologies.

Language/Linguistic Data Creation and Analysis
 

Spanish in Boston Project (Boston University, PhD Research)

  • Managed collection, curation, and quality assurance (QA) of sociolinguistic datasets.

  • Designed annotation guidelines for novel variables to standardize workflows and improve data quality.

  • Built datasets for academic research (e.g., variation in Spanish liquids).

  • Supervised and trained student assistants in annotation and QA workflows as Lab Manager for the Spanish in Boston Project.

  • Led the full lifecycle of my dissertation project—from data design and collection to statistical modeling and visualization—demonstrating end-to-end research and data management skills.

Mirror Principle Violations Project

  • Surveyed descriptive materials across various languages.

Cogito Corporation (2022–2023, Data Annotator — Machine Learning Annotation)

  • Processed speech and language data for machine learning model development.

  • Created unique annotated datasets for internal and external clients.

  • Conducted prompt engineering for AI models to improve task accuracy.

  • Annotated audio for emotional engagement, rate of speech, energy level, and customer/agent experience.

  • Tested pre-trained ML language models and provided calibration suggestions.

  • Handled dynamic annotation requests across teams and contributed to workflow improvements.

Research Methods

Qualitative

  • Conducted linguistic fieldwork with Puerto Rican Spanish speakers in Puerto Rico and Louisiana. 

  • Designed interview protocols for both exploratory research and hypothesis testing.

Quantitative

Applied UX & Industry Insights

  • Provided sociolinguistic insights to conversational AI teams, informing user experience (UX) and model design.

  • Delivered research-based recommendations (e.g., conversational pause-fillers, speech patterns) that influenced annotation strategies and model training.

Technical Experience

Statistical & Data Tools: R • regex • Python (developing proficiency) • Excel/Sheets

Linguistic & Writing Tools: Praat (app & scripting) • LaTeX • ELAN

Workflow & Version Control: Bash/terminal • Git/GitHub

Scripting: Wrote scripts for dataset processing, QA, and workflow optimization.

Regex testing: Built and refined patterns for clarity/accuracy in text processing.

ML workflows: Annotated, tested, and QA’d speech & language datasets for model development.

Let's Get Social!

  • Email
  • LinkedIn
  • GitHub
  • Google Scholar