Portfolio
Linguist (PhD, Boston University) with expertise in sociolinguistic research, dataset curation, and applied data science.
My portfolio highlights experience creating and curating linguistic datasets, designing annotation schemes, and supporting machine learning workflows. I specialize in bridging research and application—translating complex language data into insights and tools that advance both academic projects and industry technologies.
Language/Linguistic Data Creation and Analysis
Spanish in Boston Project (Boston University, PhD Research)
- Managed collection, curation, and quality assurance (QA) of sociolinguistic datasets.
- Designed annotation guidelines for novel variables to standardize workflows and improve data quality.
- Built datasets for academic research (e.g., variation in Spanish liquids).
- Supervised and trained student assistants in annotation and QA workflows as Lab Manager for the Spanish in Boston Project.
- Led the full lifecycle of my dissertation project, from data design and collection to statistical modeling and visualization, demonstrating end-to-end research and data management skills.
Mirror Principle Violations Project
- Surveyed descriptive materials across various languages.
Cogito Corporation (2022–2023, Data Annotator — Machine Learning Annotation)
- Processed speech and language data for machine learning model development.
- Created unique annotated datasets for internal and external clients.
- Conducted prompt engineering for AI models to improve task accuracy.
- Annotated audio for emotional engagement, rate of speech, energy level, and customer/agent experience.
- Tested pre-trained ML language models and provided calibration suggestions.
- Handled dynamic annotation requests across teams and contributed to workflow improvements.
Research Methods
Qualitative
- Conducted linguistic fieldwork with Puerto Rican Spanish speakers in Puerto Rico and Louisiana.
- Designed interview protocols for both exploratory research and hypothesis testing.
Quantitative
- Designed and ran an online study on Spanish word order (Qualtrics • Prolific); results published in conference proceedings (2023, DOI).
- Conducted coding, extraction, and statistical analysis of various sociolinguistic datasets.
- Applied probabilistic methods to investigate specific linguistic variables (e.g., liquid use in Spanish). This work formed the quantitative foundation of my dissertation project, which involved managing and analyzing a 24K-token dataset.
Applied UX & Industry Insights
- Provided sociolinguistic insights to conversational AI teams, informing user experience (UX) and model design.
- Delivered research-based recommendations (e.g., conversational pause-fillers, speech patterns) that influenced annotation strategies and model training.
Technical Experience
Statistical & Data Tools: R • regex • Python (developing proficiency) • Excel/Sheets
Linguistic & Writing Tools: Praat (app & scripting) • LaTeX • ELAN
Workflow & Version Control: Bash/terminal • Git/GitHub
Scripting: Dataset processing, QA, and workflow optimization.
Regex testing: Built and refined patterns for clarity/accuracy in text processing.
ML workflows: Annotated, tested, and QA’d speech & language datasets for model development.