Portfolio

NLP Pipelines for Corporate and Policy Data

During my internship at Goods Unite Us, I developed NLP-driven data pipelines to process large-scale corporate and policy datasets, including SEC filings and political contribution records. The project focused on extracting and normalizing unstructured data and integrating it into structured insights that support transparency, help identify reliable patterns, and feed into the company's mobile app.

Building a Reproducible Transcript Cleaning Pipeline in R [Ongoing Project]

Designed and refactored an R-based pipeline to clean, normalize, and restructure WebVTT transcripts, supporting the digitization and preprocessing of a large-scale sociolinguistic corpus of bilingual speech from the U.S.–Mexico border. The work focuses on transcript normalization and timestamp reconstruction.
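
To give a rough idea of what timestamp handling and text normalization look like for WebVTT, here is a minimal sketch. It is written in Python purely for illustration; the project itself is in R, and this is not its code. The function names (`to_seconds`, `parse_cues`) and the output layout are hypothetical, and the sketch assumes well-formed cue timing lines of the form `start --> end`.

```python
import re

# WebVTT timestamps look like "hh:mm:ss.ttt" or "mm:ss.ttt" (hours are optional).
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(stamp):
    """Convert one WebVTT timestamp string to seconds (hypothetical helper)."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


def parse_cues(vtt_text):
    """Return a list of cues with numeric start/end times and cleaned text."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [line for line in block.split("\n") if line.strip()]
        for i, line in enumerate(lines):
            if "-->" not in line:
                continue  # skips the WEBVTT header, cue identifiers, notes
            start_raw, rest = line.split("-->", 1)
            end_raw = rest.split()[0]  # drop cue settings such as "align:start"
            # Join multi-line payloads and collapse repeated whitespace.
            payload = " ".join(" ".join(lines[i + 1:]).split())
            cues.append({
                "start": to_seconds(start_raw),
                "end": to_seconds(end_raw),
                "text": payload,
            })
            break
    return cues
```

Calling `parse_cues` on the contents of a `.vtt` file yields one record per cue, which could then be loaded into a table for further cleaning. Storing times as plain seconds is one possible design choice; it makes gaps and overlaps between cues easy to check.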