I am the co-founder and CTO of ELSA, an AI-powered application to help language students improve their English communication skills.

At ELSA I built the engineering and research teams and currently I focus my time on developing our AI technology.

Prior to ELSA, I created an edtech startup called Sinkronigo that published speech-enabled ebooks for language learning.

Earlier on I was one of the founding researchers in the multimedia research group at Telefonica R&D, in Barcelona, where I pursued research in speech and multimedia processing.

I hold a Ph.D. in speech processing and I am the coauthor of over 125 research publications. I am also a co-inventor on multiple patents and an active contributor to the open source community (if you have worked in the area of multi-microphone speech processing, you have probably heard of the BeamformIt software).

I am an electrical engineer, turned speech and multimedia researcher, turned entrepreneur.

I was born in Tarragona, an ancient capital of the Roman Empire on the Mediterranean coast of Spain.

I am the only child of a family of farmers who moved to Tarragona right before I was born and established a family business selling and repairing home appliances.

I was thus raised among TVs under test, and mastered soldering at an early age.

I currently live in Lisbon, Portugal, with my wife and 2 kids. I enjoy good Portuguese coffee, "pasteis de nata", and how welcoming Portuguese people always are with me.

My Headshot

[name_initial]+[last_name] @ gmail+[dot]+com


You can find my CV in pdf version here. It includes a complete list of the Ph.D. and M.Sc. student theses I co-directed, as well as more information on my duties in each of my positions across the years.
I have been very fortunate to have worked in academia, in academic and corporate research, and in a startup environment.

You can also visit my LinkedIn profile for a summarized version of my professional path and to keep up with what I am up to. I do not publish a lot, but I like to post there from time to time.

I am always eager to learn and experience new things. If you have a new project idea or need advice on one, get in touch!
my email: [name_initial]+[last_name] @ gmail+[dot]+com


Loosely ordered by topics:

Language learning

Speech Segmentation and Clustering

Audio Fingerprinting

Dynamic Time Warping and Applications

Zero-Resource Speech Processing

Query-by-Example Voice Search

Sports analytics

Voice Biometrics

Content-based Video-Copy Detection (CV-VCD)

Multimedia & Mobile Computing

Speaker Diarization - Multiple channels

Speaker Diarization and clustering - Core algorithms

Speech Recognition


Ph.D. Thesis

In 2006 I defended my Ph.D. Thesis titled "Robust Speaker Diarization for Meetings".
The research for the thesis was split between UPC Barcelona (under the supervision of Prof. Javier Hernando) and ICSI Berkeley (under the supervision of Dr. Chuck Wooters).
I arrived at ICSI right when the Speaker Diarization and Speech Recognition communities started to shift focus from analyzing single-channel Broadcast News recordings to multi-channel meeting recordings.
My first important contribution was to propose a signal preprocessing step ahead of any speech analysis: an acoustic beamforming algorithm that combines all available channels, with weights, into a single enhanced speech recording. From this work I later released the open source tool BeamformIt, which is still considered a baseline in this and related areas.
In addition, I worked on many algorithmic improvements to the Agglomerative Speaker Diarization system we used at ICSI, resulting in our system being the top performer in the NIST Speaker Diarization Campaigns of 2004 and 2005, when I led the ICSI diarization submissions.

You can browse the document online (see link above; there might be some pdf->html conversion errors) or download the pdf file.