<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Sam Ribeiro</title>
<link href="https://fonts.googleapis.com/css?family=Roboto:100,300,300i,400,500,700,900" rel="stylesheet">
<link rel="stylesheet" type="text/css" href="mystyle.css">
</head>
<body>
<header id="header">
<!-- <img src="images/profile.jpg"> -->
<h1 style="text-align:center">Sam Ribeiro</h1>
</header>
<header id="nav">
<table>
<tr>
<td><a href="index.html">Home</a></td>
<td><a href="about.html">About</a></td>
<td><a href="research.html">Research</a></td>
</tr>
</table>
</header>
<div id="wrapper">
<main>
<div id="content">
<div class="innertube">
<p style="line-height:0.9">
<small><i>
Earlier academic research projects and speech corpora. All research listed here was done prior to my current employment. See <a href="https://scholar.google.com/citations?user=VdV_-40AAAAJ" target="_blank">Google Scholar</a> for my full publication list.
</i></small></p>
<br>
<h2> Speech Corpora </h2>
<p> <b>TaL: The Tongue and Lips Corpus</b> </p>
<p style="line-height: normal">
<small>
The Tongue and Lips (TaL) corpus is a multi-speaker corpus of ultrasound images of the tongue and video images of lips. This corpus contains synchronised imaging data of extraoral (lips) and intraoral (tongue) articulators from 82 native speakers of English. The TaL corpus was collected under the <i>Silent Speech Interfaces for all</i> project (Carnegie Trust for the Universities of Scotland Research Incentive Grant – grant number RIG008585).
[<a href="https://ultrasuite.github.io/papers/tal_corpus_SLT2021.pdf" target="_blank">paper</a> |
<a href="https://ultrasuite.github.io/data/tal_corpus/" target="_blank">documentation</a> |
<a href="https://github.com/UltraSuite/tal-tools" target="_blank">code</a> |
<a href="https://ultrasuite.github.io/data/tal_corpus/#download" target="_blank">data</a>]
</small>
</p>
<p> <b>The UltraSuite Repository</b> </p>
<p style="line-height: normal">
<small>
UltraSuite is a repository of synchronized ultrasound and acoustic data from child speech therapy sessions.
Ultrasound tongue imaging (UTI) uses standard medical ultrasound to visualize the tongue surface during speech production.
It is increasingly being used for speech therapy, making it important to develop automatic methods to assist various
time-consuming manual tasks currently performed by speech therapists.
The UltraSuite repository includes three data sets: one from typically developing children and two from children with speech sound disorders.
[<a href="https://ultrasuite.github.io/papers/ultrasuite_IS18.pdf" target="_blank">paper</a> |
<a href="https://ultrasuite.github.io" target="_blank">documentation</a> |
<a href="https://github.com/UltraSuite" target="_blank">code</a> |
<a href="https://ultrasuite.github.io/download" target="_blank">data</a>]
</small>
</p>
<p><b>Parallel Audiobook Corpus</b> </p>
<p style="line-height: normal">
<small>
The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks.
The corpus consists of approximately 121 hours of data across 4 books and 59 speakers.
The corpus was prepared for speech synthesis, voice conversion, and prosody modelling.
[<a href="https://msamribeiro.github.io/parallel-corpus" target="_blank">documentation</a> |
<a href="https://datashare.is.ed.ac.uk/handle/10283/3217" target="_blank">data</a>]
</small>
</p>
<p><b>SIWIS Multilingual Database</b> </p>
<p style="line-height: normal">
<small>
The SIWIS database is a parallel multilingual speech database with acted emphasis.
It includes recordings of 36 bilingual and trilingual speakers of English, French, German and Italian,
with applications to speech-to-speech translation (S2ST).
The database was designed for various scenarios: training cross-language speaker adaptation (CLSA) systems,
conveying emphasis through S2ST systems, and evaluating TTS systems.
[<a href="http://publications.idiap.ch/downloads/papers/2016/Goldman_IS2016.pdf" target="_blank">paper</a> |
[<a href="https://www.idiap.ch/project/siwis/downloads/siwis-database" target="_blank">data</a>]
</small>
</p>
<br>
<h2> Projects </h2>
<p>
<b>Silent Speech Interfaces for all</b><br>
<i>Recognising speech from ultrasound images of the tongue</i>
</p>
<p style="line-height: normal">
<small>
Silent speech interfaces perform speech recognition and synthesis from articulatory data in order to restore spoken communication for users with voice impairments (for example, after laryngectomy) or to allow silent communication in situations where audible speech is undesirable. Much of the previous work in this area has focused on models learned on data from single speakers (called speaker-dependent models), which do not generalize to unknown speakers. This project proposes to investigate the first speaker-independent silent speech interface for continuous speech recognition from ultrasound images of the tongue. This interface will be benchmarked against a system trained on high-quality data from a single speaker (speaker-dependent model). Additionally, this project will investigate speaker adaptation techniques, which use small amounts of speaker-specific data to bridge the gap between speaker-dependent and independent systems.
<br>
Funded by the Carnegie Trust for the Universities of Scotland Research Incentive Grant – grant number RIG008585 (“Silent speech interfaces for all – recognising speech from ultrasound images of the tongue”).
</small>
</p>
<p> <b>Other projects</b> </p>
<p style="line-height: normal">
<small>
During my PhD, I collaborated with the <b>SIWIS</b> (Spoken Interaction with Interpretation in Switzerland) project (<a href="https://www.idiap.ch/project/siwis/front-page" target="_blank">link</a>). During my post-doc, I was funded by the <b>Ultrax2020</b> project (<a href="https://www.ultrax-speech.org" target="_blank">link</a>).
</small>
</p>
<br>
</div>
</div>
</main>
</div>
</body>
</html>