• Hynek Boril, Ph.D.


      Research Associate

      Center for Robust Speech Systems (CRSS)

      Erik Jonsson School of Engineering and Computer Science

      The University of Texas at Dallas

Tools


Pitch Tracker DTFE (Direct Time Domain Frequency Estimator)

DTFE (also denoted DFE) is a novel algorithm for fundamental frequency (F0) estimation and voiced/unvoiced (V/UV) classification performed directly in the time domain. The algorithm is designed to provide real-time pitch detection with time and frequency resolution comparable or superior to that of autocorrelation-based schemes, at a significantly lower computational cost. The DTFE algorithm comprises spectral shaping, adaptive thresholding, and F0 candidate selection based on consistency criteria. It is intended primarily for clean speech signals (close-talk channels).

  • dtfe.exe - MS Windows binary; requires the Microsoft .NET Framework (available through MS Windows Update)

References

    Boril, H. and Pollák, P. (2004). “Direct time domain fundamental frequency estimation of speech in noisy conditions,” in Proc. EUSIPCO 2004, vol. 1, pp. 1003-1006 (Vienna, Austria).

    Boril, H. (2008). “Robust speech recognition: Analysis and equalization of Lombard effect in Czech corpora,” Ph.D. dissertation, Czech Technical University in Prague, Czech Republic (Section 4.1, pp. 30-41).

    Boril, H. and Pollák, P. (2006). “Pitch-marking based on the DFE algorithm,” lecture, 6th ECESS and TC-STAR WP3 Meeting (Berlin, Germany).

Executing dtfe.exe

  • DTFE is executed with two command line parameters: 'dtfe.exe <input_wave_file> <output_F0_text_file>'.
  • The input must be a single-channel (mono) sound file in the Windows PCM '.wav' format. The following sampling frequencies are supported: 8, 16, 22.05, 32, 44.1, 48, 96, and 192 kHz.
  • The output text file contains two columns: F0 estimates in the first column and the corresponding time labels, in seconds, in the second column. Note that each frequency estimate occupies two consecutive lines, which give the onset and offset times of that F0 sample, respectively. The exceptions are the first and last rows of the file, which mark the onset of the first voiced island and the offset of the last voiced island in the waveform - see the example below.
  • DTFE can be conveniently executed from Matlab - see the example code below.
  • The wav file used in the example can be downloaded here.

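    inputFile = 'example.wav';
    outputFile = 'example.txt';
    % Run DTFE from the Windows command line (dtfe.exe must be on the path).
    dos(['dtfe.exe ' inputFile ' ' outputFile]);
    % Column 1: F0 estimates; column 2: time labels in seconds.
    [frequency, time] = textread(outputFile, '%f %f');
    plot(time, frequency);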

    DTFE Example
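
The onset/offset layout of the output file described above can be unpacked into one row per F0 estimate. The following is only a minimal sketch under stated assumptions: the first and last rows are treated as boundary markers and dropped, every remaining pair of rows is assumed to carry the same F0 value with its onset and offset time, and the numeric values are invented for illustration rather than taken from real DTFE output. The variable names (f0, onset, offset, duration, meanF0) are hypothetical and not part of DTFE.

```matlab
% Synthetic stand-in for the two columns read from a DTFE output file.
% These numbers are made up for illustration only.
frequency = [0; 120; 120; 125; 125; 0];
time      = [0.100; 0.100; 0.108; 0.108; 0.116; 0.116];

% Drop the first and last rows (assumed to be the boundary markers).
f = frequency(2:end-1);
t = time(2:end-1);

% The remaining rows are assumed to come in onset/offset pairs.
assert(mod(numel(f), 2) == 0, 'Expected an even number of paired rows.');
f0     = f(1:2:end);   % one F0 value per estimate
onset  = t(1:2:end);   % onset time of each estimate (s)
offset = t(2:2:end);   % offset time of each estimate (s)

% Length of each estimate and a duration-weighted mean F0.
duration = offset - onset;
meanF0   = sum(f0 .* duration) / sum(duration);
```

To apply this to a real file, replace the synthetic frequency and time vectors with the values returned by textread in the example above. If the boundary rows turn out to be formatted differently than assumed here, the indexing in the first step needs to be adjusted.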

 

Last Updated 2-6-2012