s3_align.h File Reference

Data structure for alignment

#include <logmath.h>
#include <s3types.h>

Classes

struct  align_stseg_s
 
struct  align_phseg_s
 
struct  align_wdseg_s
 

Typedefs

typedef struct align_stseg_s align_stseg_t
 
typedef struct align_phseg_s align_phseg_t
 
typedef struct align_wdseg_s align_wdseg_t
 

Functions

int32 align_init (mdef_t *_mdef, tmat_t *_tmat, dict_t *_dict, cmd_ln_t *_config, logmath_t *_logmath)
 
void align_free (void)
 
int32 align_build_sent_hmm (char *transcript, int insert_sil)
 
int32 align_destroy_sent_hmm (void)
 
int32 align_start_utt (char *uttid)
 
void align_sen_active (uint8 *senlist, int32 n_sen)
 
int32 align_frame (int32 *senscr)
 
int32 align_end_utt (align_stseg_t **stseg, align_phseg_t **phseg, align_wdseg_t **wdseg)
 

Detailed Description

data structure for alignment

Typedef Documentation

◆ align_phseg_t

typedef struct align_phseg_s align_phseg_t

Phone level segmentation/alignment information

◆ align_stseg_t

typedef struct align_stseg_s align_stseg_t

State level segmentation/alignment; one entry per frame

◆ align_wdseg_t

typedef struct align_wdseg_s align_wdseg_t

Word level segmentation/alignment information

Function Documentation

◆ align_build_sent_hmm()

int32 align_build_sent_hmm ( char *  wordstr,
int  insert_sil 
)

Build a sentence HMM for the given transcription (wordstr). A two-level DAG is built: phone-level and state-level.

  • <s> and </s> are always added at the beginning and end of the sentence to form an augmented transcription.
  • Optional <sil> and noise words are added between words in the augmented transcription.

wordstr must contain only the transcript, with no extraneous material such as the utterance ID. The phone-level HMM structure has replicated nodes to allow for different left- and right-context CI phones; hence, each pnode corresponds to a unique triphone in the sentence HMM.

Returns 0 if successful, <0 on any error (e.g., an OOV word encountered).

Parameters
wordstr       In: Word transcript
insert_sil    In: Whether to insert silences/fillers

References BAD_S3CIPID, BAD_S3PID, BAD_S3WID, pnode_s::ci, pnode_s::id, pnode_s::lc, pnode_s::next, slink_s::node, pnode_s::pid, pnode_s::predlist, pnode_s::rc, pnode_s::startstate, pnode_s::succlist, and pnode_s::wid.
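
The augmented transcription described above can be pictured as a flat word string. The following is a minimal sketch only: `augment_transcript` is a hypothetical helper that is not part of s3_align.h, and the `(<sil>)` notation for an optional filler is an assumption made for display. The actual function builds a two-level DAG in which fillers are optional paths, not a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only; NOT part of s3_align.h.
 * Writes a flat rendering of the augmented transcription into out.
 * Optional fillers are shown as "(<sil>)"; this notation is an assumption.
 * Returns 0 on success, -1 if a buffer is too small. */
int augment_transcript(const char *wordstr, int insert_sil,
                       char *out, size_t outlen)
{
    char buf[1024];                                  /* scratch copy for strtok */
    const char *gap = insert_sil ? " (<sil>)" : "";  /* optional filler slot */
    size_t used;
    char *tok;
    int n;

    assert(wordstr != NULL && out != NULL);
    if (strlen(wordstr) >= sizeof(buf))
        return -1;
    strcpy(buf, wordstr);

    /* <s> always opens the augmented transcription. */
    n = snprintf(out, outlen, "<s>");
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    used = (size_t)n;

    /* Each word is preceded by an optional filler slot. */
    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        n = snprintf(out + used, outlen - used, "%s %s", gap, tok);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }

    /* </s> always closes it, again after an optional filler slot. */
    n = snprintf(out + used, outlen - used, "%s </s>", gap);
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return 0;
}
```

With the transcript "hello world" and a nonzero insert_sil, this sketch writes `<s> (<sil>) hello (<sil>) world (<sil>) </s>`; with insert_sil set to 0 it writes `<s> hello world </s>`.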

◆ align_destroy_sent_hmm()

int32 align_destroy_sent_hmm ( void  )

◆ align_end_utt()

int32 align_end_utt ( align_stseg_t **  stseg_out,
align_phseg_t **  phseg_out,
align_wdseg_t **  wdseg_out 
)

All frames consumed. Trace back best Viterbi state sequence and dump it out.

Parameters
stseg_out    Out: list of state segmentation
phseg_out    Out: list of phone segmentation
wdseg_out    Out: list of word segmentation

References snode_s::active_frm, snode_s::hist, align_stseg_s::next, align_phseg_s::next, align_wdseg_s::next, slink_s::next, slink_s::node, snode_s::predlist, and snode_s::score.
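
The segmentation outputs are linked lists chained through a `next` member, as the references above indicate. The sketch below shows one way a caller might walk such a list. Only the `next` member is confirmed by this page; the `demo_wdseg_t` type and its `sf`/`ef` (start/end frame) fields are assumptions chosen for illustration, so check the real `align_wdseg_s` definition before relying on any field names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a word segment; NOT the real align_wdseg_s.
 * Only the `next` link is documented on this page. */
typedef struct demo_wdseg_s {
    int sf;                      /* assumed: start frame of the segment */
    int ef;                      /* assumed: end frame (inclusive) */
    struct demo_wdseg_s *next;   /* next segment, NULL at end of list */
} demo_wdseg_t;

/* Count the segments in a NULL-terminated list. */
int demo_count_segments(const demo_wdseg_t *head)
{
    const demo_wdseg_t *seg;
    int n = 0;

    for (seg = head; seg != NULL; seg = seg->next)
        n++;
    return n;
}

/* Sum the number of frames covered by all segments. */
int demo_total_frames(const demo_wdseg_t *head)
{
    const demo_wdseg_t *seg;
    int total = 0;

    for (seg = head; seg != NULL; seg = seg->next) {
        assert(seg->ef >= seg->sf);   /* sanity check on the assumed fields */
        total += seg->ef - seg->sf + 1;
    }
    return total;
}
```

Since the results are documented as read-only and valid only until the next utterance begins, a caller that needs them later would presumably copy the values out during a walk like this one.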

◆ align_frame()

int32 align_frame ( int32 *  senscr)

Step the time aligner one frame forward. Wind up the utterance and return the final result (READ-ONLY). Results are only valid until the next utterance is begun.

One frame of Viterbi time alignment.

Parameters
senscr    In: array of senone scores this frame

◆ align_free()

void align_free ( void  )

◆ align_init()

int32 align_init ( mdef_t *  _mdef,
tmat_t *  _tmat,
dict_t *  _dict,
cmd_ln_t *  _config,
logmath_t *  _logmath 
)

◆ align_sen_active()

void align_sen_active ( uint8 *  senlist,
int32  n_sen 
)

Called at the beginning of a frame to flag the active senones (any senone used by active HMMs) in that frame.

Flag the active senones.

Parameters
senlist    Out: senlist[s] TRUE iff active in frame
n_sen      In: Size of senlist[] array

References IS_S3SENID, and snode_s::sen.

◆ align_start_utt()

int32 align_start_utt ( char *  uttid)

Start Viterbi alignment using the sentence HMM previously built. Assumes that each utterance will only be aligned once; state member variables initialized during sentence HMM building.
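
The per-utterance call order implied by the descriptions on this page can be sketched as follows. This is an illustration under stated assumptions, not the library's own driver: the `align_*` functions below are stand-in stubs that only record the order in which they are called, so that the sketch compiles without Sphinx-3; `int32` and `uint8` are stand-in typedefs; `align_one_utt` and `call_trace` are hypothetical names; and treating a negative return as an error is documented here only for align_build_sent_hmm(), so it is an assumption for the other calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int int32;             /* stand-in for the s3types.h typedef */
typedef unsigned char uint8;   /* stand-in for the s3types.h typedef */

typedef struct align_stseg_s align_stseg_t;
typedef struct align_phseg_s align_phseg_t;
typedef struct align_wdseg_s align_wdseg_t;

/* Records one letter per stub call so the call order can be inspected. */
static char call_trace[64];
static void trace(char c)
{
    size_t n = strlen(call_trace);
    if (n + 1 < sizeof(call_trace)) {
        call_trace[n] = c;
        call_trace[n + 1] = '\0';
    }
}

/* STAND-IN STUBS: same signatures as this page, no real alignment. */
int32 align_build_sent_hmm(char *transcript, int insert_sil)
{ (void)transcript; (void)insert_sil; trace('B'); return 0; }
int32 align_destroy_sent_hmm(void) { trace('D'); return 0; }
int32 align_start_utt(char *uttid) { (void)uttid; trace('S'); return 0; }
void align_sen_active(uint8 *senlist, int32 n_sen)
{ memset(senlist, 1, (size_t)n_sen); trace('a'); }
int32 align_frame(int32 *senscr) { (void)senscr; trace('f'); return 0; }
int32 align_end_utt(align_stseg_t **st, align_phseg_t **ph, align_wdseg_t **wd)
{ *st = NULL; *ph = NULL; *wd = NULL; trace('E'); return 0; }

/* Hypothetical driver: align one utterance of n_frames frames. */
int32 align_one_utt(char *uttid, char *transcript, int32 n_frames,
                    int32 n_sen, uint8 *senlist, int32 *senscr)
{
    align_stseg_t *stseg;
    align_phseg_t *phseg;
    align_wdseg_t *wdseg;
    int32 f, rv = -1;

    assert(senlist != NULL && senscr != NULL);

    /* 1. Build the sentence HMM; <0 means error (e.g., OOV word). */
    if (align_build_sent_hmm(transcript, 1) < 0)
        return -1;

    /* 2. Start Viterbi alignment on the HMM just built. */
    if (align_start_utt(uttid) >= 0) {
        rv = 0;
        for (f = 0; f < n_frames && rv == 0; f++) {
            /* 3a. Flag the senones used by active HMMs in this frame. */
            align_sen_active(senlist, n_sen);
            /* 3b. Senone scoring for flagged senones would go here;
             *     it is outside the scope of this header. */
            /* 3c. One frame of Viterbi time alignment. */
            if (align_frame(senscr) < 0)
                rv = -1;
        }
        /* 4. All frames consumed: trace back the best state sequence. */
        if (rv == 0 && align_end_utt(&stseg, &phseg, &wdseg) < 0)
            rv = -1;
    }

    /* 5. Release the sentence HMM before the next utterance. */
    align_destroy_sent_hmm();
    return rv;
}
```

In a real program, align_init() would be called once before the first utterance and align_free() once after the last; both are omitted here because their model arguments come from other headers.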