PyTorch Models and Tools

The module hyperion.torch provides utilities, dataloaders, neural architectures and models based on PyTorch.

Layers

These include several custom neural network layers.

Activation Function Layers

These include a factory class that creates activation layers from config parameters, and custom activation layers.

class hyperion.torch.layers.activation_factory.ActivationFactory[source]

static create(activation, **kwargs)[source]

Creates a non-linear activation object

Parameters

activation – one of: a str with the activation type; a dictionary with a name field indicating the activation type plus extra activation arguments; None, in which case it returns None; or an Activation constructor
**kwargs – extra arguments for activation constructor

Returns

Non-linear activation object
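The four accepted forms of `activation` can be illustrated with a small stand-alone dispatcher. This is a minimal sketch, not hyperion's implementation: the function name `create_activation` and the name-to-class table `_ACTS` are hypothetical, and only a few `torch.nn` activations are registered.

```python
import torch.nn as nn

# Hypothetical name -> class table; the real factory supports more types.
_ACTS = {
    "relu": nn.ReLU,
    "relu6": nn.ReLU6,
    "leaky_relu": nn.LeakyReLU,
    "tanh": nn.Tanh,
    "silu": nn.SiLU,
}


def create_activation(activation, **kwargs):
    """Sketch of a factory accepting None, a str, a dict, or a constructor."""
    if activation is None:
        return None
    if isinstance(activation, str):
        return _ACTS[activation](**kwargs)
    if isinstance(activation, dict):
        cfg = dict(activation)        # copy so the caller's config is untouched
        name = cfg.pop("name")        # the "name" field selects the activation type
        cfg.update(kwargs)            # explicit kwargs override config values
        return _ACTS[name](**cfg)
    if callable(activation):
        return activation(**kwargs)   # e.g. the class nn.ReLU itself
    raise ValueError(f"unsupported activation spec: {activation!r}")
```

For example, `create_activation({"name": "leaky_relu", "negative_slope": 0.2})` returns `nn.LeakyReLU(negative_slope=0.2)`. Copying the dict before popping `name` keeps the same config reusable across layers.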

static create_from_str(activation_name, **kwargs)[source]

Creates a non-linear activation object from string

Parameters

activation_name – str with activation type
**kwargs – extra arguments for activation constructor

Returns

Non-linear activation object

static get_config(activation)[source]

class hyperion.torch.layers.swish.Swish(*args: Any, **kwargs: Any)[source]

forward(x)[source]

__init__(*args: Any, **kwargs: Any) → None
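No formula is given for `Swish`. Assuming the common definition swish(x) = x · sigmoid(x) (the β = 1 case, numerically identical to PyTorch's `nn.SiLU`), a minimal stand-alone module looks like the sketch below. The class name is hypothetical, and hyperion's own class may differ in details.

```python
import torch
import torch.nn as nn


class SwishSketch(nn.Module):
    """Stand-alone Swish activation: x * sigmoid(x), assuming beta = 1."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)
```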

Normalization Layers

These include factory classes that create normalization layers from config parameters.

class hyperion.torch.layers.norm_layer_factory.NormLayer2dFactory[source]

static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]

Creates a callable constructor for normalization layers

Parameters

norm_name – str with normalization layer name, in [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]
num_groups – num_groups for group-norm
momentum – default momentum
eps – default epsilon for numerical stability

Returns

Callable constructor to create normalization layers

class hyperion.torch.layers.norm_layer_factory.NormLayer1dFactory[source]

static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]

Creates a callable constructor for normalization layers

Parameters

norm_name – str with normalization layer name, in [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]
num_groups – num_groups for group-norm
momentum – default momentum
eps – default epsilon for numerical stability

Returns

Callable constructor to create normalization layers
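Both factories return a constructor rather than a layer, so the caller can supply the channel count later. The sketch below shows one way to build such a constructor for the 2d case with `functools.partial`. The function name is hypothetical, and mapping `layer-norm` to `GroupNorm` with a single group (which normalizes each sample over channels and both spatial axes) is an assumption, not necessarily what hyperion does. A 1d variant would swap in `BatchNorm1d` and `InstanceNorm1d`.

```python
from functools import partial

import torch
import torch.nn as nn


def create_norm2d(norm_name, num_groups=None, momentum=0.1, eps=1e-5):
    """Hypothetical sketch: returns a callable f(num_channels) -> nn.Module."""
    if norm_name == "batch-norm":
        return partial(nn.BatchNorm2d, momentum=momentum, eps=eps)
    if norm_name == "group-norm":
        # nn.GroupNorm takes (num_groups, num_channels), so bind num_groups first
        return partial(nn.GroupNorm, num_groups, eps=eps)
    if norm_name == "instance-norm":
        return partial(nn.InstanceNorm2d, momentum=momentum, eps=eps, affine=False)
    if norm_name == "instance-norm-affine":
        return partial(nn.InstanceNorm2d, momentum=momentum, eps=eps, affine=True)
    if norm_name == "layer-norm":
        # assumption: one group = normalize over (C, H, W) for each sample
        return partial(nn.GroupNorm, 1, eps=eps)
    raise ValueError(f"unknown norm_name {norm_name!r}")
```

For example, `create_norm2d("group-norm", num_groups=8)(64)` yields `nn.GroupNorm(8, 64)`.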

Dropout Layers

These include custom dropout and drop-connect layers.

class hyperion.torch.layers.dropout.Dropout1d(*args: Any, **kwargs: Any)[source]

forward(inputs)[source]

__init__(*args: Any, **kwargs: Any) → None

class hyperion.torch.layers.dropout.DropConnect2d(*args: Any, **kwargs: Any)[source]

__init__(p=0.2)[source]

forward(inputs)[source]

class hyperion.torch.layers.dropout.DropConnect1d(*args: Any, **kwargs: Any)[source]

__init__(p=0.2)[source]

forward(inputs)[source]
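What `DropConnect1d` and `DropConnect2d` drop is not stated here. In EfficientNet-style code the name usually means dropping a sample's whole residual branch (a form of stochastic depth) and rescaling the surviving samples by 1/(1 - p). The sketch below assumes that reading for a (batch, C, H, W) input. It is not taken from hyperion's source, and the class name is hypothetical.

```python
import torch
import torch.nn as nn


class DropConnect2dSketch(nn.Module):
    """Per-sample drop of a (batch, C, H, W) tensor, assuming EfficientNet style."""

    def __init__(self, p: float = 0.2):
        super().__init__()
        self.p = p

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0:
            return inputs                      # identity at inference time
        keep_prob = 1.0 - self.p
        # one Bernoulli(keep_prob) draw per sample, broadcast over C, H, W
        mask = torch.rand(inputs.size(0), 1, 1, 1, device=inputs.device) < keep_prob
        return inputs * mask.to(inputs.dtype) / keep_prob
```

A 1d version would draw the mask with shape (batch, 1, 1) for (batch, C, T) inputs.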

Attention Layers

Attention layers like the ones used in Transformers and Conformers.

class hyperion.torch.layers.attention.ScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]

Scaled dot product multihead attention layer

in_feats: input feature dimension

out_feats: output feature dimension

num_heads: number of heads

d_k: key/query projection dimension

d_v: value projection dimension

dropout_rate: dropout rate

time_dim: time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, dropout_rate=0, time_dim=1)[source]

property in_feats

property out_feats

forward(query, key, value, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)
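The computation that `forward` documents can be shown with a minimal stand-alone version: project the inputs, split them into heads, compute softmax(QKᵀ / √d_k)·V, and project the concatenated heads to `out_feats`. This sketch makes three assumptions that are not statements about hyperion's code: `d_k` and `d_v` are per-head sizes, the mask is a boolean tensor where `True` means "attend", and a (batch, time) mask is treated as a key-padding mask. The class name is hypothetical.

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttSketch(nn.Module):
    def __init__(self, in_feats, out_feats, num_heads, d_k, d_v, dropout_rate=0.0):
        super().__init__()
        self.h, self.d_k, self.d_v = num_heads, d_k, d_v
        self.q = nn.Linear(in_feats, num_heads * d_k)
        self.k = nn.Linear(in_feats, num_heads * d_k)
        self.v = nn.Linear(in_feats, num_heads * d_v)
        self.out = nn.Linear(num_heads * d_v, out_feats)
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, query, key, value, mask=None):
        b, t1, _ = query.shape
        t2 = key.size(1)
        # split into heads: (batch, heads, time, d)
        q = self.q(query).view(b, t1, self.h, self.d_k).transpose(1, 2)
        k = self.k(key).view(b, t2, self.h, self.d_k).transpose(1, 2)
        v = self.v(value).view(b, t2, self.h, self.d_v).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)  # (b, h, t1, t2)
        if mask is not None:
            # (batch, time) -> key-padding mask; (batch, t1, t2) -> per-pair mask
            mask = mask[:, None, None, :] if mask.dim() == 2 else mask[:, None]
            scores = scores.masked_fill(~mask, torch.finfo(scores.dtype).min)
        attn = self.dropout(torch.softmax(scores, dim=-1))
        x = (attn @ v).transpose(1, 2).reshape(b, t1, self.h * self.d_v)
        return self.out(x)
```

Filling masked scores with the dtype's minimum rather than -inf avoids NaNs when a whole row is masked.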

class hyperion.torch.layers.attention.LocalScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]

Local scaled dot product multihead attention layer. It calculates self-attention between time steps within a window of ‘context’ frames.

in_feats: input feature dimension

out_feats: output feature dimension

num_heads: number of heads

d_k: key/query projection dimension

d_v: value projection dimension

context: maximum attention temporal context.

dropout_rate: dropout rate

time_dim: time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, dropout_rate=0, time_dim=1)[source]: Constructs a MultiHeadedAttention object.
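The `_softmax`, `forward1` and `forward2` methods documented for this class suggest that the layer computes scores only inside diagonal blocks, plus a shifted copy, for efficiency. The sketch below is a slow dense reference for what "within a window of `context` frames" can mean: it builds a band mask and applies ordinary softmax attention. Three simplifications are assumptions made for illustration: the window rule |i - j| <= context, a single head, and no projections. This is not hyperion's block-diagonal algorithm, and the function names are hypothetical.

```python
import math

import torch


def band_mask(t: int, context: int) -> torch.Tensor:
    """Boolean (t, t) mask, True where |i - j| <= context (assumed window rule)."""
    idx = torch.arange(t)
    return (idx[:, None] - idx[None, :]).abs() <= context


def local_attention(q, k, v, context: int):
    """Dense O(t^2) reference; q, k, v have shape (batch, t, d)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # the diagonal is always inside the band, so no row is fully masked
    scores = scores.masked_fill(~band_mask(q.size(1), context), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

The dense reference materializes the full t × t score matrix. A block-diagonal scheme like the one the methods above describe avoids that cost for long inputs.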

static _softmax(scores1, scores2, shift1, shift2, t1, t2)[source]

Computes softmax for block diagonal attention maps

Parameters

scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)
scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)
shift1 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 1
shift2 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 2, with self-attention shift1=shift2
t1 – length of time dimension 1 (output time dimension)
t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.

Returns

probs1: posterior attention scores for block-diagonal att. matrix: with size=(batch, heads, blocks, t1, t2)
probs2: posterior attention scores for a shifted block-diagonal att. matrix: with size=(batch, heads, blocks-1, t1, t2)

forward1(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

forward2(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

forward(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

property in_feats

property out_feats

class hyperion.torch.layers.attention.ScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]

Scaled dot product multihead attention layer with relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf

in_feats: input feature dimension

out_feats: output feature dimension

num_heads: number of heads

d_k: key/query projection dimension

d_v: value projection dimension

causal_pos_enc: positional encoder is 0 for attending future frames.

dropout_rate: dropout rate

time_dim: time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]

_apply_tril(x)[source]

Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep causal attention points, i.e., i-j >= 0

E.g., if t1=3, t2=4, this applies the mask

[1 1 0 0;
 1 1 1 0;
 1 1 1 1]

_apply_triu(x)[source]

Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep non-causal attention points, i.e., i-j < 0

E.g., if t1=3, t2=4, this applies the mask

[0 0 1 1;
 0 0 0 1;
 0 0 0 0]
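Both example masks correspond to aligning the last query with the last key, so query i sits at key position i + (t2 - t1). Under that reading, which matches the t1=3, t2=4 examples but is inferred rather than taken from the source, both masks can be built with `torch.tril`/`torch.triu` and a diagonal offset. The function names are hypothetical.

```python
import torch


def causal_mask(t1: int, t2: int) -> torch.Tensor:
    """1 where i - j >= 0, with query i aligned to key i + (t2 - t1)."""
    return torch.ones(t1, t2).tril(diagonal=t2 - t1)


def noncausal_mask(t1: int, t2: int) -> torch.Tensor:
    """Complement of causal_mask: 1 where the key lies in the query's future."""
    return torch.ones(t1, t2).triu(diagonal=t2 - t1 + 1)
```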

_left_shift(x)[source]

Applies left shifts to the rows of x to get scores with relative positional encodings R_{i-j}, i-j >= 0 (causal attention).

E.g.,

[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]

becomes:

[q0 R1, q0 R0, 0, 0;
 q1 R2, q1 R1, q1 R0, 0;
 q2 R3, q2 R2, q2 R1, q2 R0]

_right_shift(x)[source]

Applies right shifts to the rows of x to get scores with relative positional encodings R_{i-j}, i-j < 0 (non-causal attention).

E.g.,

[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]

becomes:

[0, q0 R_{-1}, q0 R_{-2};
 0, 0, q1 R_{-1};
 0, 0, 0]
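Both shifts can be reproduced with an explicit index computation. The gather-based functions below are a readable sketch that uses the same alignment assumption as the masks (query i aligned to key i + (t2 - t1)). Hyperion's own methods may be implemented differently; the local variant, for instance, takes extra `context` and `left_shift` arguments. Following the examples, the input of `left_shift_sketch` holds q_i R_{t2-1-k} in column k, and the input of `right_shift_sketch` holds q_i R_{-k}. All function names are hypothetical.

```python
import torch


def _gather_rows(x, src, valid):
    """y[..., i, j] = x[..., i, src[i, j]] where valid[i, j], else 0."""
    t1, t2 = x.shape[-2:]
    idx = src.clamp(0, t2 - 1).expand(*x.shape[:-2], t1, t2)
    return torch.gather(x, -1, idx).masked_fill(~valid, 0)


def left_shift_sketch(x):
    """Causal part: output[i, j] = q_i R_{(i + t2 - t1) - j}, 0 where that index < 0."""
    t1, t2 = x.shape[-2:]
    i = torch.arange(t1)[:, None]
    j = torch.arange(t2)[None, :]
    src = j + (t1 - 1 - i)            # column that holds R_{(i + t2 - t1) - j}
    return _gather_rows(x, src, src < t2)


def right_shift_sketch(x):
    """Non-causal part: keep only entries whose relative index (i + t2 - t1) - j < 0."""
    t1, t2 = x.shape[-2:]
    i = torch.arange(t1)[:, None]
    j = torch.arange(t2)[None, :]
    src = j - i - (t2 - t1)           # column that holds R_{-src}
    return _gather_rows(x, src, (src >= 1) & (src < t2))
```

In the tests, entry (i, k) is encoded as 10·(i + 1) plus the magnitude of the relative index, so the outputs can be compared directly against the examples above.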

forward(query, key, value, pos_emb=None, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
pos_emb – positional embedding with size=(batch, time2, in_feats), ordered as R_{L-1}, …, R_0
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

property in_feats

property out_feats

class hyperion.torch.layers.attention.LocalScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]

Local Scaled dot product multihead attention layer

It calculates self-attention between time steps within a window of ‘context’ frames.

It uses relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf

in_feats: input feature dimension

out_feats: output feature dimension

num_heads: number of heads

d_k: key/query projection dimension

d_v: value projection dimension

context: maximum attention temporal context.

causal_pos_enc: positional encoder is 0 for attending future frames.

dropout_rate: dropout rate

time_dim: time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]: Constructs a MultiHeadedAttention object.

_apply_tril(x)[source]

Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep causal attention points, i.e., i-j >= 0

E.g., if t1=3, t2=4, this applies the mask

[1 1 0 0;
 1 1 1 0;
 1 1 1 1]

_apply_triu(x)[source]

Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep non-causal attention points, i.e., i-j < 0

E.g., if t1=3, t2=4, this applies the mask

[0 0 1 1;
 0 0 0 1;
 0 0 0 0]

_left_shift(x, context, left_shift)[source]

Applies left shifts to the rows of x to get scores with relative positional encodings R_{i-j}, i-j >= 0 (causal attention).

E.g.,

[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]

becomes:

[q0 R1, q0 R0, 0, 0;
 q1 R2, q1 R1, q1 R0, 0;
 q2 R3, q2 R2, q2 R1, q2 R0]

_right_shift(x, context, left_shift)[source]

Applies right shifts to the rows of x to get scores with relative positional encodings R_{i-j}, i-j < 0 (non-causal attention).

E.g.,

[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]

becomes:

[0, q0 R_{-1}, q0 R_{-2};
 0, 0, q1 R_{-1};
 0, 0, 0]

forward(query, key, value, pos_emb=None, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
pos_emb – positional embedding with size=(batch, time2, in_feats), ordered as R_{L-1}, …, R_0
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

static _softmax(scores1, scores2, shift1, shift2, t1, t2)

Computes softmax for block diagonal attention maps

Parameters

scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)
scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)
shift1 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 1
shift2 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 2, with self-attention shift1=shift2
t1 – length of time dimension 1 (output time dimension)
t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.

Returns

probs1: posterior attention scores for block-diagonal att. matrix: with size=(batch, heads, blocks, t1, t2)
probs2: posterior attention scores for a shifted block-diagonal att. matrix: with size=(batch, heads, blocks-1, t1, t2)

forward1(query, key, value, mask)

Computes ‘Local Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

forward2(query, key, value, mask)

Computes ‘Local Scaled Dot Product Attention’.

Parameters

query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or with size=(batch, time) if time1=time2

Returns

Attention-weighted average of the values with size=(batch, time1, out_feats)

property in_feats

property out_feats

Pooling Layers

These include custom pooling layers and a factory class to create pooling layers from config parameters.

class hyperion.torch.layers.pool_factory.GlobalPool1dFactory[source]

static create(pool_type, in_feats=None, inner_feats=128, num_comp=64, dist_pow=2, use_bias=False, num_heads=8, d_k=256, d_v=256, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False, **kwargs)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip=[])[source]

static get_config(layer)[source]

static add_argparse_args(parser, prefix=None, skip=[])

hyperion.torch.layers.global_pool._conv1(in_channels, out_channels, bias=False)[source]: Point-wise convolution.

class hyperion.torch.layers.global_pool.GlobalAvgPool1d(*args: Any, **kwargs: Any)[source]

Global average pooling in 1d

dim: pooling dimension

keepdim: it True keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]

forward(x, weights=None)[source]

forward_slidwin(x, win_length, win_shift, snip_edges=False)[source]

get_config()
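`forward_slidwin` is not described on this page. A plausible reading is that it pools over windows of `win_length` frames taken every `win_shift` frames along the pooling dimension. The sketch below implements that reading with `Tensor.unfold` and keeps only complete windows, which is similar in spirit to `snip_edges=True`. The edge handling, and the function name, are assumptions rather than hyperion's behavior.

```python
import torch


def slidwin_avg_pool(x: torch.Tensor, win_length: int, win_shift: int) -> torch.Tensor:
    """Mean over sliding windows along the last dim: (..., T) -> (..., num_windows)."""
    # unfold(dim, size, step) appends a trailing dim holding each window's frames
    windows = x.unfold(x.dim() - 1, win_length, win_shift)
    return windows.mean(dim=-1)
```

With a 10-frame input, `win_length=4` and `win_shift=2`, this yields (10 - 4) // 2 + 1 = 4 pooled frames.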

class hyperion.torch.layers.global_pool.GlobalMeanStdPool1d(*args: Any, **kwargs: Any)[source]

Global mean + standard deviation pooling in 1d

dim: pooling dimension

keepdim: it True keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]

forward(x, weights=None)[source]

forward_slidwin(x, win_length, win_shift, snip_edges=False)[source]

get_config()
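Mean + standard deviation pooling concatenates the first two moments of each feature over time. The sketch below makes two assumptions: the input has shape (batch, feats, time) and is pooled along `dim=-1`, and the optional `weights` are non-negative per-frame weights of shape (batch, time) that are normalized to sum to one. The weight semantics, the small variance floor, and the function name are assumptions.

```python
import torch


def mean_std_pool(x: torch.Tensor, weights: torch.Tensor = None, eps: float = 1e-5):
    """(batch, feats, time) -> (batch, 2 * feats): [mean, std] over time."""
    if weights is None:
        weights = torch.ones(x.size(0), x.size(-1), dtype=x.dtype, device=x.device)
    w = weights / weights.sum(dim=-1, keepdim=True)       # normalize per utterance
    w = w.unsqueeze(1)                                    # (batch, 1, time)
    mu = (w * x).sum(dim=-1)                              # weighted mean
    var = (w * (x - mu.unsqueeze(-1)) ** 2).sum(dim=-1)   # weighted (biased) variance
    std = var.clamp(min=eps).sqrt()                       # floor avoids sqrt(0) gradients
    return torch.cat([mu, std], dim=-1)
```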

class hyperion.torch.layers.global_pool.GlobalMeanLogVarPool1d(*args: Any, **kwargs: Any)[source]

Global mean + log-variance pooling in 1d

dim: pooling dimension

keepdim: it True keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]

forward(x, weights=None)[source]

forward_slidwin(x, win_length, win_shift)

get_config()

class hyperion.torch.layers.global_pool.LDEPool1d(*args: Any, **kwargs: Any)[source]

Learnable dictionary encoder pooling in 1d

in_feats: input feature dimension

num_comp: number of cluster components

dist_pow: power for distance metric

use_bias: use bias parameter when computing posterior responsibility

dim: pooling dimension

keepdim: it True keeps the same number of dimensions after pooling

__init__(in_feats, num_comp=64, dist_pow=2, use_bias=False, dim=-1, keepdim=False)[source]

property num_comp

property in_feats

forward(x, weights=None)[source]

get_config()[source]

forward_slidwin(x, win_length, win_shift)
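Learnable dictionary encoding (LDE) soft-assigns each frame to `num_comp` learnable centers and pools the residuals per center. The sketch follows the general LDE formulation: responsibilities come from a softmax over negative scaled distances, optionally with a per-component bias. Several details are assumptions, not a copy of `LDEPool1d`: the parameter names, the initialization, the use of `dist_pow` as an exponent on the Euclidean distance, and the class name.

```python
import torch
import torch.nn as nn


class LDEPoolSketch(nn.Module):
    def __init__(self, in_feats, num_comp=64, dist_pow=2, use_bias=False):
        super().__init__()
        self.mu = nn.Parameter(0.5 * torch.randn(num_comp, in_feats))  # dictionary centers
        self.prec = nn.Parameter(torch.ones(num_comp))                 # per-center distance scales
        self.bias = nn.Parameter(torch.zeros(num_comp)) if use_bias else None
        self.dist_pow = dist_pow

    def forward(self, x):
        # x: (batch, in_feats, time) -> residuals r: (batch, time, num_comp, in_feats)
        r = x.transpose(1, 2).unsqueeze(2) - self.mu
        dist = r.norm(dim=-1) ** self.dist_pow                # (batch, time, num_comp)
        logits = -self.prec * dist
        if self.bias is not None:
            logits = logits + self.bias
        gamma = torch.softmax(logits, dim=-1).unsqueeze(-1)   # posterior responsibilities
        # responsibility-weighted mean residual per component: (batch, num_comp, in_feats)
        e = (gamma * r).sum(dim=1) / gamma.sum(dim=1).clamp(min=1e-6)
        return e.flatten(1)                                   # (batch, num_comp * in_feats)
```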

class hyperion.torch.layers.global_pool.ScaledDotProdAttV1Pool1d(*args: Any, **kwargs: Any)[source]

__init__(in_feats, num_heads, d_k, d_v, bin_attn=False, dim=-1, keepdim=False)[source]

property in_feats

forward(x, weights=None)[source]

get_config()[source]

forward_slidwin(x, win_length, win_shift)

class hyperion.torch.layers.global_pool.GlobalChWiseAttMeanStdPool1d(*args: Any, **kwargs: Any)[source]

Attentive mean + stddev pooling for each channel

__init__(in_feats, inner_feats=128, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False)[source]

forward(x, weights=None)[source]

forward_slidwin(x, win_length, win_shift)

get_config()[source]

Acoustic Feature Extraction Layers

These define several feature extraction layers that take waveforms as input and produce spectrograms, filter banks, MFCCs, etc. They also include a factory class to create feature extraction layers from config parameters.

class hyperion.torch.layers.audio_feats_factory.AudioFeatsFactory[source]

static create(audio_feat, sample_frequency=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemphasis_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]

static filter_args(**kwargs)[source]

Filters MFCC args from arguments dictionary.

Parameters: kwargs – Arguments dictionary.
Returns: Dictionary with MFCC options.

static add_class_args(parser, prefix=None)[source]

Adds MFCC options to parser.

Parameters

parser – Arguments parser
prefix – Options prefix.

static add_argparse_args(parser, prefix=None)

Adds MFCC options to parser.

Parameters

parser – Arguments parser
prefix – Options prefix.

hyperion.torch.layers.audio_feats._get_feature_window_function(window_type, window_size, blackman_coeff=0.42)[source]: Returns a window function with the given type and size
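The `window_type='povey'` default and the `blackman_coeff=0.42` parameter match Kaldi's feature-window options, so Kaldi's definitions are a reasonable guide to what these windows compute. The sketch below implements five of them with a = 2π / (N - 1). It is written from Kaldi's formulas, not from hyperion's source, and the function name and set of supported names are assumptions.

```python
import math

import torch


def feature_window(window_type: str, window_size: int, blackman_coeff: float = 0.42):
    """Kaldi-style analysis window of length window_size (>= 2)."""
    n = torch.arange(window_size, dtype=torch.float64)
    a = 2 * math.pi / (window_size - 1)
    if window_type == "hanning":
        w = 0.5 - 0.5 * torch.cos(a * n)
    elif window_type == "hamming":
        w = 0.54 - 0.46 * torch.cos(a * n)
    elif window_type == "povey":
        # Hann window raised to 0.85: similar to Hamming, but goes to zero at the edges
        w = (0.5 - 0.5 * torch.cos(a * n)).pow(0.85)
    elif window_type == "rectangular":
        w = torch.ones(window_size, dtype=torch.float64)
    elif window_type == "blackman":
        w = (blackman_coeff - 0.5 * torch.cos(a * n)
             + (0.5 - blackman_coeff) * torch.cos(2 * a * n))
    else:
        raise ValueError(f"unknown window_type {window_type!r}")
    return w.float()
```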

hyperion.torch.layers.audio_feats._get_strided_batch(waveform, window_length, window_shift, snip_edges, center=False)[source]

Given a waveform (1D tensor of size num_samples), it returns a 2D tensor (m, window_size) representing how the window is shifted along the waveform. Each row is a frame.

Parameters

waveform (torch.Tensor) – Tensor of size num_samples
window_length (int) – Frame length
window_shift (int) – Frame shift
snip_edges (bool) – If True, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
center (bool) – If True, it puts the center of the frame at t*window_shift, starting at t=0. It overrides snip_edges and sets it to False.

Returns

3D tensor of size (m, window_size) where each row is a frame

Return type

torch.Tensor
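The framing rules described for `snip_edges` can be sketched for a single 1D waveform, with frame length and shift given in samples. With `snip_edges=True`, only frames that fit completely are kept, giving 1 + (N - L) // S frames. With `snip_edges=False`, the sketch uses Kaldi's convention: (N + S // 2) // S frames, frame t centered at t·S + S // 2, and samples outside the signal reflected back inside. This is a simplified illustration: the batch dimension and the `center` option are not modeled, reflection assumes the waveform is longer than one frame, and `frame_signal` is a hypothetical name.

```python
import torch


def frame_signal(waveform: torch.Tensor, window_length: int, window_shift: int,
                 snip_edges: bool = True) -> torch.Tensor:
    """1D waveform (num_samples,) -> frames (num_frames, window_length)."""
    n = waveform.numel()
    if snip_edges:
        if n < window_length:
            return waveform.new_empty(0, window_length)
        return waveform.unfold(0, window_length, window_shift)  # complete frames only
    num_frames = (n + window_shift // 2) // window_shift
    start = window_shift // 2 - window_length // 2           # first sample of frame 0
    idx = (start + window_shift * torch.arange(num_frames)[:, None]
           + torch.arange(window_length)[None, :])
    # Kaldi-style reflection at the ends: -1 -> 0, -2 -> 1, ...; n -> n - 1, ...
    idx = torch.where(idx < 0, -idx - 1, idx)
    idx = torch.where(idx >= n, 2 * n - 1 - idx, idx)
    return waveform[idx]
```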

hyperion.torch.layers.audio_feats._get_log_energy(x, energy_floor)[source]: Returns the log energy of size (m) for a strided_input (m,*)

class hyperion.torch.layers.audio_feats.Wav2Win(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, pad_length=None, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, return_log_energy=False)[source]

forward(x)[source]

class hyperion.torch.layers.audio_feats.Wav2FFT(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]

property fs

property frame_length

property frame_shift

property remove_dc_offset

property preemph_coeff

property window_type

property dither

forward(x)[source]

class hyperion.torch.layers.audio_feats.Wav2Spec(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]

forward(x)[source]

property dither

property frame_length

property frame_shift

property fs

property preemph_coeff

property remove_dc_offset

property window_type

class hyperion.torch.layers.audio_feats.Wav2LogSpec(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]

forward(x)[source]

property dither

property frame_length

property frame_shift

property fs

property preemph_coeff

property remove_dc_offset

property window_type

class hyperion.torch.layers.audio_feats.Wav2LogFilterBank(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]

forward(x)[source]

property dither

property frame_length

property frame_shift

property fs

property preemph_coeff

property remove_dc_offset

property window_type

class hyperion.torch.layers.audio_feats.Wav2MFCC(*args: Any, **kwargs: Any)[source]

__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]

static make_lifter(N, Q)[source]

Makes the liftering function

Parameters

N – Number of cepstral coefficients.
Q – Liftering parameter

Returns

Liftering vector.

static make_dct_matrix(num_ceps, num_filters)[source]
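Cepstral liftering rescales coefficient n by 1 + (Q/2)·sin(πn/Q). The DCT step that turns log filter-bank energies into cepstra is commonly an orthonormal DCT-II. The sketch below uses Kaldi's lifter formula and an orthonormal DCT-II matrix of shape (num_ceps, num_filters). Whether hyperion's `make_lifter` and `make_dct_matrix` follow exactly these conventions, including the matrix orientation, is an assumption.

```python
import math

import torch


def make_lifter(N: int, Q: float) -> torch.Tensor:
    """Kaldi-style lifter: 1 + 0.5 * Q * sin(pi * n / Q), for n = 0..N-1."""
    n = torch.arange(N, dtype=torch.float64)
    return (1.0 + 0.5 * Q * torch.sin(math.pi * n / Q)).float()


def make_dct_matrix(num_ceps: int, num_filters: int) -> torch.Tensor:
    """Orthonormal DCT-II rows, shape (num_ceps, num_filters)."""
    k = torch.arange(num_ceps, dtype=torch.float64)[:, None]
    n = torch.arange(num_filters, dtype=torch.float64)[None, :]
    dct = math.sqrt(2.0 / num_filters) * torch.cos(math.pi / num_filters * (n + 0.5) * k)
    dct[0] = math.sqrt(1.0 / num_filters)   # the DC row uses the 1/N normalizer
    return dct.float()
```

Under these conventions, MFCCs for log filter-bank frames `fb` of shape (frames, 23) would be `(fb @ make_dct_matrix(13, 23).T) * make_lifter(13, 22)`.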

forward(x)[source]

property dither

property frame_length

property frame_shift

property fs

property preemph_coeff

property remove_dc_offset

property window_type

class hyperion.torch.layers.audio_feats.Wav2KanBayashiLogFilterBank(*args: Any, **kwargs: Any)[source]

Class to replicate log-filter-banks used in Kan Bayashi’s ParallelWaveGAN repository: https://github.com/kan-bayashi/ParallelWaveGAN

__init__(fs=16000, frame_length=64, frame_shift=16, fft_length=1024, remove_dc_offset=True, window_type='hanning', low_freq=80, high_freq=7600, num_filters=80, snip_edges=False, center=True)[source]

forward(x)[source]

property dither

property frame_length

property frame_shift

property fs

property preemph_coeff

property remove_dc_offset

property window_type

class hyperion.torch.layers.audio_feats.Spec2LogFilterBank(fs=16000, fft_length=512, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False)[source]

__init__(fs=16000, fft_length=512, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False)[source]

forward(x)[source]

Feature Normalization Layers

class hyperion.torch.layers.mvn.MeanVarianceNorm(*args: Any, **kwargs: Any)[source]

__init__(norm_mean=True, norm_var=False, left_context=0, right_context=0, dim=1)[source]

forward(x)[source]

normalize_global(x)[source]

normalize_cumsum(x)[source]
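`normalize_cumsum` is not documented here. Its name and the `left_context`/`right_context` arguments suggest short-time mean normalization over a sliding window, computed with cumulative sums so the cost does not grow with the window size. The sketch below assumes input of shape (batch, time, feats) with time on `dim=1`, normalizes only the mean, and shrinks the window at the edges. It illustrates the cumulative-sum technique under those assumptions and is not hyperion's code.

```python
import torch
import torch.nn.functional as F


def sliding_cmn(x: torch.Tensor, left_context: int, right_context: int) -> torch.Tensor:
    """Subtract from frame t the mean of frames [t - left, t + right], clipped to the input."""
    t = x.size(1)
    # prefix sums with a leading zero: c[:, k] = sum of frames 0..k-1
    c = F.pad(x.cumsum(dim=1), (0, 0, 1, 0))
    idx = torch.arange(t, device=x.device)
    lo = (idx - left_context).clamp(min=0)
    hi = (idx + right_context + 1).clamp(max=t)
    window_sum = c[:, hi] - c[:, lo]                   # (batch, time, feats)
    count = (hi - lo).to(x.dtype).view(1, t, 1)
    return x - window_sum / count
```

With contexts larger than the input, this reduces to global mean normalization.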

static filter_args(**kwargs)[source]

Filters ST-CMVN args from arguments dictionary.

Parameters: kwargs – Arguments dictionary.
Returns: Dictionary with ST-CMVN options.

static add_class_args(parser, prefix=None)[source]

Adds ST-CMVN options to parser.

Parameters

parser – Arguments parser
prefix – Options prefix.

static add_argparse_args(parser, prefix=None)

Adds ST-CMVN options to parser.

Parameters

parser – Arguments parser
prefix – Options prefix.

Feature Augmentation Layers

class hyperion.torch.layers.spec_augment.AxisMasker(*args: Any, **kwargs: Any)[source]

Applies a mask to the spectrogram along time or freq dimension. Implementation based on espnet.

mask_width_range: range for the width of the masks

mask_num_range: range for the number of masks

dim: axis where we apply the mask

fill_value: masking value

__init__(min_width=0, max_width=30, min_num_masks=1, max_num_masks=2, dim=-1, fill_value=0)[source]

forward(x)[source]

Apply mask along time or freq dimension

Parameters: x – spectrogram (batch, *, time, freq)
Returns: Masked spectrogram (batch, *, time, freq)
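The masking step can be sketched as follows for a negative `dim` (for example -1 for frequency, -2 for time). For each example, the sketch draws a number of masks and, for each mask, a width and a start position, then overwrites that band with `fill_value`. Drawing widths uniformly from [min_width, max_width] and counts uniformly from [min_num_masks, max_num_masks] is an assumption about the sampling, modeled on the ESPnet-style masking the docstring mentions. The function name is hypothetical.

```python
import torch


def axis_mask(x, min_width=0, max_width=30, min_num_masks=1, max_num_masks=2,
              dim=-1, fill_value=0.0):
    """Copy of x (batch, *, time, freq) with random bands along dim set to fill_value."""
    assert dim < 0, "dim is counted from the end so it stays valid for x[b]"
    x = x.clone()
    length = x.size(dim)
    for b in range(x.size(0)):
        num_masks = int(torch.randint(min_num_masks, max_num_masks + 1, ()))
        for _ in range(num_masks):
            width = int(torch.randint(min_width, min(max_width, length) + 1, ()))
            start = int(torch.randint(0, length - width + 1, ()))
            # narrow() returns a view, so fill_ writes into x[b] in place
            x[b].narrow(dim, start, width).fill_(fill_value)
    return x
```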

class hyperion.torch.layers.spec_augment.SpecWarper(*args: Any, **kwargs: Any)[source]

Warps the spectrogram along time or freq dimension. Implementation based on espnet.

window: time warp parameter

__init__(window=80, mode='bicubic', dim=-2)[source]

forward(x, lengths=None)[source]

Warps x along the time or freq dimension

Parameters

x – spectrogram (batch, *, time, freq)
lengths – length ratios

Returns

warped spectrogram (batch, *, time, freq)

class hyperion.torch.layers.spec_augment.SpecAugment(*args: Any, **kwargs: Any)[source]

Implementation of SpecAugment.

Reference: Daniel S. Park et al., “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”

__init__(time_warp_prob=0, time_warp_window=5, time_warp_mode='bicubic', time_mask_prob=0, time_mask_min_width=0, time_mask_max_width=100, time_mask_min_num_masks=1, time_mask_max_num_masks=2, freq_mask_prob=0, freq_mask_min_width=0, freq_mask_max_width=20, freq_mask_min_num_masks=1, freq_mask_max_num_masks=2, fill_value=0)[source]

forward(x, lengths=None)[source]

filter_args()[source]

Filters SpecAugment args from arguments dictionary.

Parameters: kwargs – Arguments dictionary.
Returns: Dictionary with SpecAugment options.

static add_class_args(parser, prefix=None)[source]

Adds SpecAugment options to parser.

Parameters

parser – Arguments parser
prefix – Options prefix.

Large Margin Losses Layers

These are output layers used to create large-margin cross-entropy losses.

class hyperion.torch.layers.margin_losses.ArcLossOutput(*args: Any, **kwargs: Any)[source]

__init__(in_feats, num_classes, s=64, margin=0.3, margin_warmup_epochs=0)[source]

update_margin(epoch)[source]

forward(x, y=None)[source]

class hyperion.torch.layers.margin_losses.CosLossOutput(*args: Any, **kwargs: Any)[source]

__init__(in_feats, num_classes, s=64, margin=0.3, margin_warmup_epochs=0)[source]

update_margin(epoch)[source]

forward(x, y=None)[source]
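`ArcLossOutput` and `CosLossOutput` differ in where the margin enters the target logit. Additive angular margin (ArcFace) uses s·cos(θ + m), additive cosine margin (CosFace) uses s·(cos θ - m), and non-target classes keep s·cos θ in both. The functional sketch below shows both under those standard definitions. It omits the margin warm-up driven by `update_margin(epoch)` and any monotonicity safeguards the real layers may apply (cos(θ + m) stops decreasing once θ + m > π). The function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def margin_logits(x, weight, y, s=64.0, margin=0.3, kind="arc"):
    """x: (batch, in_feats), weight: (num_classes, in_feats), y: (batch,) class labels."""
    cos_theta = F.linear(F.normalize(x, dim=1), F.normalize(weight, dim=1))
    cos_theta = cos_theta.clamp(-1 + 1e-7, 1 - 1e-7)   # keeps acos gradients finite
    if kind == "arc":
        target = torch.cos(torch.acos(cos_theta) + margin)   # cos(theta + m)
    elif kind == "cos":
        target = cos_theta - margin                          # cos(theta) - m
    else:
        raise ValueError(kind)
    one_hot = F.one_hot(y, num_classes=weight.size(0)).bool()
    # the margin applies only to the true class; other classes keep cos(theta)
    return s * torch.where(one_hot, target, cos_theta)
```

The scaled logits are then passed to `F.cross_entropy(logits, y)` as usual.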

class hyperion.torch.layers.margin_losses.SubCenterArcLossOutput(*args: Any, **kwargs: Any)[source]

__init__(in_feats, num_classes, num_subcenters=2, s=64, margin=0.3, margin_warmup_epochs=0)[source]

update_margin(epoch)

forward(x, y=None)[source]

Probability Density Function Layers

These are layers related to probability density functions used in VAEs.

class hyperion.torch.layers.pdf_storage.StdNormal(*args: Any, **kwargs: Any)[source]

Storage for Standard Normal distribution

__init__(shape)[source]

property pdf

forward()[source]

class hyperion.torch.layers.tensor2pdf.Tensor2PDF(*args: Any, **kwargs: Any)[source]

Base class for layers that create a probability distribution from an input tensor

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

class hyperion.torch.layers.tensor2pdf.Tensor2NormalICov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution with identity variance

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]

class hyperion.torch.layers.tensor2pdf.Tensor2NormalGlobDiagCov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution

Input tensor will be the mean of the distribution and the standard deviation is a global trainable parameter.

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]

class hyperion.torch.layers.tensor2pdf.Tensor2NormalDiagCov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution

Applies two linear transformations to the tensors to obtain the mean and the log-variance.

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]
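The two-projection scheme described for `Tensor2NormalDiagCov` can be sketched with `torch.distributions.Normal`. One linear layer predicts the mean, another predicts the log-variance, and the standard deviation is exp(0.5 · logvar). The sketch projects along the last dimension and ignores the `prior` and `squeeze_dim` arguments; both are simplifications, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class Tensor2NormalDiagCovSketch(nn.Module):
    def __init__(self, in_feats: int, pdf_feats: int):
        super().__init__()
        self.mean = nn.Linear(in_feats, pdf_feats)     # mean head
        self.logvar = nn.Linear(in_feats, pdf_feats)   # log-variance head

    def forward(self, inputs: torch.Tensor) -> Normal:
        loc = self.mean(inputs)
        scale = torch.exp(0.5 * self.logvar(inputs))    # std = sqrt(exp(logvar))
        return Normal(loc, scale)
```

In a VAE, `q.rsample()` gives a reparameterized sample, and `torch.distributions.kl_divergence(q, prior)` gives the KL term.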

class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalICovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution with identity variance

Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]

class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalGlobDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution

Input tensor will be the ML mean of the distribution and the ML standard deviation is a global trainable parameter.

Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]

class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution

Applies two linear transformations to the tensors to obtain the maximum likelihood mean and the log-variance.

Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, prior=None, squeeze_dim=None)[source]

Vector Quantization Layers

These are vector quantization layers like the ones used in VQ-VAEs.

class hyperion.torch.layers.vq.VectorQuantizer(*args: Any, **kwargs: Any)[source]

__init__(num_embed, embed_feats, project=True, in_feats=None, in_dim=None)[source]

class hyperion.torch.layers.vq.KMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]

__init__(num_embed, embed_feats, commitment_cost=0.25, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, return_r=False)[source]
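A k-means vector quantizer in the VQ-VAE style works in three parts. It replaces each input vector z by its nearest codebook entry e. It trains the codebook with ||sg[z] - e||² and the encoder with `commitment_cost` · ||z - sg[e]||², where sg is stop-gradient. It passes gradients through the quantization with the straight-through estimator. The sketch below shows that standard recipe for inputs of shape (batch, time, embed_feats). It omits the optional projection and the `return_r` output, and its class name and return tuple are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KMeansVQSketch(nn.Module):
    def __init__(self, num_embed: int, embed_feats: int, commitment_cost: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_embed, embed_feats)
        self.codebook.weight.data.uniform_(-1.0 / num_embed, 1.0 / num_embed)
        self.commitment_cost = commitment_cost

    def forward(self, z: torch.Tensor):
        flat = z.reshape(-1, z.size(-1))
        # index of the nearest codeword (Euclidean distance) for every input vector
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx).view_as(z)
        codebook_loss = F.mse_loss(q, z.detach())   # pulls codewords toward inputs
        commit_loss = F.mse_loss(z, q.detach())     # keeps encoder outputs near codewords
        loss = codebook_loss + self.commitment_cost * commit_loss
        q_st = z + (q - z).detach()                 # straight-through: forward q, backward identity
        return q_st, loss, idx.view(z.shape[:-1])
```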

class hyperion.torch.layers.vq.MultiKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]

__init__(num_groups, num_embed, embed_feats, commitment_cost=0.25, project=True, in_feats=None, in_dim=None)[source]

property commitment_cost

forward(inputs, return_r=False)[source]

class hyperion.torch.layers.vq.EMAKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]

__init__(num_embed, embed_feats, commitment_cost=0.25, gamma=0.99, eps=1e-05, project=True, in_feats=None, in_dim=None)[source]

forward(inputs, return_r=False)[source]
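EMA quantizers replace the codebook loss with exponential moving averages of cluster counts and sums. The sketch below assumes the update from the VQ-VAE paper, with `gamma` as the decay and `eps` as Laplace smoothing of the counts; whether hyperion uses these parameters in exactly this way is an assumption, and `ema_codebook_update` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_codebook_update(z, idx, cluster_size, embed_sum, gamma=0.99, eps=1e-5):
    """One EMA k-means step for a VQ codebook (hypothetical helper).

    z: (N, D) encoder outputs, idx: (N,) assigned codeword indices,
    cluster_size: (K,) EMA counts, embed_sum: (K, D) EMA sums.
    Updates the statistics in place and returns the new codebook (K, D).
    """
    K = cluster_size.shape[0]
    one_hot = F.one_hot(idx, K).type_as(z)            # (N, K)
    cluster_size.mul_(gamma).add_(one_hot.sum(0), alpha=1 - gamma)
    embed_sum.mul_(gamma).add_(one_hot.t() @ z, alpha=1 - gamma)
    # Laplace smoothing keeps empty clusters from dividing by zero
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + K * eps) * n
    return embed_sum / smoothed.unsqueeze(1)
```

With `gamma=0` the step reduces to a plain k-means update: each codeword becomes the mean of the inputs assigned to it.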

class hyperion.torch.layers.vq.MultiEMAKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]

__init__(num_groups, num_embed, embed_feats, commitment_cost=0.25, gamma=0.99, eps=1e-05, project=True, in_feats=None, in_dim=None)[source]

property commitment_cost

property gamma

property eps

forward(inputs, return_r=False)[source]

Upsampling Layers

These include layers related to upsampling operations.

class hyperion.torch.layers.interpolate.Interpolate(*args: Any, **kwargs: Any)[source]

__init__(scale_factor, mode='nearest')[source]

forward(x)[source]

class hyperion.torch.layers.subpixel_convs.SubPixelConv1d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')[source]

forward(x)[source]
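A sub-pixel convolution upsamples by running a stride-1 convolution that outputs `stride` times more channels and then shuffling those channels into the time axis. This sketch assumes that channel `c*r + i` becomes time step `t*r + i` of output channel `c`; the channel ordering hyperion's `SubPixelConv1d` actually uses is not stated here, and `SubPixelConv1dSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class SubPixelConv1dSketch(nn.Module):
    """Hypothetical 1d sub-pixel (periodic shuffle) upsampling conv."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=2, padding=0):
        super().__init__()
        self.stride = stride
        # stride-1 conv producing stride * out_channels channels
        self.conv = nn.Conv1d(in_channels, stride * out_channels, kernel_size, padding=padding)

    def forward(self, x):
        x = self.conv(x)                      # (N, r*C, T)
        n, rc, t = x.shape
        r = self.stride
        # channel c*r + i becomes time step t*r + i of output channel c
        x = x.view(n, rc // r, r, t).transpose(2, 3)
        return x.reshape(n, rc // r, t * r)   # (N, C, r*T)
```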

class hyperion.torch.layers.subpixel_convs.SubPixelConv2d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')[source]

forward(x)[source]

hyperion.torch.layers.subpixel_convs.ICNR2d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]

Initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions, described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”.

https://arxiv.org/abs/1707.02937

Parameters

tensor – torch.Tensor containing the conv weights
stride – subpixel conv stride
initializer – initializer used for the sub-kernel initialization

Examples

>>> conv = SubPixelConv2d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR2d(conv.conv.weight, stride=upscale)
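The idea behind ICNR is that all the output channels that the shuffle maps into the same r×r block start with an identical sub-kernel, so the freshly initialized layer behaves like nearest-neighbour resizing and produces no checkerboard pattern. The sketch assumes the conv is followed by `torch.nn.PixelShuffle`, whose channel ordering may differ from hyperion's `SubPixelConv2d`; `icnr_` is a hypothetical name.

```python
import torch
import torch.nn as nn


def icnr_(weight, stride=2, initializer=nn.init.kaiming_normal_):
    """ICNR init for a conv followed by nn.PixelShuffle(stride) (sketch).

    weight: (out_channels * stride**2, in_channels, kh, kw), modified in place.
    Biases (if any) should also be tied or zeroed for exact NN-resize behaviour.
    """
    r2 = stride ** 2
    out_ch = weight.shape[0] // r2
    sub_kernel = initializer(torch.empty(out_ch, *weight.shape[1:]))
    # nn.PixelShuffle maps channels c*r2 .. c*r2 + r2 - 1 to the r x r block
    # of output channel c, so those channels must share one sub-kernel
    with torch.no_grad():
        weight.copy_(sub_kernel.repeat_interleave(r2, dim=0))
    return weight
```

After this initialization, each r×r block of the shuffled output is constant, exactly as if a low-resolution map had been upsampled with nearest-neighbour interpolation.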

hyperion.torch.layers.subpixel_convs.ICNR1d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]

1d version of the initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions, described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”.

https://arxiv.org/abs/1707.02937

Parameters

tensor – torch.Tensor containing the conv weights
stride – subpixel conv stride
initializer – initializer used for the sub-kernel initialization

Examples

>>> conv = SubPixelConv1d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR1d(conv.conv.weight, stride=upscale)

Positional Encoders

These include layers that implement positional encoders used in transformers.

class hyperion.torch.layers.pos_encoder.PosEncoder(*args: Any, **kwargs: Any)[source]

Positional encoding.

num_feats: embedding dim

dropout_rate: dropout rate

__init__(num_feats, dropout_rate=0)[source]

_pe(x, relative=False)[source]: Reset the positional encodings.

forward(x)[source]

Add positional encoding.

Parameters: x – Input with shape=(batch, time, C)
Returns: scaled x plus the positional encoding
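A sketch of absolute sinusoidal positional encoding, assuming the common Transformer recipe in which x is multiplied by sqrt(num_feats) before the sinusoids are added, followed by dropout. The class name and the `max_len` argument are hypothetical, and `num_feats` is assumed to be even.

```python
import math
import torch
import torch.nn as nn


class SinusoidalPosEncoderSketch(nn.Module):
    """x * sqrt(num_feats) + sinusoidal PE, then dropout (hypothetical sketch)."""

    def __init__(self, num_feats, dropout_rate=0.0, max_len=5000):
        super().__init__()
        self.xscale = math.sqrt(num_feats)
        self.dropout = nn.Dropout(dropout_rate)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        # frequencies 1 / 10000^(2i / num_feats) for i = 0 .. num_feats/2 - 1
        div = torch.exp(torch.arange(0, num_feats, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / num_feats))
        pe = torch.zeros(max_len, num_feats)
        pe[:, 0::2] = torch.sin(pos * div)   # even dims: sine
        pe[:, 1::2] = torch.cos(pos * div)   # odd dims: cosine
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, num_feats)

    def forward(self, x):
        # x: (batch, time, num_feats)
        x = x * self.xscale + self.pe[:, : x.size(1)]
        return self.dropout(x)
```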

class hyperion.torch.layers.pos_encoder.RelPosEncoder(*args: Any, **kwargs: Any)[source]

Relative Positional encoding as defined in

https://arxiv.org/pdf/1901.02860.pdf

It returns the input and the positional encoding separately, so they can be mixed later in the attention block.

num_feats: embedding dim

dropout_rate: dropout rate

__init__(num_feats, dropout_rate=0)[source]

forward(x)[source]

Add positional encoding.

Parameters: x – Input with shape=(batch, time, C)
Returns: scaled x and the positional encoding

_pe(x, relative=False): Reset the positional encodings.

class hyperion.torch.layers.pos_encoder.NoPosEncoder(*args: Any, **kwargs: Any)[source]

This is a dummy class for the case where the positional encoder is deactivated.

__init__()[source]

forward(x)[source]

Identity map

Parameters: x – Input with shape=(batch, time, C)
Returns: x

Calibration

These are layers used to simulate the calibration block that follows the speaker recognition back-end.

class hyperion.torch.layers.calibrators.LinBinCalibrator(*args: Any, **kwargs: Any)[source]

__init__(a, b)[source]

forward(x)[source]
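Given the `(a, b)` constructor arguments, a linear calibrator most likely applies an affine map to the back-end scores, as in linear logistic-regression calibration. The sketch below assumes `forward` computes `a * x + b` with fixed parameters; `LinCalibratorSketch` is hypothetical, and whether hyperion's `a` and `b` are trainable is not stated here.

```python
import torch
import torch.nn as nn


class LinCalibratorSketch(nn.Module):
    """Affine score calibration y = a * x + b (hypothetical stand-in)."""

    def __init__(self, a, b):
        super().__init__()
        # stored as buffers, i.e. fixed calibration parameters (assumption)
        self.register_buffer("a", torch.as_tensor(a, dtype=torch.float32))
        self.register_buffer("b", torch.as_tensor(b, dtype=torch.float32))

    def forward(self, x):
        return self.a * x + self.b
```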

Layer Blocks

These are Torch modules that combine several layers. These are the building blocks used to create more complex architectures like ResNets, Transformers or EfficientNets.

Fully Connected Blocks

These are fully connected blocks used to create simple feed forward networks, classification heads, etc.

class hyperion.torch.layer_blocks.fc_blocks.FCBlock(*args: Any, **kwargs: Any)[source]

Fully connected block

in_feats: input feature dimension

out_feats: output feature dimension

activatoin: str/dict indicating the type of activation function

norm_layer: normalization layer constructor, if None it uses batch-norm

use_norm: if True, it applies the normalization layer, if False no normalization is applied

norm_before: if True normalization layer is applied before the activation function, if False after

__init__(in_feats, out_feats, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]

forward(x)[source]: Forward function

forward_linear(x)[source]: Forward function without activation function
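The effect of `norm_before` can be seen in a small sketch: Linear, then batch-norm and activation in the chosen order, then dropout. `FCBlockSketch` is hypothetical. The split into `forward_linear` (affine part plus any pre-activation norm) and `forward` is an assumption about what "without activation function" means.

```python
import torch
import torch.nn as nn


class FCBlockSketch(nn.Module):
    """Linear -> [norm] -> activation -> [norm] -> dropout (hypothetical sketch)."""

    def __init__(self, in_feats, out_feats, dropout_rate=0.0, use_norm=True, norm_before=False):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)
        self.norm = nn.BatchNorm1d(out_feats) if use_norm else None
        self.norm_before = norm_before
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)

    def forward_linear(self, x):
        # affine part only (plus the norm when it comes before the activation)
        x = self.linear(x)
        if self.norm is not None and self.norm_before:
            x = self.norm(x)
        return x

    def forward(self, x):
        x = self.act(self.forward_linear(x))
        if self.norm is not None and not self.norm_before:
            x = self.norm(x)
        return self.dropout(x)
```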

Deep Convolutional Blocks

Deep Convolutional 1d Blocks

These are blocks for creating 1d deep convolutional networks without residual connections.

class hyperion.torch.layer_blocks.dc1d_blocks.DC1dEncBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

freeze()[source]

unfreeze()[source]

forward(x)[source]

class hyperion.torch.layer_blocks.dc1d_blocks.DC1dDecBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

freeze()[source]

unfreeze()[source]

forward(x)[source]

Deep Convolutional 2d Blocks

These are blocks for creating 2d deep convolutional networks without residual connections.

class hyperion.torch.layer_blocks.dc2d_blocks.DC2dEncBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

freeze()[source]

unfreeze()[source]

forward(x)[source]

class hyperion.torch.layer_blocks.dc2d_blocks.DC2dDecBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

freeze()[source]

unfreeze()[source]

forward(x)[source]

TDNN Blocks

TDNN blocks used to create TDNN x-vectors.

class hyperion.torch.layer_blocks.tdnn_blocks.TDNNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]

freeze()[source]

unfreeze()[source]

forward(x)[source]
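A TDNN layer is a dilated 1d convolution over time: `kernel_size` and `dilation` set the frame context (for example, kernel 3 with dilation 2 sees frames t-2, t and t+2). The sketch uses "same" padding and applies the norm after the activation, matching the `norm_before=False` default shown above. `TDNNBlockSketch` is hypothetical and omits dropout and freezing.

```python
import torch
import torch.nn as nn


class TDNNBlockSketch(nn.Module):
    """Dilated Conv1d -> ReLU -> BatchNorm1d over (batch, channels, time)."""

    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # "same" padding so the time length is preserved (odd kernel_size assumed)
        padding = dilation * (kernel_size - 1) // 2
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation, padding=padding)
        self.act = nn.ReLU()
        self.norm = nn.BatchNorm1d(out_channels)

    def forward(self, x):
        return self.norm(self.act(self.conv(x)))
```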

Extended TDNN Blocks

Extended TDNN blocks used to create E-TDNN x-vectors.

class hyperion.torch.layer_blocks.etdnn_blocks.ETDNNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]

forward(x)[source]

Residual Extended TDNN Blocks

Extended TDNN blocks with residual connections.

class hyperion.torch.layer_blocks.resetdnn_blocks.ResETDNNBlock(*args: Any, **kwargs: Any)[source]

__init__(num_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]

forward(x)[source]

Squeeze-Excitation Blocks

1d and 2d Squeeze-Excitation blocks, which are added at the output of ResNet blocks and other blocks to create squeeze-excitation networks.

class hyperion.torch.layer_blocks.se_blocks.SEBlock2D(*args: Any, **kwargs: Any)[source]

From https://arxiv.org/abs/1709.01507

__init__(num_channels, r=16, activation={'inplace': True, 'name': 'relu'})[source]

forward(x)[source]
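The squeeze-excitation mechanism from arXiv:1709.01507 works in three steps: squeeze the input by global average pooling, excite through a bottleneck with reduction factor `r` and a sigmoid gate, and scale each channel by its gate. This sketch uses 1x1 convolutions as the bottleneck FC layers. `SEBlock2dSketch` is hypothetical and not hyperion's implementation.

```python
import torch
import torch.nn as nn


class SEBlock2dSketch(nn.Module):
    """Squeeze-and-Excitation (Hu et al., arXiv:1709.01507), sketch."""

    def __init__(self, num_channels, r=16):
        super().__init__()
        hid = max(num_channels // r, 1)
        self.fc1 = nn.Conv2d(num_channels, hid, kernel_size=1)
        self.act = nn.ReLU()
        self.fc2 = nn.Conv2d(hid, num_channels, kernel_size=1)

    def forward(self, x):
        # squeeze: global average over height and width -> (N, C, 1, 1)
        s = x.mean(dim=(2, 3), keepdim=True)
        # excitation: bottleneck MLP with reduction r, sigmoid gates in (0, 1)
        g = torch.sigmoid(self.fc2(self.act(self.fc1(s))))
        # scale: reweight each channel of the input
        return x * g
```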

class hyperion.torch.layer_blocks.se_blocks.TSEBlock2D(*args: Any, **kwargs: Any)[source]

From https://arxiv.org/abs/1709.01507, modified to do pooling only in the time dimension.

__init__(num_channels, num_feats, r=16, activation={'inplace': True, 'name': 'relu'})[source]

forward(x)[source]

class hyperion.torch.layer_blocks.se_blocks.SEBlock1d(*args: Any, **kwargs: Any)[source]

1d Squeeze Excitation version of https://arxiv.org/abs/1709.01507

__init__(num_channels, r=16, activation={'inplace': True, 'name': 'relu'})[source]

forward(x)[source]

hyperion.torch.layer_blocks.se_blocks.SEBlock2d: alias of hyperion.torch.layer_blocks.se_blocks.SEBlock2D

hyperion.torch.layer_blocks.se_blocks.TSEBlock2d: alias of hyperion.torch.layer_blocks.se_blocks.TSEBlock2D

Canonical ResNet Blocks

These are blocks used to create canonical ResNets, SE-ResNets, Res2Nets, etc.

ResNet Blocks

These blocks are used to create canonical ResNets.

hyperion.torch.layer_blocks.resnet_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]: 3x3 convolution with padding

hyperion.torch.layer_blocks.resnet_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: 1x1 convolution

class hyperion.torch.layer_blocks.resnet_blocks.ResNetInputBlock(*args: Any, **kwargs: Any)[source]

Input block for ResNet architecture

Parameters

in_channels – input channels
out_channels – output channels
kernel_size – kernel size for conv
stride – stride for conv
activation – str/dict indicating activation type and arguments
norm_layer – norm_layer object constructor, if None it uses BatchNorm2d
norm_before – if True it applies the norm_layer before the activation, if False, after the activation
do_maxpool – apply maxpooling 2x2 at the output

__init__(in_channels, out_channels, kernel_size=7, stride=2, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True, do_maxpool=True)[source]

forward(x)[source]

class hyperion.torch.layer_blocks.resnet_blocks.ResNetBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]
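A sketch of the canonical ResNet basic block: two 3x3 convolutions, each followed by batch-norm (norm before activation), plus a shortcut that becomes a strided 1x1 projection when the shape changes. Following the usual ResNet convention, `out_channels = expansion * channels`, so a basic block (expansion 1) keeps `channels` and a bottleneck block (expansion 4) outputs four times as many. `BasicBlockSketch` is hypothetical and omits dropout, groups and dilation.

```python
import torch
import torch.nn as nn


class BasicBlockSketch(nn.Module):
    """Canonical ResNet basic block (two 3x3 convs + shortcut), sketch."""

    expansion = 1

    def __init__(self, in_channels, channels, stride=1):
        super().__init__()
        self.channels = channels
        self.conv1 = nn.Conv2d(in_channels, channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()
        if stride != 1 or in_channels != self.out_channels:
            # 1x1 projection so the shortcut matches the residual branch shape
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, self.out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(self.out_channels),
            )
        else:
            self.downsample = None

    @property
    def out_channels(self):
        return self.expansion * self.channels

    def forward(self, x):
        residual = x if self.downsample is None else self.downsample(x)
        x = self.act(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        return self.act(x + residual)
```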

class hyperion.torch.layer_blocks.resnet_blocks.ResNetBNBlock(*args: Any, **kwargs: Any)[source]

expansion = 4

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet_blocks.Interpolate(*args: Any, **kwargs: Any)[source]

__init__(scale_factor, mode='nearest')[source]

forward(x)[source]

class hyperion.torch.layer_blocks.resnet_blocks.ResNetEndpointBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, scale, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]

forward(x)[source]

SE-ResNet Blocks

These blocks are used to create canonical Squeeze-Excitation ResNets.

class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBasicBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]

forward(x)[source]

expansion = 1

property out_channels

class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]

expansion = 4

property out_channels

forward(x)[source]

Res2Net Blocks

These blocks are used to create canonical Res2Nets.

hyperion.torch.layer_blocks.res2net_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]: 3x3 convolution with padding

hyperion.torch.layer_blocks.res2net_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: 1x1 convolution

class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]

property out_channels

forward(x)[source]
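The defining part of a Res2Net block is the hierarchical multi-scale convolution from the Res2Net paper. The channels are split into `scale` groups x_1..x_s, with y_1 = x_1, y_2 = K_2(x_2) and y_i = K_i(x_i + y_{i-1}) for i > 2. The sketch below shows only that part, without the surrounding 1x1 convolutions, norms and residual connection of a full block. `Res2NetSplitSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class Res2NetSplitSketch(nn.Module):
    """Hierarchical multi-scale 3x3 convs from Res2Net (Gao et al.), sketch."""

    def __init__(self, channels, scale=4):
        super().__init__()
        assert channels % scale == 0, "channels must be divisible by scale"
        self.scale = scale
        width = channels // scale
        # one 3x3 conv per split except the first, which is passed through
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1, bias=False) for _ in range(scale - 1)]
        )

    def forward(self, x):
        splits = torch.chunk(x, self.scale, dim=1)
        outs = [splits[0]]                    # y_1 = x_1
        y = None
        for i, conv in enumerate(self.convs, start=1):
            inp = splits[i] if y is None else splits[i] + y
            y = conv(inp)                     # y_i = K_i(x_i + y_{i-1})
            outs.append(y)
        return torch.cat(outs, dim=1)
```

Each successive split sees the receptive field of all previous ones, which is how the block mixes several scales inside a single residual unit.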

class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBNBlock(*args: Any, **kwargs: Any)[source]

expansion = 4

__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]

property out_channels

forward(x)[source]

SpineNet Blocks

These are some extra blocks needed to build SpineNet and Spine2Net.

class hyperion.torch.layer_blocks.spine_blocks.Interpolate(*args: Any, **kwargs: Any)[source]

__init__(scale_factor, mode='nearest')[source]

forward(x)[source]

hyperion.torch.layer_blocks.spine_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]: 3x3 convolution with padding

hyperion.torch.layer_blocks.spine_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: 1x1 convolution

hyperion.torch.layer_blocks.spine_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise subpixel convolution

class hyperion.torch.layer_blocks.spine_blocks.SpineConv(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, stride=1, dropout_rate=0, groups=1, dilation=1, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]: Class that connects the outputs of the SpineNet to the rest of the network

forward(x)[source]

class hyperion.torch.layer_blocks.spine_blocks.BlockSpec(level, block_fn, input_offsets, is_output)[source]

A container class that specifies the block configuration for SpineNet.

__init__(level, block_fn, input_offsets, is_output)[source]

static build_block_specs(block_specs=None)[source]: Builds the list of BlockSpec objects for SpineNet.

class hyperion.torch.layer_blocks.spine_blocks.SpineEndpoints(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, level, target_level, upsampling_type='nearest', stride=1, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True, do_endpoint_conv=True)[source]: Class that connects the outputs of the SpineNet to the rest of the network

forward(x)[source]

class hyperion.torch.layer_blocks.spine_blocks.SpineResample(*args: Any, **kwargs: Any)[source]

__init__(spec, in_channels, out_channels, scale, alpha, upsampling_type='nearest', activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]: Class that builds a resampling connection between single SpineNet blocks.

forward(x)[source]

MobileNet Blocks

These are blocks needed to build EfficientNet networks.

hyperion.torch.layer_blocks.mbconv_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: 1x1 convolution

hyperion.torch.layer_blocks.mbconv_blocks._dwconvkxk(channels, kernel_size=3, stride=1, bias=False)[source]: kxk depth-wise convolution with padding

class hyperion.torch.layer_blocks.mbconv_blocks.MBConvBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, expansion=6, kernel_size=3, stride=1, activation='swish', drop_connect_rate=0, norm_layer=None, se_r=None, time_se=False, num_feats=None)[source]

forward(x)[source]
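An MBConv (inverted residual) block, as used in MobileNetV2 and EfficientNet, expands the channels with a 1x1 convolution (by `expansion`, default 6 above) and filters each channel with a depth-wise kxk convolution. It then projects back with a linear 1x1 convolution and adds a residual when the shape is unchanged. The sketch uses `nn.SiLU`, which is the Swish activation, and omits squeeze-excitation and drop-connect. `MBConvSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class MBConvSketch(nn.Module):
    """Inverted residual (MobileNetV2/EfficientNet) block, sketch without SE."""

    def __init__(self, in_channels, out_channels, expansion=6, kernel_size=3, stride=1):
        super().__init__()
        hid = in_channels * expansion
        self.expand = nn.Sequential(
            nn.Conv2d(in_channels, hid, 1, bias=False), nn.BatchNorm2d(hid), nn.SiLU())
        # depth-wise conv: groups == channels, one kxk filter per channel
        self.dwconv = nn.Sequential(
            nn.Conv2d(hid, hid, kernel_size, stride=stride, padding=kernel_size // 2,
                      groups=hid, bias=False),
            nn.BatchNorm2d(hid), nn.SiLU())
        # linear bottleneck: no activation after the projection
        self.project = nn.Sequential(
            nn.Conv2d(hid, out_channels, 1, bias=False), nn.BatchNorm2d(out_channels))
        self.use_residual = stride == 1 and in_channels == out_channels

    def forward(self, x):
        y = self.project(self.dwconv(self.expand(x)))
        return x + y if self.use_residual else y
```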

class hyperion.torch.layer_blocks.mbconv_blocks.MBConvInOutBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, out_channels, kernel_size=3, stride=2, activation='swish', norm_layer=None)[source]

forward(x)[source]

Generic ResNet Blocks

ResNet 1d Blocks

These are blocks used to build flexible ResNets based on 1d convs.

hyperion.torch.layer_blocks.resnet1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k convolution with padding

hyperion.torch.layer_blocks.resnet1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise convolution

hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_conv1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise subpixel convolution

hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k subpixel convolution with padding

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

forward(x)[source]

property out_channels

class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNBlock(*args: Any, **kwargs: Any)[source]

property out_channels

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]

property out_channels

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

forward(x)[source]

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dEndpoint(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, in_scale, scale, upsampling_mode='nearest', activation={'inplace': True, 'name': 'relu6'}, norm_layer=None, norm_before=True)[source]

Class that connects the outputs of the ResNet1d to the rest of the network when using multilevel feature aggregation.

It converts the features of all the levels that we are going to aggregate to the same temporal scale.

forward(x)[source]

Res2Net 1d Blocks

These are blocks used to build flexible Res2Nets based on 1d convs.

hyperion.torch.layer_blocks.res2net1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k convolution with padding

hyperion.torch.layer_blocks.res2net1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise convolution

class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, num_feats=None)[source]

property out_channels

property expansion

forward(x)[source]

ResNet 2d Blocks

These are blocks used to build flexible ResNets based on 2d convs.

hyperion.torch.layer_blocks.resnet2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k convolution with padding

hyperion.torch.layer_blocks.resnet2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise convolution

hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: point-wise subpixel convolution

hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k subpixel convolution with padding

class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNBlock(*args: Any, **kwargs: Any)[source]

property out_channels

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

forward(x)[source]

class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]

property out_channels

__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

forward(x)[source]

Res2Net 2d Blocks

These are blocks used to build flexible Res2Nets based on 2d convs.

hyperion.torch.layer_blocks.res2net2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]: kernel k convolution with padding

hyperion.torch.layer_blocks.res2net2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]: 1x1 convolution

class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBasicBlock(*args: Any, **kwargs: Any)[source]

expansion = 1

__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]

property out_channels

forward(x)[source]

class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBNBlock(*args: Any, **kwargs: Any)[source]

__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]

property out_channels

property expansion

forward(x)[source]

Transformer Blocks

These are blocks used to build Transformers.

class hyperion.torch.layer_blocks.transformer_conv2d_subsampler.TransformerConv2dSubsampler(*args: Any, **kwargs: Any)[source]

Convolutional 2D subsampling (to 1/4 length) for Transformer

in_feats: input feature dimension

out_feats: Transformer d_model

hid_act: activation layer object

pos_enc: positional encoder layer

time_dim: indicates which is the time dimension in the input tensor

__init__(in_feats, out_feats, hid_act, pos_enc, time_dim=1)[source]

forward(x, mask)[source]

Forward function.

Parameters

x – input tensor with size=(batch, time, num_feats)
mask – mask to indicate valid time steps for x (batch, time1, time2)

Returns

Tensor with output features and tensor with subsampled mask
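A sketch of a 1/4 subsampler in the style common in speech Transformers. It applies two unpadded 3x3 convolutions with stride 2 over (time, feature), flattens channels and features, and projects to the model dimension. The mask is subsampled with the same arithmetic. This sketch assumes a `(batch, 1, time)` mask and omits `pos_enc` and `time_dim`; `Conv2dSubsamplerSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class Conv2dSubsamplerSketch(nn.Module):
    """Two stride-2 3x3 convs (about 1/4 time length) + linear to d_model."""

    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, out_feats, 3, stride=2), nn.ReLU(),
            nn.Conv2d(out_feats, out_feats, 3, stride=2), nn.ReLU())
        # each unpadded k=3, s=2 conv maps a length L to (L - 1) // 2
        sub_feats = ((in_feats - 1) // 2 - 1) // 2
        self.out = nn.Linear(out_feats * sub_feats, out_feats)

    def forward(self, x, mask):
        # x: (batch, time, in_feats) -> add channel dim -> (batch, 1, time, in_feats)
        x = self.conv(x.unsqueeze(1))
        b, c, t, f = x.shape
        x = self.out(x.transpose(1, 2).reshape(b, t, c * f))
        # subsample the mask the same way: each conv drops 2 frames and halves length
        mask = mask[:, :, :-2:2][:, :, :-2:2]
        return x, mask
```

For example, 100 input frames with 80 features become 24 frames (100 → 49 → 24), and the frequency axis shrinks 80 → 39 → 19 before the linear projection.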

class hyperion.torch.layer_blocks.transformer_encoder_v1.TransformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]

Building block for transformer encoder.

num_feats: input/output feat. dimension (aka d_model)

self_attn: attention nn.Module or string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

num_heads: number of heads

feed_forward: position-wise feed-forward nn.Module or string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff: dimension of middle layer in feed_forward block

ff_kernel_size: kernel size for convolutional versions of ff block

ff_act: ff block hidden activation

ff_dropout_rate: dropout rate for ff block

att_context: maximum context range for local attention

att_dropout_rate: dropout rate for attention block

rel_pos_enc: if True, use relative postional encodings, absolute encodings otherwise.

causal_pos_enc: if True, use causal positional encodings (when rel_pos_enc=True), it assumes that query q_i only attents to key k_j when j<=i

norm_before: if True, use layer norm before layers, otherwise after

concat_after

if True, it concatenates the attention input and output and applies a linear transform, i.e., y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

__init__(num_feats, self_attn, num_heads, feed_forward, d_ff, ff_kernel_size, ff_act='relu6', ff_dropout_rate=0, att_context=25, att_dropout_rate=0, rel_pos_enc=False, causal_pos_enc=False, norm_before=True, concat_after=False)[source]
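The `norm_before` and `concat_after` options can be illustrated with a pre-norm encoder block built on `torch.nn.MultiheadAttention`. This requires PyTorch 1.9 or later for `batch_first`. As an assumption, the "x" inside the concatenation is taken to be the normalized input, and local attention, relative encodings and dropout in the feed-forward are omitted. `EncoderBlockSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class EncoderBlockSketch(nn.Module):
    """Pre-norm transformer encoder block with the concat_after option (sketch)."""

    def __init__(self, num_feats, num_heads, d_ff, concat_after=False, dropout_rate=0.0):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(num_feats, num_heads, dropout=dropout_rate,
                                               batch_first=True)
        self.ff = nn.Sequential(nn.Linear(num_feats, d_ff), nn.ReLU6(),
                                nn.Linear(d_ff, num_feats))
        self.norm1 = nn.LayerNorm(num_feats)
        self.norm2 = nn.LayerNorm(num_feats)
        self.concat_after = concat_after
        if concat_after:
            self.concat_linear = nn.Linear(2 * num_feats, num_feats)

    def forward(self, x, key_padding_mask=None):
        # x: (batch, time, num_feats); key_padding_mask: (batch, time), True = padded
        h = self.norm1(x)   # norm_before=True: normalize before each sub-layer
        att, _ = self.self_attn(h, h, h, key_padding_mask=key_padding_mask)
        if self.concat_after:
            x = x + self.concat_linear(torch.cat((h, att), dim=-1))  # x + linear(concat)
        else:
            x = x + att                                              # x + att(x)
        return x + self.ff(self.norm2(x))
```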

static _make_att(att_type, num_feats, num_heads, context, dropout_rate, rel_pos_enc, causal_pos_enc)[source]

Creates multihead attention block from att_type string

Parameters

att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
num_feats – input/output feat. dimension (aka d_model)
num_heads – number of heads
dropout_rate – dropout rate for attention block
rel_pos_enc – if True, use relative positional encodings, absolute encodings otherwise.
causal_pos_enc – if True, use causal positional encodings (when rel_pos_enc=True), it assumes that query q_i only attends to key k_j when j<=i

Returns

Attention nn.Module

static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]

Creates position-wise feed forward block from ff_type string

Parameters

ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
num_feats – input/output feat. dimension (aka d_model)
hid_feats – dimension of middle layer in feed_forward block
kernel_size – kernel size for convolutional versions of ff block
dropout_rate – dropout rate for ff block
activation – activation function for ff block

Returns

Position-wise feed-forward nn.Module

forward(x, pos_emb=None, mask=None)[source]

Forward pass function

Parameters

x – input tensor with size=(batch, time, num_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None
mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask

class hyperion.torch.layer_blocks.transformer_feedforward.PositionwiseFeedForward(*args: Any, **kwargs: Any)[source]

Positionwise feed forward layer for transformer.

num_feats: input/output dimension

hid_feats: number of hidden units

activation: activation function for hidden layers

dropout_rate: dropout rate

time_dim: time dimension in the input tensor

__init__(num_feats, hid_feats, activation='relu6', dropout_rate=0, time_dim=1)[source]

forward(x)[source]

Forward function.

Parameters: x – input size=(batch, time, num_feats)
Returns: tensor size=(batch, time, num_feats)

class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dx2(*args: Any, **kwargs: Any)[source]

Two layer Conv1d for transformer feed-forward block

Introduced in FastSpeech: Fast, Robust and Controllable Text to Speech:

https://arxiv.org/pdf/1905.09263.pdf

num_channels: input/output channels.

hid_channels: hidden channels

kernel_size: conv kernel size

activation: activation function for hidden layers

dropout_rate: dropout rate

time_dim: indicates what is the time dimension in the input tensor.

__init__(num_channels, hid_channels, kernel_size, dropout_rate=0, time_dim=-1)[source]

forward(x)[source]

Calculates forward propagation.

Parameters: x – input tensor with size=(batch, time, num_channels) or size=(batch, num_channels, time).

Returns: output tensor same size as input

class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dLinear(*args: Any, **kwargs: Any)[source]

Conv1D + Linear for Transformer block.

num_channels: input/output channels.

hid_channels: hidden channels

kernel_size: conv kernel size

activation: activation function for hidden layers

dropout_rate: dropout rate

time_dim: indicates what is the time dimension in the input tensor.

__init__(num_channels, hid_channels, kernel_size, dropout_rate=0, time_dim=-1)[source]

forward(x)[source]

Calculates forward propagation.

Parameters: x – input tensor with size=(batch, time, num_channels) or size=(batch, num_channels, time).

Returns: output tensor same size as input

Conformer Blocks

class hyperion.torch.layer_blocks.conformer_encoder_v1.ConformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]

Building block for conformer encoder introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

Choose local-attention (attending only to close frames instead of all the frames in the sequence)

Choose number of conv blocks

Squeeze-Excitation after depthwise-conv

Allows downsampling in time dimension

Allows choosing activation and layer normalization type

We call this Conformer+.

num_feats: input/output feat. dimension (aka d_model)

self_attn: attention module in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

num_heads: number of heads

conv_repeats: number of conv blocks

conv_kernel_size: kernel size for conv blocks

conv_stride: stride for depth-wise conv in first conv block

feed_forward: position-wise feed-forward string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff: dimension of middle layer in feed_forward block

ff_kernel_size: kernel size for convolutional versions of ff block

hid_act: ff and conv block hidden activation

dropout_rate: dropout rate for ff and conv blocks

att_context: maximum context range for local attention

att_dropout_rate: dropout rate for attention block

causal_pos_enc: if True, use causal positional encodings (when rel_pos_enc=True), it assumes that query q_i only attents to key k_j when j<=i

conv_norm_layer: norm layer constructor for conv block, if None it uses BatchNorm

se_r: Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

ff_macaron: if True, it uses macaron-net style ff layers, otherwise transformer style.

out_lnorm: if True, use LNorm layer at the output as in the conformer paper, we think that this layer is redundant and put it to False by default

concat_after

if True, it concatenates the attention input and output and applies a linear transform, i.e., y = x + linear(concat(x, att(x)))

if False, y = x + att(x)
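To make the two concat_after variants concrete, here is a minimal pure-Python sketch of both residual formulas on a single feature vector. The attention and the concat linear layer are toy stand-ins (hypothetical helpers toy_att and toy_concat_linear), chosen only so the arithmetic is easy to follow; they do not reflect the real attention module.

```python
# Residual attention variants controlled by concat_after:
#   concat_after=False: y = x + att(x)
#   concat_after=True:  y = x + linear(concat(x, att(x)))
# toy_att and toy_concat_linear are hypothetical stand-ins for illustration.

def residual_att(x, att, concat_linear=None, concat_after=False):
    a = att(x)
    if concat_after:
        z = concat_linear(x + a)  # list "+" concatenates x and att(x): 2*num_feats values
    else:
        z = a
    return [xi + zi for xi, zi in zip(x, z)]

def toy_att(x):
    return [2.0 * v for v in x]

def toy_concat_linear(xa):
    # maps 2*num_feats -> num_feats by averaging the x half and the att(x) half
    n = len(xa) // 2
    return [0.5 * (xa[i] + xa[n + i]) for i in range(n)]

x = [1.0, -2.0]
print(residual_att(x, toy_att))  # [3.0, -6.0]
print(residual_att(x, toy_att, toy_concat_linear, concat_after=True))  # [2.5, -5.0]
```

The concat variant lets a learned projection decide how much of the attention output to mix back in, at the cost of an extra 2*num_feats x num_feats linear layer.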

__init__(num_feats, self_attn, num_heads, conv_repeats=1, conv_kernel_size=31, conv_stride=1, feed_forward='linear', d_ff=2048, ff_kernel_size=3, hid_act='swish', dropout_rate=0, att_context=25, att_dropout_rate=0, pos_enc_type='rel', causal_pos_enc=False, conv_norm_layer=None, se_r=None, ff_macaron=True, out_lnorm=False, concat_after=False)[source]

static _make_att(att_type, num_feats, num_heads, context, dropout_rate, pos_enc_type, causal_pos_enc)[source]

Creates multihead attention block from att_type string

Parameters

att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
num_feats – input/output feat. dimension (aka d_model)
num_heads – number of heads
context – maximum context range for local attention
dropout_rate – dropout rate for attention block
pos_enc_type – type of positional encoder
causal_pos_enc – if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i

Returns

Attention nn.Module
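The difference between 'scaled-dot-prod-att-v1' (all frames) and 'local-scaled-dot-prod-att-v1' (close frames only) can be pictured as a boolean mask over (query, key) pairs. The sketch below is a simplified illustration, assuming a symmetric window |i - j| <= context and, for the causal case, the extra condition j <= i stated for causal_pos_enc; the actual local attention module may be implemented differently (for example, by processing the sequence in chunks).

```python
# Which keys j may query i attend to? Simplified local-attention mask.
# Assumptions: symmetric window |i - j| <= context; causal adds j <= i.

def local_att_mask(num_frames, context, causal=False):
    mask = []
    for i in range(num_frames):  # query index
        row = []
        for j in range(num_frames):  # key index
            ok = abs(i - j) <= context
            if causal:
                ok = ok and j <= i
            row.append(ok)
        mask.append(row)
    return mask

for row in local_att_mask(5, context=1):
    print([int(v) for v in row])
# [1, 1, 0, 0, 0]
# [1, 1, 1, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 1, 1, 1]
# [0, 0, 0, 1, 1]
```

With full attention every entry would be 1, so local attention reduces the number of scored pairs from time^2 to roughly time * (2 * context + 1).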

static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]

Creates position-wise feed forward block from ff_type string

Parameters

ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
num_feats – input/output feat. dimension (aka d_model)
hid_feats – dimension of middle layer in feed_forward block
kernel_size – kernel size for convolutional versions of ff block
dropout_rate – dropout rate for ff block
activation – activation function for ff block

Returns

Position-wise feed-forward nn.Module

forward(x, pos_emb=None, mask=None)[source]

Forward pass function

Parameters

x – input tensor with size=(batch, time, num_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None
mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask

hyperion.torch.layer_blocks.conformer_conv._conv1(in_channels, out_channels, bias=False)[source]: 1x1 convolution

hyperion.torch.layer_blocks.conformer_conv._dwconvk(channels, kernel_size, stride=1, bias=False)[source]: kxk depth-wise convolution with padding

class hyperion.torch.layer_blocks.conformer_conv.ConformerConvBlock(*args: Any, **kwargs: Any)[source]

Convolutional block for conformer introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

Squeeze-Excitation after depthwise-conv

Allows downsampling in time dimension

Allows choosing activation and layer normalization type

num_channels: number of input/output channels

kernel_size: kernel_size for depth-wise conv

stride: stride for depth-wise conv

activation: activation function str or object

norm_layer: norm layer constructor, if None it uses BatchNorm

dropout_rate: dropout rate

se_r: Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

__init__(num_channels, kernel_size, stride=1, activation='swish', norm_layer=None, dropout_rate=0, se_r=None)[source]

forward(x)[source]

Forward function

Parameters: x – input size = (batch, num_channels, time)

Returns: torch.Tensor size = (batch, num_channels, (time-1)//stride+1)
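The output length (time-1)//stride+1 is what the standard 1d convolution length formula gives when the padding is half the kernel size. The sketch below checks this, assuming an odd kernel_size and padding = kernel_size // 2 for the depth-wise conv (the padding rule is an assumption; the text only says "with padding"). The general formula used is the one documented for torch.nn.Conv1d.

```python
# Output length of a 1d conv (same formula as torch.nn.Conv1d):
#   L_out = (L_in + 2*padding - dilation*(kernel_size - 1) - 1) // stride + 1
# Assumption: odd kernel_size and padding = kernel_size // 2 in the depth-wise conv.

def conv1d_out_len(time, kernel_size, stride=1, padding=0, dilation=1):
    return (time + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def conformer_conv_out_len(time, stride):
    return (time - 1) // stride + 1

# With 2*padding = kernel_size - 1 the kernel terms cancel, leaving (time - 1) // stride + 1.
for time in range(1, 50):
    for kernel_size in (3, 15, 31):
        for stride in (1, 2, 3):
            same_pad = kernel_size // 2
            assert conv1d_out_len(time, kernel_size, stride, same_pad) == conformer_conv_out_len(time, stride)

print(conformer_conv_out_len(100, 2))  # 50
```

So with conv_stride=2 a 100-frame input becomes 50 frames, independently of conv_kernel_size.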

Torch Models and Model Loader

All PyTorch neural architectures and models in Hyperion derive from the same base class:

class hyperion.torch.TorchModel(*args: Any, **kwargs: Any)[source]

get_config()[source]

copy()[source]

save(file_path)[source]

freeze()[source]

unfreeze()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

get_reg_loss()[source]

get_loss()[source]

property device

__init__(*args: Any, **kwargs: Any) → None

The TorchModelLoader can load any model or network architecture from file.
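The get_config() and load(file_path=None, cfg=None, state_dict=None) pair suggests a config round trip: a model serializes its constructor arguments, and a loader rebuilds the object from them (then restores weights from state_dict). The sketch below is a hypothetical, pure-Python illustration of that pattern only; ToyModel, REGISTRY, and load_from_cfg are invented names and do not describe how TorchModelLoader is actually implemented.

```python
# Hypothetical config round trip: the config stores the class name plus the
# constructor kwargs, and a loader looks the class up and rebuilds it.
# ToyModel, REGISTRY and load_from_cfg are illustrative names only.

class ToyModel:
    def __init__(self, in_feats, num_classes=2):
        self.in_feats = in_feats
        self.num_classes = num_classes

    def get_config(self):
        return {"class_name": type(self).__name__,
                "in_feats": self.in_feats,
                "num_classes": self.num_classes}

REGISTRY = {"ToyModel": ToyModel}

def load_from_cfg(cfg):
    cfg = dict(cfg)  # copy so the caller's dict is not mutated
    cls = REGISTRY[cfg.pop("class_name")]
    return cls(**cfg)

m = ToyModel(in_feats=80, num_classes=10)
m2 = load_from_cfg(m.get_config())
print(m2.get_config() == m.get_config())  # True
```

In a real model, the loader would additionally copy the saved parameters into the rebuilt network after construction.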

Neural Architectures

All neural architectures derive from the NetArch class.

class hyperion.torch.narchs.net_arch.NetArch(*args: Any, **kwargs: Any)[source]

in_context()[source]

in_dim()[source]

out_dim()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

__init__(*args: Any, **kwargs: Any) → None

copy()

property device

freeze()

get_config()

get_loss()

get_reg_loss()

classmethod load(file_path=None, cfg=None, state_dict=None)

save(file_path)

unfreeze()

The TorchNALoader can load any network architecture from file.

class hyperion.torch.narchs.torch_na_loader.TorchNALoader[source]

static load(file_path, extra_objs={})[source]

static load_from_cfg(cfg, state_dict=None, extra_objs={})[source]

Acoustic Features

class hyperion.torch.narchs.audio_feats_mvn.AudioFeatsMVN(*args: Any, **kwargs: Any)[source]

Acoustic feature extractor + short-time mean-variance normalization (ST-MVN) + optional SpecAugment

__init__(audio_feats, mvn=None, spec_augment=None, trans=False, aug_after_mvn=False)[source]
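Short-time MVN normalizes each frame with statistics from a sliding window rather than from the whole utterance. The sketch below shows the idea on a single feature dimension (real features would be normalized per dimension). It is a minimal sketch, assuming a window [t - left, t + right] clipped at the sequence edges; the window sizes, edge handling, and parameter names are assumptions and not necessarily those of the mvn module passed to AudioFeatsMVN.

```python
# Short-time mean (and optional variance) normalization of a 1d feature track.
# Assumptions: window [t - left, t + right] clipped at the edges; eps avoids
# division by zero when a window has zero variance.
import math

def st_mvn(x, left=2, right=2, norm_var=False, eps=1e-5):
    out = []
    for t in range(len(x)):
        win = x[max(0, t - left): t + right + 1]
        mu = sum(win) / len(win)
        y = x[t] - mu
        if norm_var:
            var = sum((v - mu) ** 2 for v in win) / len(win)
            y /= math.sqrt(var + eps)
        out.append(y)
    return out

print(st_mvn([1.0, 2.0, 3.0, 4.0, 5.0], left=1, right=1))  # [-0.5, 0.0, 0.0, 0.0, 0.5]
```

A linear ramp is flattened to zero wherever the window is symmetric; only the edge frames, whose windows are truncated, keep a residual offset.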

property fs

property frame_length

property frame_shift

copy()

property device

forward(x, lengths=None)[source]

freeze()

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

save(file_path)

unfreeze()

get_config()[source]

static filter_args(**kwargs)[source]

add_class_args(prefix=None)[source]

Fully Connected Network

Classification Head

class hyperion.torch.narchs.classif_head.ClassifHead(*args: Any, **kwargs: Any)[source]

Classification Head for x-vector style networks

in_feats: input features

num_classes: number of output classes

embed_dim: dimension of embedding layer

num_embed_layers: number of hidden layers

hid_act: str or dict hidden activation type in [‘relu’, ‘relu6’, ‘swish’, … ]

loss_type: type of loss function that will be used with the x-vector in [‘softmax’, ‘cos-softmax’, ‘arc-softmax’], corresponding to standard cross-entorpy, additive margin softmax or additive angular margin softmax.

s: scale parameter for cos-softmax and arc-softmax

margin: margin parameter for cos-softmax and arc-softmax

margin_warmup_epochs: number of epochs to anneal the margin from 0 to margin

num_subcenters: number of subcenters in subcenter losses

norm_layer: norm_layer object or str indicating type norm layer, if None it uses BatchNorm1d

use_norm: it True it uses layer/batch-normalization

norm_before: if True, layer-norm is before the activation function

__init__(in_feats, num_classes, embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0)[source]

rebuild_output_layer(num_classes, loss_type, s, margin, margin_warmup_epochs, num_subcenters=2)[source]

set_margin(margin)[source]

set_margin_warmup_epochs(margin_warmup_epochs)[source]

set_s(s)[source]

update_margin(epoch)[source]
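margin_warmup_epochs anneals the margin from 0 up to its final value. One simple way to do this is a linear ramp, sketched below; the linear shape and the convention that epochs start at 0 are assumptions made for illustration, and update_margin() may use a different schedule.

```python
# Linear margin warm-up: margin grows from 0 to its final value over
# margin_warmup_epochs epochs, then stays constant.
# Assumptions: linear schedule, epochs counted from 0.

def warmup_margin(epoch, margin, margin_warmup_epochs):
    if margin_warmup_epochs <= 0:
        return margin  # no warm-up: use the full margin from the start
    return margin * min(1.0, epoch / margin_warmup_epochs)

print([warmup_margin(e, 0.3, 4) for e in range(6)])
```

Starting with a small margin lets the classifier first learn a reasonable embedding space before the harder, margin-penalized objective kicks in.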

freeze_layers(layer_list)[source]

put_layers_in_eval_mode(layer_list)[source]

forward(x, y=None)[source]

forward_hid_feats(x, y=None, layers=None, return_output=False)[source]

extract_embed(x, embed_layer=0)[source]

copy()

property device

freeze()

get_config()[source]

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

save(file_path)

unfreeze()

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

Deep Convolutional Encoder/Decoders

These are encoders/decoders based on 1d and 2d deep convolutional networks.

DC Encoder 1d

class hyperion.torch.narchs.dc1d_encoder.DC1dEncoder(*args: Any, **kwargs: Any)[source]

__init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x)[source]

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, head_channels=False, in_feats=False)[source]

static add_argparse_args(parser, prefix=None, head_channels=False, in_feats=False)

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

DC Decoder 1d

class hyperion.torch.narchs.dc1d_decoder.DC1dDecoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x, target_shape=None)[source]

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, head_channels=False)[source]

static add_argparse_args(parser, prefix=None, head_channels=False)

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

DC Encoder 2d

class hyperion.torch.narchs.dc2d_encoder.DC2dEncoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=1, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x)[source]

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, head_channels=False)[source]

static add_argparse_args(parser, prefix=None, head_channels=False)

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

DC Decoder 2d

class hyperion.torch.narchs.dc2d_decoder.DC2dDecoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x, target_shape=None)[source]

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, head_channels=False)[source]

static add_argparse_args(parser, prefix=None, head_channels=False)

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

TDNN Variants

These are variants of TDNNs. There is a factory class that creates TDNN networks from config params.

class hyperion.torch.narchs.tdnn_factory.TDNNFactory[source]

static create(tdnn_type, num_enc_blocks, in_feats, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu6'}, out_units=0, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

TDNN

class hyperion.torch.narchs.tdnn.TDNNV1(*args: Any, **kwargs: Any)[source]

__init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]

property in_context
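in_context reports how many past and future frames are needed to produce one output frame. For stacked 1d dilated convolutions with odd kernels, each layer adds (kernel_size - 1) // 2 * dilation frames per side and the contributions sum up. The sketch below computes that, with the per-block dilations passed explicitly; how TDNNV1 derives them from dilation and dilation_factor is not stated here, so that mapping is left out on purpose.

```python
# Receptive field of stacked 1d dilated convolutions (odd kernels, symmetric):
# each layer adds (k - 1) // 2 * d frames of past and of future context.
# Assumption: per-layer dilations are given explicitly.

def tdnn_context(kernel_sizes, dilations):
    past = future = 0
    for k, d in zip(kernel_sizes, dilations):
        half = (k - 1) // 2 * d
        past += half
        future += half
    return past, future

# Classic x-vector frame layers: kernels (5, 3, 3, 1, 1), dilations (1, 2, 3, 1, 1)
print(tdnn_context([5, 3, 3, 1, 1], [1, 2, 3, 1, 1]))  # (7, 7)
```

The result (7, 7) matches the 15-frame context (t-7 to t+7) usually quoted for the classic x-vector TDNN frame layers.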

forward(x, use_amp=False)[source]

copy()

property device

freeze()

get_config()[source]

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

in_shape()[source]

out_shape(in_shape=None)[source]

E-TDNN

class hyperion.torch.narchs.etdnn.ETDNNV1(*args: Any, **kwargs: Any)[source]

__init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]

property in_context

forward(x)[source]

copy()

property device

freeze()

get_config()[source]

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

in_shape()[source]

out_shape(in_shape=None)[source]

Residual E-TDNN

class hyperion.torch.narchs.resetdnn.ResETDNNV1(*args: Any, **kwargs: Any)[source]

__init__(num_blocks, in_units, hid_units, expand_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]

property in_context

forward(x)[source]

copy()

property device

freeze()

get_config()[source]

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

in_shape()[source]

out_shape(in_shape=None)[source]

Canonical ResNets/SE-ResNets/Res2Nets

These classes can be used to build canonical ResNets, SE-ResNets and Res2Nets. There is a factory class that creates ResNets from config params.
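The named variants below (ResNet18 through ResNet152) differ mainly in block type and in how many blocks each of the 4 layer blocks contains. The sketch lists the canonical block counts from the original ResNet design and recovers each network's nominal depth (stem conv + convs inside residual blocks + final fully connected layer). It assumes Hyperion's subclasses follow these canonical counts; the dictionary keys are illustrative labels, not necessarily valid resnet_type strings for ResNetFactory.

```python
# Canonical per-stage block counts of the original ResNets.
# Assumption: Hyperion's ResNet18/34/50/101/152 follow these counts.
# Keys are illustrative labels; 'basic' = 2 convs per block, 'bn' (bottleneck) = 3.
RESNET_LAYERS = {
    "resnet18": ("basic", [2, 2, 2, 2]),
    "resnet34": ("basic", [3, 4, 6, 3]),
    "resnet50": ("bn", [3, 4, 6, 3]),
    "resnet101": ("bn", [3, 4, 23, 3]),
    "resnet152": ("bn", [3, 8, 36, 3]),
}
CONVS_PER_BLOCK = {"basic": 2, "bn": 3}

def depth(block, num_layers):
    # stem conv + convs in all residual blocks + final fully connected layer
    return 1 + CONVS_PER_BLOCK[block] * sum(num_layers) + 1

for name, (block, layers) in RESNET_LAYERS.items():
    print(name, depth(block, layers))  # resnet18 18, resnet34 34, ...
```

ResNet34 and ResNet50 share the same block counts; the extra depth of ResNet50 comes entirely from switching to 3-conv bottleneck blocks.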

class hyperion.torch.narchs.resnet_factory.ResNetFactory[source]

static create(resnet_type, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

class hyperion.torch.narchs.resnet.ResNet(*args: Any, **kwargs: Any)[source]

ResNet2D base class

block: resnet basic block type in [‘basic’, ‘bn’, ‘sebasic’, ‘sebn’], meaning basic resnet block, bottleneck resnet block, basic block with squeeze-excitation, and bottleneck block with squeeze-excitation

num_layers: list with the number of layers in each of the 4 layer blocks that we find in resnets, after each layer block feature maps are downsmapled times 2 in each dimension and channels are upsampled times 2.

in_channels: number of input channels

conv_channels: number of output channels in first conv layer (stem)

base_channels: number of channels in the first layer block

out_units: number of logits in the output layer, if 0 there is no output layer and resnet is used just as feature extractor, for example for x-vector encoder.

in_kernel_size: kernels size of first conv layer

hid_act: str or dictionary describing hidden activations.

out_act: output activation

zero_init_residual: initializes batchnorm weights to zero so each residual block behaves as identitiy at the beggining. We observed worse results when using this option in x-vectors

groups: number of groups in convolutions

replace_stride_with_dilation: use dialted conv nets instead of downsammpling, we never tested this.

dropout_rate: dropout rate

norm_layer: norm_layer object or str indicating type layer-norm object, if None it uses BatchNorm2d

do_maxpool: if False, removes the maxpooling layer at the stem of the network.

in_norm: if True, adds another batch norm layer in the input

se_r: squeeze-excitation dimension compression

time_se: if True squeeze-excitation embedding is obtaining by averagin only in the time dimension, instead of time-freq dimension or HxW dimensions

in_feats: input feature size (number of components in dimension of 2 of input tensor), this is only required when time_se=True to calculcate the size of the squeeze excitation matrices.

__init__(block, num_layers, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, multilevel=False, endpoint_channels=64, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, time_se=False, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]

_compute_out_size(in_size)[source]

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size
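The size arithmetic behind _compute_out_size can be sketched with the standard convolution length formula applied stage by stage. The sketch assumes a torchvision-style layout: a stem conv with in_kernel_size=7, stride 2 and padding 3, an optional 3x3 max-pool with stride 2 and padding 1, then stride-2 3x3 convs (padding 1) at the start of layer blocks 2 to 4. Hyperion's exact strides and padding may differ (for example with in_stride, do_maxpool=False, or replace_stride_with_dilation), so treat this as an illustration of the arithmetic only.

```python
# Spatial output size of a torchvision-style ResNet, stage by stage.
# Assumptions: stem conv stride 2 with padding in_kernel_size // 2,
# 3x3/2 max-pool (padding 1), and stride-2 3x3 convs in layer blocks 2-4.

def conv_out(size, kernel, stride, padding):
    return (size + 2 * padding - kernel) // stride + 1

def resnet_out_size(in_size, in_kernel_size=7, in_stride=2, do_maxpool=True):
    size = conv_out(in_size, in_kernel_size, in_stride, in_kernel_size // 2)
    if do_maxpool:
        size = conv_out(size, 3, 2, 1)
    for _ in range(3):  # layer blocks 2-4 halve the size
        size = conv_out(size, 3, 2, 1)
    return size

print(resnet_out_size(224))  # 7
print(resnet_out_size(80))   # 3, e.g. 80 filter-bank channels
```

For audio, the same arithmetic shows how aggressively the frequency axis shrinks, which is one reason speaker-recognition ResNets are often configured with a smaller stem stride or without the max-pool.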

in_context()[source]

Returns: Tuple (past, future) context required to predict one frame.

in_shape()[source]

Returns: Tuple describing input shape for the network

out_shape(in_shape=None)[source]

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

forward(x, use_amp=False)[source]

_forward(x)[source]

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

forward_hid_feats(x, layers=None, return_output=False)[source]

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

get_config()[source]: Gets network config :returns: dictionary with config params

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNet152(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNext50_32x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with output representations if return_output is True

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network
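The sketch below illustrates the two output cases described for _forward: a (batch, out_units) logits shape when a classification head is present, and otherwise a (batch, Cout, Hout, Wout) feature-map shape. It is a hypothetical helper, not Hyperion's out_shape. It assumes that None marks an unknown dimension (such as a variable number of frames) and that the network downsamples by a total stride of 32.

```python
from typing import Callable, Optional, Tuple

Shape = Tuple[Optional[int], ...]


def total_stride_32(n: int) -> int:
    # Assumption: five stride-2 steps, each mapping n -> ceil(n / 2)
    for _ in range(5):
        n = (n - 1) // 2 + 1
    return n


def out_shape(in_shape: Shape, out_channels: int, out_units: int = 0,
              compute_out_size: Callable[[int], int] = total_stride_32) -> Shape:
    """in_shape is (batch, C, H, W) for images or (batch, C, freq, time) for audio."""
    batch = in_shape[0]
    if out_units > 0:
        # classification head present: one logit per output unit
        return (batch, out_units)
    _, _, h, w = in_shape
    # unknown (None) spatial sizes stay unknown after downsampling
    h_out = None if h is None else compute_out_size(h)
    w_out = None if w is None else compute_out_size(w)
    return (batch, out_channels, h_out, w_out)
```

For example, an 80-band spectrogram of unknown length, `out_shape((None, 1, 80, None), 2048)`, gives `(None, 2048, 3, None)` under these assumptions.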

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.ResNext101_32x8d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True
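The following is a minimal sketch of the idea behind forward_hid_feats, written with plain Python callables instead of PyTorch modules. It assumes that layer indices are 0-based positions in a hypothetical list of layers and that layers=None means "collect every layer"; Hyperion's actual indexing convention may differ.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple, Union


def forward_hid_feats(
    layer_fns: Sequence[Callable[[Any], Any]],
    x: Any,
    layers: Optional[Sequence[int]] = None,
    return_output: bool = False,
) -> Union[List[Any], Tuple[List[Any], Any]]:
    # Assumption: None means "return every layer's output"
    wanted = set(range(len(layer_fns))) if layers is None else set(layers)
    hid_feats: List[Any] = []
    for i, fn in enumerate(layer_fns):
        x = fn(x)                # run the next layer
        if i in wanted:
            hid_feats.append(x)  # keep this intermediate representation
    if return_output:
        return hid_feats, x      # hidden features plus the final output
    return hid_feats
```

With three toy layers `[lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]` and input 1, requesting layers `[0, 2]` yields `[2, 1]`, and adding `return_output=True` yields `([2, 1], 1)`.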

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.WideResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.
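One common way to obtain a (past, future) context is to take the receptive field of the convolution stack along the time axis. The hypothetical helper below computes it for centered convolutions, given (kernel_size, stride, dilation) per layer. It is a generic sketch and not necessarily how Hyperion implements in_context.

```python
from typing import Sequence, Tuple


def conv_stack_context(convs: Sequence[Tuple[int, int, int]]) -> Tuple[int, int]:
    """convs: (kernel_size, stride, dilation) per layer along the time axis."""
    rf = 1    # receptive field, in input frames
    jump = 1  # distance in input frames between adjacent units of the current layer
    for kernel_size, stride, dilation in convs:
        # each layer widens the field by (k - 1) * d units of the previous layer
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    # centered kernels split the extra frames evenly between past and future
    past = (rf - 1) // 2
    return (past, rf - 1 - past)
```

For instance, a 7-tap stride-2 convolution followed by a 3-tap stride-2 convolution sees 11 input frames, so `conv_stack_context([(7, 2, 1), (3, 2, 1)])` returns `(5, 5)`.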

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.WideResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LResNext50_4x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNet152(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNext50_32x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEResNext101_32x8d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEWideResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEWideResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SELResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SELResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and a tensor with the output representations if return_output is True

freeze()

get_config()

Gets network config

Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()
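The get_config(), save() and load(file_path=None, cfg=None, state_dict=None) methods suggest a checkpoint that stores the constructor configuration alongside the weights. The toy class below sketches that round trip in pure Python:

- JSON stands in for torch.save/torch.load;
- a small dict stands in for the PyTorch state_dict.

The class, its fields and the checkpoint keys "cfg" and "state_dict" are hypothetical. They are not the library's actual file format.

```python
import json
import os
import tempfile


class ToyNet:
    """Hypothetical stand-in for the get_config()/save()/load() pattern."""

    def __init__(self, in_channels, base_channels=64):
        self.in_channels = in_channels
        self.base_channels = base_channels
        self.state = {"scale": 1.0}  # stands in for a PyTorch state_dict

    def get_config(self):
        # Constructor arguments are enough to rebuild the architecture.
        return {"in_channels": self.in_channels, "base_channels": self.base_channels}

    def save(self, file_path):
        with open(file_path, "w") as f:
            json.dump({"cfg": self.get_config(), "state_dict": self.state}, f)

    @classmethod
    def load(cls, file_path=None, cfg=None, state_dict=None):
        # Explicit cfg/state_dict arguments take precedence over the file.
        if file_path is not None:
            with open(file_path) as f:
                checkpoint = json.load(f)
            cfg = checkpoint["cfg"] if cfg is None else cfg
            state_dict = checkpoint["state_dict"] if state_dict is None else state_dict
        model = cls(**cfg)
        if state_dict is not None:
            model.state = state_dict
        return model


net = ToyNet(in_channels=1, base_channels=32)
net.state["scale"] = 0.5
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.json")
    net.save(path)
    restored = ToyNet.load(file_path=path)
print(restored.get_config())  # {'in_channels': 1, 'base_channels': 32}
print(restored.state)  # {'scale': 0.5}
```

Storing the config next to the weights lets the loader rebuild the right architecture before filling in the parameters.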

class hyperion.torch.narchs.resnet.SELResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SELResNext50_4x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNet152(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNext50_32x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEResNext101_32x8d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEWideResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSEWideResNet101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSELResNet18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSELResNet34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSELResNet50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSELResNext50_4x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Net18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Net34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also a tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Net101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes output size given input size. Output size is not the same as input size because of downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0, otherwise, it returns tensor of represeantions of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Net152(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Next50_32x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.Res2Next101_32x8d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.WideRes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.WideRes2Net101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LRes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.LRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Net18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Net34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Net101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Net152(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEWideRes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SEWideRes2Net101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SELRes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

Forward function that also returns intermediate hidden representations.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers for which to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()

get_config(): Gets the network config. Returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.SELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimension
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations
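The contract of forward_hid_feats can be illustrated with a toy pipeline in which each stage is a plain function on numbers rather than a network block. This is a sketch of the documented inputs and outputs only; in particular, treating the entries of layers as 1-based stage indices is an assumption, not Hyperion's documented convention.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple, Union


def forward_hid_feats(stages: Sequence[Callable[[Any], Any]], x: Any,
                      layers: Optional[Sequence[int]] = None,
                      return_output: bool = False
                      ) -> Union[List[Any], Tuple[List[Any], Any]]:
    """Toy model of the forward_hid_feats contract.

    Runs x through `stages` in order and keeps the output of every stage whose
    1-based index is in `layers` (the indexing convention is an assumption).
    """
    wanted = set(layers or [])
    h_feats = []
    for idx, stage in enumerate(stages, start=1):
        x = stage(x)
        if idx in wanted:
            h_feats.append(x)
    if return_output:
        return h_feats, x
    return h_feats


# Three toy "stages" acting on numbers instead of tensors.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(forward_hid_feats(stages, 1, layers=[1, 2]))                   # [2, 4]
print(forward_hid_feats(stages, 1, layers=[2], return_output=True))  # ([4], 1)
```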

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.
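For a stack of centred (symmetrically padded) convolutions, the (past, future) context of one output frame follows from the receptive field: each layer adds (kernel_size - 1) // 2 * dilation * cumulative_stride frames on each side. The sketch below assumes odd kernel sizes; stack_context and the layer list are hypothetical and only illustrate the kind of value in_context reports.

```python
from typing import Iterable, Tuple


def stack_context(layers: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """(past, future) input frames needed to compute one output frame.

    Each layer is (kernel_size, stride, dilation). All layers are assumed to be
    centred with odd kernels, so the context is symmetric.
    """
    context = 0
    cum_stride = 1  # product of the strides of all previous layers
    for kernel_size, stride, dilation in layers:
        context += (kernel_size - 1) // 2 * dilation * cum_stride
        cum_stride *= stride
    return context, context


# 7x7/stride-2 conv, 3x3/stride-2 max-pool, then two 3x3/stride-1 convs.
print(stack_context([(7, 2, 1), (3, 2, 1), (3, 1, 1), (3, 1, 1)]))  # (13, 13)
```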

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network
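A minimal sketch of what out_shape computes, assuming a (batch, C, H, W) layout in which None marks a dimension unknown until run time (for example the time axis of variable-length audio). The size function, the 512 output channels and the overall downsampling factor of 32 are hypothetical values chosen for illustration.

```python
from typing import Callable, Optional, Tuple

Shape = Tuple[Optional[int], ...]


def out_shape(in_shape: Shape, out_channels: int, out_units: int,
              compute_out_size: Callable[[int], int]) -> Shape:
    """Toy out_shape: (batch, Cin, Hin, Win) -> logits or feature-map shape.

    Dimensions given as None are propagated unchanged.
    """
    batch, _, height, width = in_shape
    if out_units > 0:
        return (batch, out_units)

    def resize(size: Optional[int]) -> Optional[int]:
        return None if size is None else compute_out_size(size)

    return (batch, out_channels, resize(height), resize(width))


def downsample_by_32(size: int) -> int:
    """Hypothetical overall downsampling by 32, rounding up."""
    return -(-size // 32)


print(out_shape((None, 1, 80, None), 512, 0, downsample_by_32))  # (None, 512, 3, None)
print(out_shape((8, 1, 80, 200), 512, 10, downsample_by_32))     # (8, 10)
```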

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSERes2Net18(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSERes2Net34(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSERes2Net50(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSERes2Net101(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.resnet.TSERes2Net152(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSEWideRes2Net50(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSEWideRes2Net101(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSELRes2Net50(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.TSELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

class hyperion.torch.narchs.resnet.LResNet34_345(*args: Any, **kwargs: Any)[source]

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()

property device

forward(x, use_amp=False)

forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors; if return_output is True, also the tensor with the output representations

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

Returns: Tuple (past, future) context required to predict one frame.

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

SpineNets/Spine2Nets

class hyperion.torch.narchs.spinenet_factory.SpineNetFactory[source]

static create(spinenet_type, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
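The res2net_scale argument presumably sets the number of channel splits s used by the Res2Net blocks. In the Res2Net paper, the input of the 3x3 stage is split into x_1, ..., x_s and the outputs are y_1 = x_1, y_2 = K_2(x_2) and y_i = K_i(x_i + y_{i-1}) for 2 < i <= s, where K_i is a 3x3 convolution. The sketch below applies that recursion to toy scalar splits; the per-split operators are hypothetical stand-ins for the convolutions, and how Hyperion maps res2net_scale and res2net_width_factor onto channel widths is not covered here.

```python
from typing import Callable, List, Sequence


def res2net_split_forward(splits: Sequence[float],
                          convs: Sequence[Callable[[float], float]]) -> List[float]:
    """Hierarchical residual recursion of a Res2Net block on toy scalar splits.

    convs[i] stands in for the 3x3 conv K_{i+1}; convs[0] is unused because the
    first split is passed through unchanged.
    """
    outputs: List[float] = []
    for i, x_i in enumerate(splits):
        if i == 0:
            y_i = x_i                          # y_1 = x_1
        elif i == 1:
            y_i = convs[i](x_i)                # y_2 = K_2(x_2)
        else:
            y_i = convs[i](x_i + outputs[-1])  # y_i = K_i(x_i + y_{i-1})
        outputs.append(y_i)
    # In the paper, the y_i are concatenated along the channel axis
    # and fused by a 1x1 convolution.
    return outputs


def double(v: float) -> float:
    return 2 * v


print(res2net_split_forward([1, 2, 3, 4], [double] * 4))  # [1, 4, 14, 36]
```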

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

class hyperion.torch.narchs.spinenet.SpineNet(*args: Any, **kwargs: Any)[source]

__init__(in_channels, block_specs=None, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, feature_output_level=None, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, concat=False, do_endpoint_conv=True, concat_ax=3, upsampling_type='nearest', hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, in_feats=None, se_r=16, time_se=False, has_se=False, is_res2net=False, res2net_scale=4, res2net_width_factor=1)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections and whether the block is an output
output_levels – the output levels of the blocks that are taken as the output of the SpineNet
endpoints_num_filters – the base number of output channels of the SpineNet
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to the common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which the concatenation is performed (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
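In the SpineNet paper, a block at level L has a feature resolution of 1/2^L of the input, so output_levels=[3, 4, 5, 6, 7] yields feature maps downsampled by factors of 8 up to 128. The sketch below computes those sizes and the common target size selected by feature_output_level. It assumes ceil rounding for sizes that do not divide evenly, and the helper names are hypothetical; Hyperion may round differently.

```python
from typing import Dict, Optional, Sequence, Tuple


def _ceil_div(n: int, d: int) -> int:
    return -(-n // d)


def level_sizes(in_hw: Tuple[int, int], output_levels: Sequence[int] = (3, 4, 5, 6, 7),
                feature_output_level: Optional[int] = None
                ) -> Tuple[Dict[int, Tuple[int, int]], Tuple[int, int]]:
    """Feature-map (H, W) per SpineNet level and the common endpoint size.

    Level L is assumed to have resolution 1/2**L of the input. By default the
    target is the biggest map, i.e. the one at the lowest output level.
    """
    def size_at(level: int) -> Tuple[int, int]:
        return (_ceil_div(in_hw[0], 2 ** level), _ceil_div(in_hw[1], 2 ** level))

    sizes = {level: size_at(level) for level in output_levels}
    target_level = min(output_levels) if feature_output_level is None else feature_output_level
    return sizes, size_at(target_level)


sizes, target = level_sizes((80, 400))
print(sizes[3], sizes[7], target)  # (10, 50) (1, 4) (10, 50)
```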

_make_permuted_blocks(block_specs)[source]: Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)[source]: Builds the cross-scale connections between the blocks.

_make_endpoints()[source]: Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_compute_max_context(in_context)[source]: Computes the maximum possible context in the structure. This method may need a deeper revision.

Parameters: in_context – context from the input residual block.

_compute_out_size(in_size)[source]

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_compute_channel_size()[source]

Returns: the number of output channels. If the 1x1 convolution is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

in_shape()[source]

Returns: Tuple describing input shape for the network

out_shape(in_shape=None)[source]: Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

_match_feat_shape(feat0, feat1)[source]: Matches the shapes of the feature maps coming from the input connections.

forward(x, use_amp=False)[source]

_forward(x)[source]

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

get_config()[source]: Gets network config :returns: dictionary with config params

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_context()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections and whether the block is an output
output_levels – the output levels of the blocks that are taken as the output of the SpineNet
endpoints_num_filters – the base number of output channels of the SpineNet
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to the common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which the concatenation is performed (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 convolution is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet49S(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections and whether the block is an output
output_levels – the output levels of the blocks that are taken as the output of the SpineNet
endpoints_num_filters – the base number of output channels of the SpineNet
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to the common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which the concatenation is performed (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.
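The docstring above covers one case of the output channel count. This is a minimal sketch of the full bookkeeping under two assumptions:

- Concatenation happens along the channel axis. If concat_ax selects another axis, the count would differ.
- Averaging requires all endpoints to have the same width.

The helper output_channels is hypothetical.

```python
def output_channels(endpoint_channels, endpoints_num_filters, concat, do_endpoint_conv):
    """Channel count of the merged SpineNet output (hypothetical helper).

    endpoint_channels: channels of each output block before any endpoint conv.
    """
    if do_endpoint_conv:
        # Every endpoint is projected to the same width by a 1x1 conv.
        per_endpoint = [endpoints_num_filters] * len(endpoint_channels)
    else:
        per_endpoint = list(endpoint_channels)
    if concat:
        # Concatenation along the channel axis adds the widths up.
        return sum(per_endpoint)
    # Averaging needs identical widths and keeps that width.
    if len(set(per_endpoint)) != 1:
        raise ValueError("averaged endpoints must have equal channel counts")
    return per_endpoint[0]


print(output_channels([128, 256, 512], 256, concat=True, do_endpoint_conv=False))  # 896
print(output_channels([128, 256, 512], 256, concat=True, do_endpoint_conv=True))   # 768
print(output_channels([128, 256, 512], 256, concat=False, do_endpoint_conv=True))  # 256
```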

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size
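Each stride-2 downsampling step roughly halves a spatial dimension. This is a minimal sketch under two assumptions:

- Every step is a kernel-3, padding-1, stride-2 convolution, so a size n maps to ceil(n / 2).
- There are 3 such steps in total. This number is hypothetical; the real one depends on the network's stem and output level.

The out_shape helper shows how such a per-dimension function could be applied to a (batch, C, H, W) shape in which variable dimensions are given as None.

```python
def compute_out_size(in_size, num_downsamples=3):
    # Stride-2 conv, kernel 3, padding 1: n -> ceil(n / 2) at every step.
    size = in_size
    for _ in range(num_downsamples):
        size = (size + 1) // 2
    return size


def out_shape(in_shape, out_channels, num_downsamples=3):
    """Map (batch, C, H, W) to (batch, out_channels, H', W'); None stays None."""
    batch, _, height, width = in_shape

    def down(n):
        return None if n is None else compute_out_size(n, num_downsamples)

    return (batch, out_channels, down(height), down(width))


print(compute_out_size(80))                 # 10
print(compute_out_size(81))                 # 11
print(out_shape((None, 1, 80, None), 256))  # (None, 256, 10, None)
```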

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet96(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet143(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet190(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LSpineNet49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LSpineNet49_subpixel(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LSpineNet49_bilinear(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LSpineNet49_5(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LSpine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Match shape between feats of the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SELSpine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output
output_levels – the levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the target size is that of the largest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures, such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

Forward function.

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.
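Shape mismatches typically arise from odd sizes: a length of 25 downsampled by a stride-2 convolution (kernel 3, padding 1) becomes 13, and upsampling that by 2 gives 26, not 25. One plausible way to reconcile such feature maps is to crop both to their common size, as sketched below. Whether hyperion crops, pads or interpolates is not stated in this section, so treat match_feat_shape here as a hypothetical stand-in.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one channel, [height][width]


def match_feat_shape(feat0: Grid, feat1: Grid) -> Tuple[Grid, Grid]:
    """Crop both maps to their common (minimum) height and width."""
    h = min(len(feat0), len(feat1))
    w = min(len(feat0[0]), len(feat1[0]))

    def crop(g: Grid) -> Grid:
        return [row[:w] for row in g[:h]]

    return crop(feat0), crop(feat1)


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # 3x3
b = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # 2x3
a2, b2 = match_feat_shape(a, b)
# After matching, the two inputs can be summed element-wise.
print([[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a2, b2)])
```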

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
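get_config() and load(cfg=...) suggest the usual config round trip: the config dictionary holds the constructor arguments, so a network can be rebuilt from it. The class below is a hypothetical stand-in that shows only the config half of that pattern; the real load also accepts file_path and state_dict, which this sketch does not model.

```python
from typing import Any, Dict, Optional


class ConfigurableNet:
    """Hypothetical stand-in illustrating the get_config()/load(cfg=...) pattern."""

    def __init__(self, in_channels: int, concat: bool = False, resample_alpha: float = 0.5):
        self.in_channels = in_channels
        self.concat = concat
        self.resample_alpha = resample_alpha

    def get_config(self) -> Dict[str, Any]:
        # Everything needed to rebuild the architecture (weights live elsewhere).
        return {
            "in_channels": self.in_channels,
            "concat": self.concat,
            "resample_alpha": self.resample_alpha,
        }

    @classmethod
    def load(cls, cfg: Optional[Dict[str, Any]] = None) -> "ConfigurableNet":
        if cfg is None:
            raise ValueError("this sketch only supports loading from a config dict")
        return cls(**cfg)


net = ConfigurableNet(in_channels=1, concat=True)
rebuilt = ConfigurableNet.load(cfg=net.get_config())
print(rebuilt.get_config())
```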

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.TSELSpine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)
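To make the concat option concrete, the sketch below combines one feature vector per output block either by concatenation or by element-wise averaging. It simplifies each output to a flat list (real outputs are 4-D tensors and concat_ax picks the axis), and combine_outputs is a hypothetical helper. Averaging only works when all outputs have the same number of channels, which the do_endpoint_conv projection would guarantee.

```python
from typing import List


def combine_outputs(outputs: List[List[float]], concat: bool) -> List[float]:
    """Combine per-level feature vectors by concatenation or by averaging."""
    if concat:
        return [v for out in outputs for v in out]
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("averaging needs outputs with the same number of channels")
    n = len(outputs)
    return [sum(vals) / n for vals in zip(*outputs)]


print(combine_outputs([[1.0, 2.0], [3.0, 4.0]], concat=True))   # [1.0, 2.0, 3.0, 4.0]
print(combine_outputs([[1.0, 2.0], [3.0, 4.0]], concat=False))  # [2.0, 3.0]
```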

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.Spine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)
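The two channel-scaling parameters reduce to simple arithmetic. The sketch assumes, following the SpineNet paper, that a resampling connection starts with a 1x1 convolution of width resample_alpha * C, and that filter_size_scale multiplies a block's base width; the rounding (truncation, minimum of 1) and both helper names are assumptions.

```python
def scaled_filters(base_filters: int, filter_size_scale: float) -> int:
    """Block width after applying filter_size_scale (truncated, at least 1)."""
    return max(1, int(base_filters * filter_size_scale))


def resample_bottleneck_channels(in_channels: int, resample_alpha: float) -> int:
    """Width of the 1x1 conv that opens a resampling connection (alpha * C)."""
    return max(1, int(in_channels * resample_alpha))


print(scaled_filters(256, 0.65))                # 166
print(resample_bottleneck_channels(256, 0.5))   # 128
```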

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SESpine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.TSESpine2Net49(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.Spine2Net49S(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SESpine2Net49S(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.TSESpine2Net49S(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.LR0_SP53(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters: in_shape – input shape
Returns: Tuple describing output shape for the network

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.R0_SP53(*args: Any, **kwargs: Any)[source]

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output
output_levels – the levels of the blocks whose outputs are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and higher, or for SpineNet49S)

_compute_channel_size()

Returns: the number of output channels. If the 1x1 conv is not applied in the endpoint blocks, this equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters: in_context – context from the input residual block

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the feature maps coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

class hyperion.torch.narchs.spinenet.SpineNet49_concat_time(*args: Any, **kwargs: Any)[source]

_compute_channel_size()

Returns: If the 1x1 conv is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context): Computes maximum possible context in the structure. The method may need a deeper revision. :param in_context: context from the input residual block.

_compute_out_size(in_size)

Computes the output size given the input size. The output size differs from the input size because of the downsampling steps.

Parameters: in_size – input size of the H or W dimensions
Returns: output_size

_forward(x)

forward function

Parameters: x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
Returns: Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints(): Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs): Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs): Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1): Matches the shapes of the features coming from the input connections.

copy()

property device

forward(x, use_amp=False)

freeze()

get_config(): Gets network config :returns: dictionary with config params

get_loss()

get_reg_loss()

in_context()

in_dim()

in_shape()

Returns: Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

out_shape(in_shape=None): Computes the output shape given the input shape # # Args: # in_shape: input shape # Returns: # Tuple describing output shape for the network #

save(file_path)

unfreeze()

__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters

in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, their input connections and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of output channels of the SpineNet
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to the common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level whose feature map size the outputs are resampled to (by default, the size of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
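To make the concat and concat_ax options concrete, the sketch below combines endpoint feature maps stored as (channels, time) nested lists, either by concatenating them along a chosen axis or by averaging them. The function `combine_endpoints` is hypothetical and simplifies the library's multi-dimensional tensors to 2d lists; it assumes the maps were already projected and resized to matching shapes, which averaging requires. Concatenating along the channel axis yields a channel count equal to the sum of the endpoint channels, while concatenating along time keeps the channel count unchanged.

```python
def combine_endpoints(maps, concat=True, concat_ax=0):
    """Hypothetical helper: combine endpoint maps shaped (channels, time).

    concat=True  -> concatenate along concat_ax (0: channels, 1: time)
    concat=False -> element-wise average (all maps must share one shape)
    """
    if concat:
        if concat_ax == 0:
            # stack the channel rows of every map
            return [row for m in maps for row in m]
        # join each channel row of every map along time
        return [[v for m in maps for v in m[c]] for c in range(len(maps[0]))]
    n = len(maps)
    # zip(*maps) pairs up matching channel rows; zip(*rows) pairs up elements
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*maps)]


a = [[1, 2], [3, 4]]  # 2 channels x 2 frames
b = [[5, 6], [7, 8]]
print(combine_endpoints([a, b], concat=True, concat_ax=0))  # 4 channels x 2 frames
print(combine_endpoints([a, b], concat=True, concat_ax=1))  # 2 channels x 4 frames
print(combine_endpoints([a, b], concat=False))  # 2 channels x 2 frames
```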

ResNet Encoder/Decoders

These are encoders/decoders based on flexible 1d and 2d ResNets.

ResNet Encoder 1d

class hyperion.torch.narchs.resnet1d_encoder.ResNet1dEncoder(*args: Any, **kwargs: Any)[source]

__init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, drop_connect_rate=0, se_r=16, res2net_width_factor=1, res2net_scale=4, multilayer=False, multilayer_concat=False, endpoint_channels=None, endpoint_layers=None, endpoint_scale_layer=-1, use_norm=True, norm_layer=None, norm_before=True, upsampling_mode='nearest')[source]
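Several arguments, such as resb_channels, resb_kernel_sizes, resb_strides and resb_dilations, default to a single value while resb_repeats is a list with one entry per stage. The sketch below assumes that a scalar is broadcast to every stage and estimates the number of output frames, assuming each strided convolution uses "same" padding so that a stride s maps T frames to ceil(T / s). Both helpers are hypothetical and not part of hyperion; in particular, whether every stage applies its stride is an assumption here.

```python
import math


def per_stage(value, num_stages):
    """Hypothetical helper: broadcast a scalar to one value per stage."""
    if isinstance(value, (list, tuple)):
        assert len(value) == num_stages, "need one value per stage"
        return list(value)
    return [value] * num_stages


def encoder_out_frames(num_frames, in_stride, resb_strides, num_stages):
    """Hypothetical helper: output length assuming ceil(T / s) per strided layer."""
    t = math.ceil(num_frames / in_stride)  # input convolution
    for s in per_stage(resb_strides, num_stages):
        t = math.ceil(t / s)  # first block of each stage
    return t


resb_repeats = [1, 1, 1]  # default: three stages
print(per_stage(128, len(resb_repeats)))  # resb_channels broadcast to each stage
print(encoder_out_frames(200, 1, 2, len(resb_repeats)))
```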

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x)[source]

copy()

property device

forward_hid_feats(x, layers=None, return_output=False)[source]

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip={'in_feats'})[source]

static add_argparse_args(parser, prefix=None, skip={'in_feats'})

ResNet Decoder 1d

class hyperion.torch.narchs.resnet1d_decoder.ResNet1dDecoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=128, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x, target_shape=None)[source]

get_config()[source]

copy()

property device

static filter_args(**kwargs)[source]

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

ResNet Encoder 2d

class hyperion.torch.narchs.resnet2d_encoder.ResNet2dEncoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=1, in_conv_channels=64, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[64, 128, 256, 512], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, time_se=False, in_feats=None, res2net_width_factor=1, res2net_scale=4, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

copy()

property device

forward(x)[source]

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

get_config()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip={})[source]

static add_argparse_args(parser, prefix=None, skip={})

ResNet Decoder 2d

class hyperion.torch.narchs.resnet2d_decoder.ResNet2dDecoder(*args: Any, **kwargs: Any)[source]

__init__(in_channels=512, in_conv_channels=512, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[512, 256, 128, 64], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]

in_context()[source]

in_shape()[source]

out_shape(in_shape=None)[source]

forward(x, target_shape=None)[source]

get_config()[source]

copy()

property device

static filter_args(**kwargs)[source]

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

EfficientNet

Transformer

class hyperion.torch.narchs.transformer_encoder_v1.TransformerEncoderV1(*args: Any, **kwargs: Any)[source]

Transformer encoder module.

in_feats: input features dimension

d_model: encoder blocks feature dimension

num_heads: number of heads

num_blocks: number of self attn blocks

att_type: string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

att_context: maximum context range for local attention

ff_type: string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff: dimension of middle layer in feed_forward block

ff_kernel_size: kernel size for convolutional versions of ff block

ff_dropout_rate: dropout rate for ff block

pos_dropout_rate: dropout rate for positional encoder

att_dropout_rate: dropout rate for attention block

in_layer_type: input layer block type in [‘linear’,’conv2d-sub’, ‘embed’, None]

rel_pos_enc: if True, use relative postional encodings, absolute encodings otherwise.

causal_pos_enc: if True, use causal positional encodings (when rel_pos_enc=True), it assumes that query q_i only attents to key k_j when j<=i

hid_act: hidden activations in ff and input blocks

norm_before: if True, use layer norm before layers, otherwise after

concat_after

if True, it concatenates the attention input and output and applies a linear transform, i.e., y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

padding_idx: padding idx for embed layer

in_time_dim: time dimension in the input Tensor

out_time_dim: dimension that we want to be time in the output tensor

__init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, ff_type='linear', d_ff=2048, ff_kernel_size=1, ff_dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', rel_pos_enc=False, causal_pos_enc=False, hid_act='relu6', norm_before=True, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1)[source]

forward(x, mask=None, target_shape=None, use_amp=False)[source]

_forward(x, mask=None, target_shape=None)[source]

Forward pass function

Parameters

x – input tensor with size=(batch, time, num_feats)
mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask

get_config()[source]: Gets network config :returns: dictionary with config params

in_context()[source]

in_shape()[source]

Input shape for network

Returns: Tuple describing input shape

out_shape(in_shape=None)[source]

Infers the network output shape given the input shape

Parameters: in_shape – input shape tuple
Returns: Tuple with the output shape

static filter_args(**kwargs)[source]

Filters the arguments corresponding to TransformerEncoderV1 from the args dictionary

Parameters: kwargs – args dictionary
Returns: args dictionary

static add_class_args(parser, prefix=None, in_feats=False)[source]

Adds Transformer config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names

static add_argparse_args(parser, prefix=None, in_feats=False)

Adds Transformer config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names
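The add_class_args / filter_args pair follows a common pattern: register options under an optional prefix, then keep only the options a constructor accepts. The sketch below reproduces that pattern with the standard-library argparse. The option names are a subset of the TransformerEncoderV1 arguments, but both functions are hypothetical stand-ins; hyperion's own methods may use a different parser and argument layout.

```python
import argparse


def add_class_args(parser, prefix=None):
    """Hypothetical sketch: register a few encoder options, optionally namespaced."""
    p = "--" if prefix is None else f"--{prefix}."
    parser.add_argument(p + "d-model", type=int, default=256)
    parser.add_argument(p + "num-heads", type=int, default=4)
    parser.add_argument(p + "num-blocks", type=int, default=6)


def filter_args(**kwargs):
    """Hypothetical sketch: keep only the keys the encoder constructor accepts."""
    valid = ("d_model", "num_heads", "num_blocks")
    return {k: kwargs[k] for k in valid if k in kwargs}


parser = argparse.ArgumentParser()
add_class_args(parser)
parser.add_argument("--lr", type=float, default=1e-3)  # belongs to another component
args = vars(parser.parse_args(["--num-heads", "8", "--lr", "0.01"]))
print(filter_args(**args))  # "lr" is dropped

# With a prefix, the options become --encoder.d-model, --encoder.num-heads, ...
prefixed = argparse.ArgumentParser()
add_class_args(prefixed, prefix="encoder")
print(sorted(vars(prefixed.parse_args([]))))
```

argparse derives each destination from the long option by replacing "-" with "_", so the prefixed options are stored under keys such as "encoder.d_model".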

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

Conformer

class hyperion.torch.narchs.conformer_encoder_v1.ConformerEncoderV1(*args: Any, **kwargs: Any)[source]

Conformer encoder introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

Choose local-attention (attending only to close frames instead of all the frames in the sequence)

Choose number of conv blocks in each conformer layer

Squeeze-Excitation after depthwise-conv

Allows downsampling in time dimension

Allows choosing activation and layer normalization type

We call this Conformer+

This becomes a standard Transformer by setting conv_repeats=0, pos_enc_type='abs', ff_macaron=False.

in_feats: input features dimension

d_model: encoder blocks feature dimension

num_heads: number of heads

num_blocks: number of self attn blocks

att_type: string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

att_context: maximum context range for local attention

conv_repeats: number of conv blocks in each conformer block

conv_kernel_sizes: kernel size for conv blocks

conv_strides: stride for depth-wise conv in the first conv block of each conformer block

ff_type: string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff: dimension of middle layer in feed_forward block

ff_kernel_size: kernel size for convolutional versions of ff block

dropout_rate: dropout rate for ff and conv blocks

pos_dropout_rate: dropout rate for positional encoder

att_dropout_rate: dropout rate for attention block

in_layer_type: input layer block type in [‘linear’,’conv2d-sub’, ‘embed’, None]

pos_enc_type: type of positional encoder [‘no’, ‘abs’, ‘rel’]

causal_pos_enc: if True, use causal positional encodings (when rel_pos_enc=True), it assumes that query q_i only attents to key k_j when j<=i

no_pos_enc: if True, it doesn’t use positional encoder.

hid_act: hidden activations in ff and input blocks

conv_norm_layer: norm layer constructor or str for conv block, if None it uses BatchNorm1d

se_r: Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

ff_macaron: if True, it uses macaron-net style ff layers, otherwise transformer style.

red_lnorms: it True, use redundant LNorm layers at the output of the conformer blocks as in the paper

concat_after

if True, it concatenates the attention input and output and applies a linear transform, i.e., y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

padding_idx: padding idx for embed layer

in_time_dim: time dimension in the input Tensor

out_time_dim: dimension that we want to be time in the output tensor

rel_pos_enc: if True, use relative postional encodings, absolute encodings otherwise. (deprecated)

red_lnorm: (deprecated)

__init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, conv_repeats=1, conv_kernel_sizes=31, conv_strides=1, ff_type='linear', d_ff=2048, ff_kernel_size=1, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', pos_enc_type='rel', causal_pos_enc=False, hid_act='swish', conv_norm_layer=None, se_r=None, ff_macaron=True, red_lnorms=False, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1, rel_pos_enc=True, red_lnorm=False)[source]

forward(x, mask=None, target_shape=None)[source]

Forward pass function

Parameters

x – input tensor with size=(batch, time, num_feats)
mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask
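The mask argument marks the valid time steps of each padded sequence in the batch. A minimal sketch of building such a (batch, time) mask from per-utterance lengths follows, using nested lists of booleans. The helper `lengths_to_mask` is hypothetical; in practice the mask is a tensor, and its exact dtype and shape conventions should be checked against the code.

```python
def lengths_to_mask(lengths, max_len=None):
    """Hypothetical helper: True marks a valid (non-padded) frame."""
    if max_len is None:
        max_len = max(lengths)
    return [[t < n for t in range(max_len)] for n in lengths]


# Batch of two utterances with 3 and 1 valid frames
print(lengths_to_mask([3, 1]))
```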

get_config()[source]: Gets network config :returns: dictionary with config params

in_context()[source]

in_shape()[source]

Input shape for network

Returns: Tuple describing input shape

out_shape(in_shape=None)[source]

Infers the network output shape given the input shape

Parameters: in_shape – input shape tuple
Returns: Tuple with the output shape

static filter_args(**kwargs)[source]

Filters the arguments corresponding to ConformerEncoderV1 from the args dictionary

Parameters: kwargs – args dictionary
Returns: args dictionary

static add_class_args(parser, prefix=None, in_feats=False)[source]

Adds Conformer config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names

static add_argparse_args(parser, prefix=None, in_feats=False)

Adds Conformer config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names

copy()

property device

freeze()

get_loss()

get_reg_loss()

in_dim()

classmethod load(file_path=None, cfg=None, state_dict=None)

out_dim()

save(file_path)

unfreeze()

Models

These include complex models created by connecting several network architectures.

x-Vectors

There are several variants of x-vector embeddings. They all derive from the same base class.

class hyperion.torch.models.xvectors.xvector.XVector(*args: Any, **kwargs: Any)[source]

x-Vector base class

__init__(encoder_net, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0, embed_layer=0, in_feats=None, proj_feats=None)[source]
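For the margin-based loss types ('arc-softmax' and 'cos-softmax'), s scales the cosine logits and margin penalizes the target class. The sketch below computes the logits for one sample from its cosine similarities to each class, using the standard additive angular margin (ArcFace) and additive cosine margin (AM-softmax) formulas. It is a framework-free illustration; it omits details the library may handle, such as num_subcenters and safeguards for large angles.

```python
import math


def margin_logits(cosines, target, loss_type="arc-softmax", s=64.0, margin=0.3):
    """Scaled logits from the cosines between an embedding and each class weight.

    arc-softmax: target logit is s * cos(theta + m)   (additive angular margin)
    cos-softmax: target logit is s * (cos(theta) - m)  (additive cosine margin)
    """
    if loss_type not in ("arc-softmax", "cos-softmax"):
        raise ValueError("only the margin-based loss types are sketched here")
    logits = [s * c for c in cosines]
    c = cosines[target]
    if loss_type == "arc-softmax":
        theta = math.acos(max(-1.0, min(1.0, c)))  # angle to the target class
        logits[target] = s * math.cos(theta + margin)
    else:
        logits[target] = s * (c - margin)
    return logits


print(margin_logits([0.8, 0.1], target=0, loss_type="cos-softmax", s=10, margin=0.3))
print(margin_logits([0.8, 0.1], target=0, loss_type="arc-softmax", s=10, margin=0.3))
```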

property pool_feats

property num_classes

property embed_dim

property num_embed_layers

property s

property margin

property margin_warmup_epochs

property num_subcenters

property loss_type

_make_pool_net(pool_net, enc_feats=None)[source]

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object
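With the default pool_net='mean+stddev', the pooling block summarizes a variable-length sequence of encoder features by its per-dimension mean and standard deviation over time, so the pooled vector has twice the encoder feature dimension. A framework-free sketch, assuming input shaped (feats, time) as in forward_output and the population standard deviation (the library may use a different estimator or add a small epsilon):

```python
import math


def mean_stddev_pool(x):
    """Statistics pooling over time for x shaped (feats, time)."""
    means, stds = [], []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means + stds  # concatenation: length 2 * feats


print(mean_stddev_pool([[1.0, 3.0], [2.0, 2.0]]))
```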

update_loss_margin(epoch)[source]

Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters: epoch – epoch which is about to start
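A minimal sketch of a margin warm-up schedule follows. It assumes the margin grows linearly from 0 to margin over margin_warmup_epochs epochs and stays constant afterwards, with no warm-up when margin_warmup_epochs=0 (the default); the function name is hypothetical, and the exact schedule used by the library should be checked in its loss code.

```python
def warmup_margin(epoch, margin=0.3, margin_warmup_epochs=0):
    """Hypothetical linear warm-up of the softmax margin."""
    if margin_warmup_epochs <= 0:
        return margin  # no warm-up: full margin from the start
    return margin * min(1.0, epoch / margin_warmup_epochs)


print([round(warmup_margin(e, 0.3, 10), 2) for e in (0, 5, 10, 20)])
```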

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)[source]

forward_output(x, y=None)[source]

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)[source]: forwards hidden representations in the x-vector network

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)[source]

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)[source]

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]
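The sliding-window methods extract one embedding per window. The timestamp bookkeeping can be sketched as below, under stated assumptions: window length and shift are given in feature frames, frames are frame_shift_ms apart, only windows that fit entirely inside the input are kept (one reading of snip_edges=True), and the 25 ms analysis length of each frame (feat_frame_length) is ignored, although it would extend each end time slightly. The helper is hypothetical; the library's versions, together with compute_slidwin_left_padding, may handle edges and units differently.

```python
def slidwin_timestamps(num_frames, win_frames, shift_frames, frame_shift_ms=10):
    """Hypothetical helper: (start, end) times in seconds of sliding windows
    that fit completely inside an input of `num_frames` frames."""
    num_windows = max(0, (num_frames - win_frames) // shift_frames + 1)
    stamps = []
    for w in range(num_windows):
        start = w * shift_frames
        end = start + win_frames
        stamps.append((start * frame_shift_ms / 1000, end * frame_shift_ms / 1000))
    return stamps


# 3 s of 10 ms frames, 1.5 s windows every 0.5 s
print(slidwin_timestamps(300, 150, 50))
```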

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)[source]

freeze_preembed_layers()[source]

train_mode(mode='ft-embed-affine')[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip={})[source]

static filter_finetune_args(**kwargs)[source]

static add_finetune_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None, skip={})

copy()

property device

freeze()

get_loss()

get_reg_loss()

save(file_path)

unfreeze()

static add_argparse_finetune_args(parser, prefix=None)

TDNN x-Vector

x-Vectors with TDNN, E-TDNN, Residual E-TDNN Encoders.

class hyperion.torch.models.xvectors.tdnn_xvector.TDNNXVector(*args: Any, **kwargs: Any)[source]

__init__(tdnn_type, num_enc_blocks, in_feats, num_classes, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]
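The context (receptive field) of a stack of stride-1 dilated 1d convolutions grows by (kernel_size - 1) * dilation frames per layer. The sketch below computes it for a hypothetical schedule in which block i uses dilation * dilation_factor**i; how hyperion actually combines dilation and dilation_factor across num_enc_blocks is an assumption here, so treat the numbers as illustrative.

```python
def tdnn_context(num_enc_blocks, kernel_size=3, dilation=1, dilation_factor=1):
    """Receptive field in frames of stacked stride-1 dilated convs (hypothetical schedule)."""
    frames = 1
    for i in range(num_enc_blocks):
        d = dilation * dilation_factor ** i  # assumed per-block dilation
        frames += (kernel_size - 1) * d
    return frames


print(tdnn_context(4, kernel_size=3, dilation=1, dilation_factor=2))
```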

property num_enc_blocks

property enc_hid_units

property enc_expand_units

property kernel_size

property dilation

property dilation_factor

property in_norm

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)

static add_finetune_args(parser, prefix=None)

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

copy()

property device

property embed_dim

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)

static filter_finetune_args(**kwargs)

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False): forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()

freeze_preembed_layers()

get_loss()

get_reg_loss()

property loss_type

property margin

property margin_warmup_epochs

property num_classes

property num_embed_layers

property num_subcenters

property pool_feats

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)

property s

save(file_path)

train_mode(mode='ft-embed-affine')

unfreeze()

update_loss_margin(epoch)

Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters: epoch – epoch which is about to start

ResNet x-Vector

x-Vectors with canonical ResNet and Res2Net encoders.

class hyperion.torch.models.xvectors.resnet_xvector.ResNetXVector(*args: Any, **kwargs: Any)[source]

__init__(resnet_type, in_feats, num_classes, in_channels, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]

property in_channels

property conv_channels

property base_channels

property in_kernel_size

property in_stride

property zero_init_residual

property groups

property replace_stride_with_dilation

property do_maxpool

property in_norm

property se_r

property res2net_scale

property res2net_width_factor

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)

static add_finetune_args(parser, prefix=None)

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

copy()

property device

property embed_dim

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)

static filter_finetune_args(**kwargs)

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False): forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()

freeze_preembed_layers()

get_loss()

get_reg_loss()

property loss_type

property margin

property margin_warmup_epochs

property num_classes

property num_embed_layers

property num_subcenters

property pool_feats

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)

property s

save(file_path)

train_mode(mode='ft-embed-affine')

unfreeze()

update_loss_margin(epoch)

Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters: epoch – epoch which is about to start

SpineNet x-Vector

x-Vectors with SpineNet, Spine2Net Encoders.

class hyperion.torch.models.xvectors.spinenet_xvector.SpineNetXVector(*args: Any, **kwargs: Any)[source]

__init__(spinenet_type, in_feats, num_classes, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]

property in_channels

property output_levels

property endpoints_num_filters

property resample_alpha

property block_repeats

property filter_size_scale

property conv_channels

property base_channels

property in_kernel_size

property in_stride

property zero_init_residual

property groups

property do_maxpool

property in_norm

property se_r

property res2net_scale

property res2net_width_factor

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)

static add_finetune_args(parser, prefix=None)

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

copy()

property device

property embed_dim

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)

static filter_finetune_args(**kwargs)

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False): forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()

freeze_preembed_layers()

get_loss()

get_reg_loss()

property loss_type

property margin

property margin_warmup_epochs

property num_classes

property num_embed_layers

property num_subcenters

property pool_feats

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)

property s

save(file_path)

train_mode(mode='ft-embed-affine')

unfreeze()

update_loss_margin(epoch)

Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters: epoch – epoch which is about to start

ResNet 1d x-Vector

x-Vectors with ResNet and Res2Net 1d encoders. They can be configured as ECAPA-TDNN.

class hyperion.torch.models.xvectors.resnet1d_xvector.ResNet1dXVector(*args: Any, **kwargs: Any)[source]

__init__(resnet_enc, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None)[source]

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

filter_args()[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)

static add_finetune_args(parser, prefix=None)

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

copy()

property device

property embed_dim

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)

static filter_finetune_args(**kwargs)

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False): forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()

freeze_preembed_layers()

get_loss()

get_reg_loss()

property loss_type

property margin

property margin_warmup_epochs

property num_classes

property num_embed_layers

property num_subcenters

property pool_feats

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)

property s

save(file_path)

train_mode(mode='ft-embed-affine')

unfreeze()

update_loss_margin(epoch)

Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters: epoch – epoch which is about to start

Transformer x-Vector

x-Vectors based on Transformer Encoder

class hyperion.torch.models.xvectors.transformer_xvector_v1.TransformerXVectorV1(*args: Any, **kwargs: Any)[source]

x-Vector with Transformer encoder.

in_feats: input features dimension

num_classes: number of training classes

enc_d_model: encoder blocks feature dimension

num_enc_heads: number of heads

num_enc_blocks: number of self attn blocks

enc_att_type: string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

enc_att_context: maximum context range for local attention

enc_ff_type: string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

enc_d_ff: dimension of middle layer in feed_forward block

enc_ff_kernel_size: kernel size for convolutional versions of ff block

in_layer_type: input layer block type in [‘linear’,’conv2d-sub’, ‘embed’, None]

enc_concat_after

if True, concatenates the attention input and output and applies a linear transform, i.e., y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

pool_net: pooling block configuration string or dictionary of params

embed_dim: x-vector dimension

num_embed_layers: number of hidden layers in classification head

hid_act: hidden activation configuration string or dictionary

loss_type: sofmax losss type string in [‘softmax’, ‘arc-softmax’, ‘cos-softmax’]

s: s parameter in arc/cos-softmax losses

margin: margin in arc/cos-sofmtax losses

margin_warmup_epochs: number of epochs until we reach the maximum value for margin

dropout_rate: dropout rate for ff block and classification head

pos_dropout_rate: dropout rate for positional encoder

att_dropout_rate: dropout rate for attention block

use_norm: if True use batch/layer norm

norm_before: if True, use layer norm before layers, otherwise after

in_norm: add batchnorm at the input

embed_layer: which layer to use to extract x-vectors

proj_feats: add linear projection layer after the encoder to project feature dimension to proj_feats

__init__(in_feats, num_classes, enc_d_model=512, num_enc_heads=4, num_enc_blocks=6, enc_att_type='scaled-dot-prod-v1', enc_att_context=25, enc_ff_type='linear', enc_d_ff=2048, enc_ff_kernel_size=1, in_layer_type='conv2d-sub', enc_concat_after=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]

property enc_d_model

property num_enc_heads

property num_enc_blocks

property enc_att_type

property enc_att_context

property enc_d_ff

property enc_ff_kernel_size

property pos_dropout_rate

property att_dropout_rate

property in_layer_type

property enc_concat_after

property enc_ff_type

get_config()[source]: Gets network config :returns: dictionary with config params

classmethod load(file_path=None, cfg=None, state_dict=None)[source]: Loads model from file

static filter_args(**kwargs)[source]

Filters arguments corresponding to TransformerXVector from args dictionary

Parameters

prefix – prefix string
kwargs – args dictionary

Returns

args dictionary

static add_class_args(parser, prefix=None)[source]

Adds TransformerXVector config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters

pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_args(parser, prefix=None)

Adds TransformerXVector config parameters to argparser

Parameters

parser – argparse object
prefix – prefix string to add to the argument names

static add_argparse_finetune_args(parser, prefix=None)

static add_finetune_args(parser, prefix=None)

compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)

copy()

property device

property embed_dim

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)

extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)

static filter_finetune_args(**kwargs)

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False): forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters

x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()

freeze_preembed_layers()

get_loss()

get_reg_loss()

property loss_type

property margin

property margin_warmup_epochs

property num_classes

property num_embed_layers

property num_subcenters

property pool_feats

rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)

property s

save(file_path)

train_mode(mode='ft-embed-affine')

unfreeze()

update_loss_margin(epoch)

Updates the value of the margin in AAM/AM-softmax losses given the epoch number.

Parameters: epoch – epoch which is about to start

Auto-Encoder

class hyperion.torch.models.ae.ae.AE(*args: Any, **kwargs: Any)[source]

Basic Autoencoder class

encoder_net: NArch encoder network object

decoder_net: NArch decoder network object

z_dim: latent variable dimension (inferred from encoder_net output shape)

__init__(encoder_net, decoder_net)[source]

forward(x, x_target=None, use_amp=False)[source]

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

copy()

property device

freeze()

get_loss()

get_reg_loss()

save(file_path)

unfreeze()

Variational Auto-Encoders

class hyperion.torch.models.vae.vae.VAE(*args: Any, **kwargs: Any)[source]

Variational Autoencoder class: From: https://arxiv.org/abs/1312.6114

encoder_net: NArch encoder network object

decoder_net: NArch decoder network object

z_dim: latent variable dimension

kldiv_weight: weight KL divergene when computing ELBO

qz_pdf: type of prob distribution of the approx. latent posterior

pz_pdf: type of prob distribution of the latent prior

px_pdf: type of prob distribution for the data likelihood

flatten_spatial: if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.

spatial_shape: shape of the data, only needed if flatten_spatial=True

scale_invariant: for future use

data_scale: for future use

__init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, qz_pdf='normal-glob-diag-cov', pz_pdf='std-normal', px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
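The role of kldiv_weight in the ELBO can be illustrated with a minimal sketch. It assumes a diagonal-Gaussian approximate posterior q(z|x), the standard-normal prior (pz_pdf='std-normal') and a Gaussian likelihood with a single global variance. The helper `weighted_elbo` and its argument names are hypothetical, and hyperion's distribution classes may differ.

```python
import torch
from torch.distributions import Normal, kl_divergence


def weighted_elbo(x, z_mean, z_logvar, decode, x_logvar, kldiv_weight=1.0):
    """Hypothetical batch-mean ELBO with a weighted KL term.

    x: (batch, x_dim); z_mean, z_logvar: (batch, z_dim)
    decode: callable mapping z to the mean of p(x|z)
    x_logvar: scalar tensor, global log-variance of p(x|z)
    """
    qz = Normal(z_mean, torch.exp(0.5 * z_logvar))
    pz = Normal(torch.zeros_like(z_mean), torch.ones_like(z_mean))
    # Reparameterization trick: differentiable sample from q(z|x).
    z = qz.rsample()
    px = Normal(decode(z), torch.exp(0.5 * x_logvar))
    log_px = px.log_prob(x).sum(dim=-1)
    kldiv = kl_divergence(qz, pz).sum(dim=-1)
    return (log_px - kldiv_weight * kldiv).mean()


decoder = torch.nn.Linear(2, 3)
x = torch.randn(8, 3)
elbo = weighted_elbo(x, torch.zeros(8, 2), torch.zeros(8, 2), decoder, torch.tensor(0.0))
loss = -elbo  # maximize the ELBO by minimizing its negative
```

With kldiv_weight=1 this is the standard ELBO; other values trade reconstruction quality against how closely q(z|x) follows the prior.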

property pz

forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, return_qz=False, serialize_pdfs=True, use_amp=False)[source]

compute_qz(x)[source]

compute_px_given_z(z, x_shape=None)[source]

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

copy()

property device

freeze()

get_loss()

get_reg_loss()

save(file_path)

unfreeze()

class hyperion.torch.models.vae.vq_vae.VQVAE(*args: Any, **kwargs: Any)[source]

Vector Quantized Variational Autoencoder class: From: https://arxiv.org/abs/1711.00937

encoder_net: NArch encoder network object

decoder_net: NArch decoder network object

z_dim: latent variable dimension

kldiv_weight: weight KL divergene when computing ELBO

diversity_weight: weigth for log-perplexity of the codebook, it inteds to maximize the number of codewords used.

vq_type: type of vector quantizer

vq_gropus: number of vector quantization groups.

vq_clusters: number of codewords in each vq group

vq_commitment_cost: weigth of the commitmenet loss

vq_ema_gamma: exponential moving average decay coeff.

vq_ema_eps: Laplace smoothing parameter

px_pdf: type of prob distribution for the data likelihood

flatten_spatial: if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.

spatial_shape: shape of the data, only needed if flatten_spatial=True

scale_invariant: for future use

data_scale: for future use

__init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, diversity_weight=0.1, vq_type='multi-ema-k-means-vq', vq_groups=1, vq_clusters=64, vq_commitment_cost=0.25, vq_ema_gamma=0.99, vq_ema_eps=1e-05, px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
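The roles of vq_clusters and vq_commitment_cost can be illustrated with a plain single-group vector quantizer as in the VQ-VAE paper cited above. This sketch assumes a codebook trained by gradients. The default vq_type='multi-ema-k-means-vq' instead updates codewords with exponential moving averages (vq_ema_gamma, vq_ema_eps), which is not shown. The function name `vector_quantize` is hypothetical.

```python
import torch
import torch.nn.functional as F


def vector_quantize(z, codebook, commitment_cost=0.25):
    """Hypothetical single-group VQ layer with a gradient-trained codebook.

    z: (batch, z_dim) encoder outputs; codebook: (num_clusters, z_dim)
    Returns quantized z (straight-through), VQ loss, indices, perplexity.
    """
    # Nearest codeword by Euclidean distance.
    idx = torch.cdist(z, codebook).argmin(dim=1)
    z_q = codebook[idx]
    # Codebook loss moves codewords toward encoder outputs; the commitment
    # loss keeps encoder outputs close to their assigned codewords.
    codebook_loss = F.mse_loss(z_q, z.detach())
    commitment_loss = F.mse_loss(z, z_q.detach())
    loss = codebook_loss + commitment_cost * commitment_loss
    # Straight-through estimator: forward uses z_q, backward copies grads to z.
    z_q = z + (z_q - z).detach()
    # Perplexity of codeword usage, returned here only as a diagnostic.
    probs = F.one_hot(idx, codebook.size(0)).float().mean(dim=0)
    perplexity = torch.exp(-(probs * torch.log(probs + 1e-10)).sum())
    return z_q, loss, idx, perplexity
```

A perplexity close to num_clusters means codewords are used evenly. The diversity term weighted by diversity_weight is based on this quantity, but this hard-assignment version has no gradient.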

forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, serialize_pdfs=True, use_amp=False)[source]

compute_z(x)[source]

compute_px_given_z(z, x_shape=None)[source]

get_config()[source]

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

copy()

property device

freeze()

get_loss()

get_reg_loss()

save(file_path)

unfreeze()

Losses

Custom loss classes

class hyperion.torch.losses.bce_with_llr.BCEWithLLR(p_tar=0.5)[source]

__init__(p_tar=0.5)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
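The forward docstring above is inherited from torch.nn.Module and does not describe what BCEWithLLR computes. Judging only by its name and the p_tar argument, it presumably applies binary cross-entropy to log-likelihood-ratio scores under a target prior p_tar. A minimal sketch under that assumption adds the prior log-odds to the LLR and calls torch.nn.functional.binary_cross_entropy_with_logits. It is not hyperion's verified implementation, and the function name `bce_with_llr` is hypothetical.

```python
import math

import torch
import torch.nn.functional as F


def bce_with_llr(llr, y, p_tar=0.5):
    """Hypothetical BCE on LLR scores; y is 1 for target trials, 0 otherwise."""
    # Posterior log-odds = LLR + prior log-odds, logit(p_tar).
    logit_prior = math.log(p_tar / (1.0 - p_tar))
    return F.binary_cross_entropy_with_logits(llr + logit_prior, y.float())
```

A prior-weighted variant, as used for Cllr-style calibration losses, would additionally weight target and non-target trials by p_tar and 1 - p_tar.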

T_destination: alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks(): Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get(“version”, None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True): Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook): These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook): These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters

destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) → torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters: fn (Module -> None) – function to be applied to each submodule
Returns: self
Return type: Module

Example:

>>> import torch
>>> from torch import nn
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

bfloat16() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns: self
Return type: Module

buffers(recurse: bool = True) → Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields: torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

children() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields: Module – a child module

cpu() → torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns: self
Return type: Module

cuda(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

double() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns: self
Return type: Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() → torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns: self
Return type: Module

extra_repr() → str

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

float() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns: self
Return type: Module

half() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns: self
Return type: Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields

modules() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())

named_children() → Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)

named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())

parameters(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])

register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) → None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))

register_forward_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after forward() is called.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
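A common use of forward hooks is capturing intermediate activations, for example to inspect hidden representations without changing forward(). The following is a minimal sketch on a toy torch.nn.Sequential; the model is illustrative, not a hyperion network.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = {}


def save_output(module, inputs, output):
    # Store a detached copy; returning None leaves the output unchanged.
    captured["hidden"] = output.detach()


handle = model[1].register_forward_hook(save_output)
y = model(torch.randn(3, 4))
handle.remove()  # stop capturing once done
print(captured["hidden"].shape)  # torch.Size([3, 8])
```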

register_forward_pre_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple).

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) → None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) → torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns: self
Return type: Module
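For example, the following freezes a pre-trained encoder while training only a new output layer. This is the kind of fine-tuning that the freeze()/unfreeze() and train_mode() methods of the x-vector models appear to target, although their exact behavior is not documented here. A minimal sketch on a toy model:

```python
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
head = nn.Linear(16, 4)
model = nn.Sequential(encoder, head)

# Freeze the encoder in place; requires_grad_ returns the module itself.
encoder.requires_grad_(False)

# Only pass trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)
loss = model(torch.randn(5, 20)).pow(2).mean()
loss.backward()
optimizer.step()
```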

share_memory() → torch.nn.modules.module.T

state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns: a dictionary containing a whole state of the module
Return type: dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

train(mode: bool = True) → torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

type(dst_type: Union[torch.dtype, str]) → torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters: dst_type (type or string) – the desired type
Returns: self
Return type: Module

xpu(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on XPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

zero_grad(set_to_none: bool = False) → None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

training: bool

Adversarial Attacks

This module contains classes to generate adversarial attacks for speaker recognition.

Attack Generation Classes

All adversarial attacks derive from the same base class:

class hyperion.torch.adv_attacks.adv_attack.AdvAttack(model, loss=None, targeted=True, range_min=None, range_max=None)[source]

__init__(model, loss=None, targeted=True, range_min=None, range_max=None)[source]

to(device)[source]

property attack_info

generate(input, target)[source]

FGSM

class hyperion.torch.adv_attacks.fgsm_attack.FGSMAttack(model, eps, loss=None, targeted=False, range_min=None, range_max=None)[source]

__init__(model, eps, loss=None, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

to(device)
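FGSM perturbs the input by eps times the sign of the loss gradient. The following is a minimal sketch, assuming a classifier that returns logits and cross-entropy as the loss. Untargeted attacks ascend the loss on the true label, targeted attacks descend it on the target label, and range_min/range_max clip the result. The function `fgsm` is hypothetical and is not the body of FGSMAttack.generate.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, target, eps, targeted=False, range_min=None, range_max=None):
    """Hypothetical FGSM: one signed-gradient step of size eps."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    grad, = torch.autograd.grad(loss, x)
    # Untargeted: increase the loss; targeted: decrease it.
    step = -eps * grad.sign() if targeted else eps * grad.sign()
    x_adv = x.detach() + step
    if range_min is not None or range_max is not None:
        x_adv = x_adv.clamp(min=range_min, max=range_max)
    return x_adv
```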

class hyperion.torch.adv_attacks.snr_fgsm_attack.SNRFGSMAttack(model, snr, loss=None, targeted=False, range_min=None, range_max=None)[source]

__init__(model, snr, loss=None, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

to(device)
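SNRFGSMAttack takes an `snr` argument in place of `eps`. One plausible reading, which we assume here without having checked Hyperion's source, is that eps is chosen so the FGSM perturbation delta = eps * sign(g) has a given signal-to-noise ratio in dB relative to the input: SNR = 20 log10(||x||_2 / ||delta||_2). Since ||eps * sign(g)||_2 = eps * sqrt(k), where k is the number of nonzero gradient entries, this gives eps = ||x||_2 / (sqrt(k) * 10^(SNR/20)).

```python
import math

def snr_to_eps(x, grad, snr_db):
    """Return the FGSM step size whose sign perturbation has the requested SNR (dB).

    Assumes SNR = 20*log10(||x||_2 / ||delta||_2) with delta = eps * sign(grad).
    """
    signal_norm = math.sqrt(sum(v * v for v in x))
    k = sum(1 for g in grad if g != 0)  # entries that actually get perturbed
    if k == 0:
        return 0.0
    return signal_norm / (math.sqrt(k) * 10 ** (snr_db / 20))

def snr_db(x, delta):
    """SNR in dB of perturbation delta with respect to signal x."""
    return 20 * math.log10(math.sqrt(sum(v * v for v in x)) / math.sqrt(sum(d * d for d in delta)))
```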

class hyperion.torch.adv_attacks.rand_fgsm_attack.RandFGSMAttack(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]

__init__(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

to(device)
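RandFGSMAttack takes both `eps` and `alpha`. In the R+FGSM attack of Tramèr et al. (2018), which we assume this class follows, the input is first moved by a random step of size alpha along sign(N(0, I)). A gradient-sign step of size eps - alpha is then taken from that point, so the total L-inf perturbation stays at most eps. The pure-Python sketch below uses `grad_fn`, a hypothetical callback that returns the loss gradient with respect to the input.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def rand_fgsm(x, grad_fn, eps, alpha, rng=random):
    """R+FGSM sketch: random sign step of size alpha, then gradient-sign step of size eps - alpha.

    grad_fn(x) returns the gradient of the loss w.r.t. x (hypothetical helper).
    """
    assert 0 <= alpha < eps
    x_rand = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    g = grad_fn(x_rand)
    return [xi + (eps - alpha) * sign(gi) for xi, gi in zip(x_rand, g)]
```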

class hyperion.torch.adv_attacks.iter_fgsm_attack.IterFGSMAttack(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]

__init__(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

to(device)
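Iterative FGSM (also known as the basic iterative method) takes repeated gradient-sign steps of size alpha. After each step, it clips the result back into the L-inf ball of radius eps around the original input. The constructor shown above does not expose an iteration count, so `num_iter` in this sketch is a hypothetical parameter. `grad_fn` is likewise a hypothetical gradient callback.

```python
def sign(v):
    return (v > 0) - (v < 0)

def iter_fgsm(x, grad_fn, eps, alpha, num_iter):
    """Iterative FGSM sketch: repeated sign steps projected onto the L-inf eps-ball around x."""
    x_adv = list(x)
    for _ in range(num_iter):
        g = grad_fn(x_adv)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # project back onto the L-inf ball of radius eps around x
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```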

PGD

class hyperion.torch.adv_attacks.pgd_attack.PGDAttack(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]

__init__(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]

property attack_info

static _random_sphere(shape, eps, norm, dtype, device)[source]: We use Theorem 1 in https://arxiv.org/pdf/math/0503650.pdf to sample uniformly from l_p balls in R^n
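Theorem 1 of the cited paper (Barthe, Guédon, Mendelson and Naor) states the following. Let G_1, ..., G_n be i.i.d. with density proportional to exp(-|t|^p), and let Z ~ Exp(1) be independent of them. Then G / (sum_i |G_i|^p + Z)^(1/p) is uniformly distributed in the unit l_p ball. The pure-Python sketch below applies this result, drawing each G_i through the fact that |G_i|^p ~ Gamma(1/p, 1). It is an illustration of the theorem, not Hyperion's PyTorch implementation. It handles p = inf separately, as a uniform draw from the cube.

```python
import math
import random

def random_lp_ball(n, eps, p, rng=random):
    """Sample a point uniformly from the l_p ball of radius eps in R^n."""
    if math.isinf(p):
        # the l_inf ball is the cube [-eps, eps]^n
        return [rng.uniform(-eps, eps) for _ in range(n)]
    # |G_i|^p ~ Gamma(1/p, 1), so |G_i| = Gamma(1/p, 1)^(1/p), with a random sign
    g = [rng.choice((-1.0, 1.0)) * rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p)
         for _ in range(n)]
    z = rng.expovariate(1.0)
    denom = (sum(abs(v) ** p for v in g) + z) ** (1.0 / p)
    return [eps * v / denom for v in g]
```

As a sanity check, a uniform sample from the n-dimensional L2 ball of radius eps has mean radius n/(n+1) * eps.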

generate(input, target)[source]

to(device)
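PGD generalizes iterative FGSM to other norms through the `norm` argument. For the L2 case, each step moves alpha along the L2-normalized gradient. The total perturbation is then projected back onto the L2 ball of radius eps. The sketch below assumes a hypothetical `grad_fn` callback and starts from x itself. With `num_random_init` > 0, one would instead presumably start from x plus a random point in the eps-ball.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))

def pgd_l2(x, grad_fn, eps, alpha, max_iter=10):
    """L2 PGD sketch: normalized-gradient ascent steps projected onto the L2 eps-ball around x."""
    x_adv = list(x)
    for _ in range(max_iter):
        g = grad_fn(x_adv)
        gn = l2_norm(g)
        if gn == 0:
            break
        # ascent step along the normalized gradient
        x_adv = [xa + alpha * gi / gn for xa, gi in zip(x_adv, g)]
        # project delta = x_adv - x onto the L2 ball of radius eps
        delta = [xa - xi for xa, xi in zip(x_adv, x)]
        dn = l2_norm(delta)
        if dn > eps:
            delta = [d * eps / dn for d in delta]
        x_adv = [xi + d for xi, d in zip(x, delta)]
    return x_adv
```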

Carlini-Wagner

Carlini-Wagner attacks derive from the same base class:

class hyperion.torch.adv_attacks.carlini_wagner.CarliniWagner(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]

__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]

property attack_info

static atanh(x, eps=1e-06)[source]

x_w(w)[source]

w_x(x)[source]
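In the original Carlini-Wagner formulation, the optimizer works on an unconstrained variable w instead of the input x. The two are related by x = range_min + (range_max - range_min) * (tanh(w) + 1) / 2, which keeps x inside the valid input range. We assume that `x_w` maps w to x, that `w_x` is the inverse, and that the `eps` argument of `atanh` clamps its input away from +/-1. The listing above does not confirm any of these. The sketch passes the range explicitly.

```python
import math

def atanh(x, eps=1e-6):
    # clamp so that atanh stays finite at the range boundaries
    x = min(max(x, -1.0 + eps), 1.0 - eps)
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def x_w(w, range_min, range_max):
    """Map unconstrained w to x in [range_min, range_max]."""
    return [range_min + (range_max - range_min) * (math.tanh(v) + 1.0) / 2.0 for v in w]

def w_x(x, range_min, range_max):
    """Inverse map: x in [range_min, range_max] to unconstrained w."""
    return [atanh(2.0 * (v - range_min) / (range_max - range_min) - 1.0) for v in x]
```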

f(z, target)[source]
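In the Carlini-Wagner paper, the classification term is a margin on the logits z, with kappa (the `confidence` argument) as the required margin. For a targeted attack toward class t, f(z) = max(max_{i != t} z_i - z_t, -kappa). For an untargeted attack against the true class t, f(z) = max(z_t - max_{i != t} z_i, -kappa). We assume Hyperion's `f(z, target)` follows this definition.

```python
def cw_margin(z, target, confidence=0.0, targeted=False):
    """Carlini-Wagner margin loss on a list of logits z."""
    z_t = z[target]
    z_other = max(v for i, v in enumerate(z) if i != target)
    if targeted:
        # push the target logit above all others by at least `confidence`
        return max(z_other - z_t, -confidence)
    # push the true-class logit below the best other class by at least `confidence`
    return max(z_t - z_other, -confidence)
```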

generate(input, target)[source]

to(device)

class hyperion.torch.adv_attacks.carlini_wagner_l2.CarliniWagnerL2(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]

__init__(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

static atanh(x, eps=1e-06)

f(z, target)

to(device)

w_x(x)

x_w(w)
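CarliniWagnerL2 adds `binary_search_steps`. In the reference C&W L2 implementation, an outer search tunes the trade-off constant c, starting from `initial_c`. After each inner optimization, c moves down toward the midpoint if the attack succeeded and up otherwise. While no upper bound has been found, c is multiplied by 10. The sketch assumes Hyperion uses the same scheme. `attack_succeeds(c)` is a hypothetical callback that runs the inner optimization for a given c.

```python
def search_c(attack_succeeds, initial_c=1e-3, binary_search_steps=9):
    """Return the smallest c (found by the search) for which the attack succeeded, or None."""
    lower, upper = 0.0, float("inf")
    c = initial_c
    best_c = None
    for _ in range(binary_search_steps):
        if attack_succeeds(c):
            best_c = c if best_c is None else min(best_c, c)
            upper = min(upper, c)
            c = (lower + upper) / 2
        else:
            lower = max(lower, c)
            # grow geometrically until an upper bound exists, then bisect
            c = (lower + upper) / 2 if upper < float("inf") else c * 10
    return best_c
```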

class hyperion.torch.adv_attacks.carlini_wagner_linf.CarliniWagnerLInf(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]

__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

static atanh(x, eps=1e-06)

f(z, target)

to(device)

w_x(x)

x_w(w)
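CarliniWagnerLInf adds `tau_decr_factor` and `c_incr_factor`. The C&W L-inf attack replaces the non-differentiable L-inf norm with the penalty sum_i max(|delta_i| - tau, 0). Whenever every |delta_i| falls below tau, tau is shrunk, and when the attack fails, c is increased. We assume `tau_decr_factor` and `c_incr_factor` are the multipliers for these two updates. The sketch shows only the penalty and the tau update.

```python
def linf_penalty(delta, tau):
    """Differentiable surrogate for ||delta||_inf used by the C&W L-inf attack."""
    return sum(max(abs(d) - tau, 0.0) for d in delta)

def update_tau(delta, tau, tau_decr_factor=0.9):
    """Shrink tau when all perturbation entries are already below it."""
    if max(abs(d) for d in delta) < tau:
        return tau * tau_decr_factor
    return tau
```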

class hyperion.torch.adv_attacks.carlini_wagner_l0.CarliniWagnerL0(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]

__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]

property attack_info

generate(input, target)[source]

static atanh(x, eps=1e-06)

f(z, target)

to(device)

w_x(x)

x_w(w)

Attack Generator Factories

These are factory classes that create attack generator objects. They can create attacks implemented in Hyperion or in the Adversarial Robustness Toolbox (ART) <https://github.com/Trusted-AI/adversarial-robustness-toolbox>.

class hyperion.torch.adv_attacks.attack_factory.AttackFactory[source]

static create(model, attack_type, eps=0, snr=100, alpha=0, norm=inf, random_eps=False, num_random_init=0, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10, abort_early=True, c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)
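AttackFactory exposes `filter_args(**kwargs)` alongside `create`. A common pattern for such helpers keeps only the keyword arguments that `create` accepts, so that a larger dict of parsed command-line options can be passed straight through. We assume Hyperion's helper works this way but have not checked its code. `create_attack` below is a hypothetical stand-in with a shortened signature.

```python
import inspect

def create_attack(model, attack_type, eps=0, alpha=0, norm=float("inf"), targeted=False):
    """Hypothetical factory with a shortened version of AttackFactory.create's signature."""
    return {"attack_type": attack_type, "eps": eps, "alpha": alpha, "norm": norm, "targeted": targeted}

def filter_args(func, **kwargs):
    """Keep only the keyword arguments that func accepts."""
    valid = set(inspect.signature(func).parameters)
    return {k: v for k, v in kwargs.items() if k in valid}
```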

class hyperion.torch.adv_attacks.random_attack_factory.RandomAttackFactory(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]

__init__(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]

sample_attack(model=None)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)
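RandomAttackFactory draws a new attack configuration on each `sample_attack` call. It picks an attack type from `attack_types` and hyperparameters from the [min_*, max_*] ranges. This section does not document how each range is sampled. The sketch assumes that eps and alpha are drawn log-uniformly, a common choice for scale parameters spanning orders of magnitude, and that the iteration count is drawn uniformly. `sample_attack_config` is hypothetical.

```python
import math
import random

def log_uniform(lo, hi, rng):
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

def sample_attack_config(attack_types, min_eps=1e-5, max_eps=0.1, min_alpha=1e-5, max_alpha=0.02,
                         min_iter=5, max_iter=10, rng=random):
    """Hypothetical sampler for one random attack configuration."""
    return {
        "attack_type": rng.choice(attack_types),
        "eps": log_uniform(min_eps, max_eps, rng),
        "alpha": log_uniform(min_alpha, max_alpha, rng),
        "max_iter": rng.randint(min_iter, max_iter),  # inclusive on both ends
    }
```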

class hyperion.torch.adv_attacks.art_attack_factory.ARTAttackFactory[source]

static create(model, attack_type, eps=0, delta=0.01, step_adapt=0.667, num_trial=25, sample_size=20, init_size=100, norm=inf, eps_step=0.1, num_random_init=0, minimal=False, random_eps=False, min_eps=None, beta=0.001, theta=0.1, gamma=1.0, etha=0.01, confidence=0.0, lr=0.01, lr_decay=0.5, lr_num_decay=20, momentum=0.8, binary_search_steps=9, max_iter=10, overshoot=1.1, num_grads=10, c=0.001, max_halving=5, max_doubling=5, decision_rule='EN', init_eval=100, max_eval=10000, num_parallel=128, variable_h=0.0001, use_importance=False, abort_early=True, th=None, sigma=0.5, lambda_tv=0.3, labmda_c=1.0, lambda_s=0.5, reg=3000, kernel_size=5, eps_factor=1.1, eps_iter=10, conj_sinkhorn_iter=400, proj_sinkhorn_iter=400, targeted=False, num_samples=1, eps_scale=1, batch_size=1)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

Trainers

Generic Trainer

class hyperion.torch.trainers.torch_trainer.TorchTrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Base trainer class for training basic neural network models.

model: model object.

loss: nn.Module loss class

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

fit(train_data, val_data=None)[source]

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

set_train_mode()[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

bn_update_epoch(data_loader)[source]

update_model()[source]

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)[source]: Creates the default loggers

_get_lr()[source]: Returns the current learning rate to show in the loggers

checkpoint(logs=None)[source]

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

save_checkpoint(logs=None)[source]

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)[source]

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.
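After `swa_start`, stochastic weight averaging (SWA) keeps an equal-weight running average of the model weights, w_swa <- (n * w_swa + w) / (n + 1). Before the averaged model is used, its batch-norm statistics must be recomputed. This is presumably the role of `bn_update_epoch` and `validation_epoch(..., swa_update_bn=True)`, although this section does not say so. PyTorch offers these pieces as `torch.optim.swa_utils.AveragedModel`, `torch.optim.swa_utils.SWALR` and `torch.optim.swa_utils.update_bn`. The pure-Python sketch below shows only the averaging rule.

```python
class RunningWeightAverage:
    """Equal-weight running average of parameter vectors, as used by SWA."""

    def __init__(self):
        self.avg = None
        self.n = 0  # number of weight snapshots averaged so far

    def update(self, weights):
        if self.avg is None:
            self.avg = list(weights)
        else:
            self.avg = [(self.n * a + w) / (self.n + 1) for a, w in zip(self.avg, weights)]
        self.n += 1
        return self.avg
```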

load_checkpoint(file_path)[source]

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint()[source]: Loads the last training checkpoint in the experiment dir.
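`load_last_checkpoint` looks for the most recent checkpoint in the experiment directory. This section does not give Hyperion's checkpoint file naming. The sketch assumes hypothetical names of the form `model_ep0012.pth` and returns the path with the highest epoch number.

```python
import re
from pathlib import Path

CKPT_RE = re.compile(r"model_ep(\d+)\.pth$")  # hypothetical naming scheme

def find_last_checkpoint(exp_path):
    """Return the checkpoint path with the largest epoch in exp_path, or None."""
    best_epoch, best_path = -1, None
    for p in Path(exp_path).iterdir():
        m = CKPT_RE.match(p.name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch, best_path = int(m.group(1)), p
    return best_path
```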

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip=[])[source]

static add_argparse_args(parser, prefix=None, skip=[])

x-Vector Trainers

class hyperion.torch.trainers.xvector_trainer.XVectorTrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model: x-Vector model object.

optim: pytorch optimizer object or options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object or options dict

loggers: LoggerList object, loggers write training progress to std. output and file. If None, it uses default loggers.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss: if None, it uses cross-entropy

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – pytorch data loader returning features and class labels.

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default data loaders

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

validation_epoch(data_loader, swa_update_bn=False)

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

class hyperion.torch.trainers.xvector_trainer_from_wav.XVectorTrainerFromWav(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model: x-Vector model object.

feat_extractor: feature extractor nn.Module

optim: pytorch optimizer object or options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object or options dict.

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss: if None, it uses cross-entropy

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – pytorch data loader returning features and class labels.

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default data loaders

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

class hyperion.torch.trainers.xvector_trainer_deep_feat_reg.XVectorTrainerDeepFeatReg(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to fine-tune x-vector style models with deep feature regularization toward a prior model.

model: x-Vector model object that we want to fine-tune

prior_model: x-Vector model object that we use as regularizer

optim: pytorch optimizer object or options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

reg_layers_enc: list of encoder layer indexes that we use for regularization

reg_layers_classif: list of classification head layer indexes that we use for regularization

reg_weight_enc: weight of the regularization loss for encoder hidden activations

reg_weight_classif: weight of the regularization loss for classification head hidden activations

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object or options dict.

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss: if None, it uses cross-entropy

reg_loss: nn.Module loss used for regularization, if None it uses L1 loss.

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None, skip=[])[source]

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default data loaders

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

validation_epoch(data_loader, swa_update_bn=False)

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

class hyperion.torch.trainers.xvector_trainer_deep_feat_reg_from_wav.XVectorTrainerDeepFeatRegFromWav(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to fine-tune x-vector style models with deep feature regularization toward a prior model.

model: x-Vector model object that we want to fine-tune

feat_extractor: feature extractor nn.Module

prior_model: x-Vector model object that we use as regularizer

optim: pytorch optimizer object or options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

reg_layers_enc: list of encoder layer indexes that we use for regularization

reg_layers_classif: list of classification head layer indexes that we use for regularization

reg_weight_enc: weight of the regularization loss for encoder hidden activations

reg_weight_classif: weight of the regularization loss for classification head hidden activations

device: cpu/gpu device

lrsched: learning rate scheduler object or options dict.

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss: if None, it uses cross-entropy

reg_loss: nn.Module loss used for regularization, if None it uses L1 loss.

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default data loaders

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

Auto-encoder Trainer

class hyperion.torch.trainers.ae_trainer.AETrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Auto-encoder trainer class

model: model object.

loss: nn.Module loss class

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – pytorch data loader returning features and class labels.

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default data loaders

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the SWA model

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

VAE Trainers

class hyperion.torch.trainers.vae_trainer.VAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Variational Auto-encoder trainer class

model: model object.

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default loggers

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

class hyperion.torch.trainers.dvae_trainer.DVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Denoising VAE trainer class

model: model object.

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning noisy and clean features

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default loggers

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

VQ-VAE Trainers

class hyperion.torch.trainers.vq_vae_trainer.VQVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Vector Quantized Variational Auto-encoder trainer class

model: model object.

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default loggers

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

class hyperion.torch.trainers.vq_dvae_trainer.VQDVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Vector Quantized Denoising Variational Auto-encoder trainer class

model: model object.

optim: pytorch optimizer object or optimizer options dict

epochs: max. number of epochs

exp_path: experiment output path

cur_epoch: current epoch

grad_acc_steps: gradient accumulation steps to simulate larger batch size.

device: cpu/gpu device

metrics: extra metrics to compute besides cxe.

lrsched: learning rate scheduler object

loggers: LoggerList object, loggers write training progress to std. output and file.

ddp: if True use distributed data parallel training

ddp_type: type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode: training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp: uses mixed precision training.

log_interval: number of optim. steps between log outputs

use_tensorboard: use tensorboard logger

use_wandb: use wandb logger

wandb: wandb dictionary of options

grad_clip: norm to clip gradients, if 0 there is no clipping

grad_clip_norm: norm type to clip gradients

swa_start: epoch to start doing swa

swa_lr: SWA learning rate

swa_anneal_epochs: SWA learning rate anneal epochs

cpu_offload: CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

train_epoch(data_loader)[source]

Training epoch loop

Parameters: data_loader – PyTorch data loader returning noisy and clean features

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters: data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb): Creates the default loggers

_get_lr(): Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])

static add_class_args(parser, prefix=None, skip=[])

bn_update_epoch(data_loader)

checkpoint(logs=None)

Creates a checkpoint of the training state for saving and later recovery.

Parameters: logs – logs containing the current value of the metrics.

static filter_args(**kwargs)

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters

train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters: file_path – checkpoint file path

load_last_checkpoint(): Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the training status

Parameters: logs – logs containing the current value of the metrics.

set_train_mode()

update_model()

Datasets, Data Loaders and Samplers

Datasets

Audio Datasets

class hyperion.torch.data.audio_dataset.AudioDataset(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]

__init__(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]

property wav_scale

property num_seqs

property seq_lengths

property total_length

property min_chunk_length

property max_chunk_length

property min_seq_length

property max_seq_length

property var_chunk_length

get_random_chunk_length()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)
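The min_chunk_length, max_chunk_length, var_chunk_length and get_random_chunk_length members suggest that training sequences are randomly cropped to variable-length chunks. The sketch below is an assumption about that behavior: it draws a uniform chunk length when min and max differ and uses a fixed length otherwise. The function names are hypothetical, both lengths are taken as numbers (the max_chunk_length=None default is not handled), and this is not the dataset's actual implementation.

```python
import random
from typing import Tuple


def random_chunk_length(min_len: float, max_len: float, rng: random.Random) -> float:
    # Variable-length chunks only when min and max differ (assumed meaning of var_chunk_length)
    if max_len > min_len:
        return rng.uniform(min_len, max_len)
    return max_len


def random_crop(seq_len: float, chunk_len: float, rng: random.Random) -> Tuple[float, float]:
    """Return (start, duration) of a random chunk inside a sequence of length seq_len."""
    if chunk_len >= seq_len:
        # Sequence shorter than the chunk: keep it whole
        return 0.0, seq_len
    start = rng.uniform(0.0, seq_len - chunk_len)
    return start, chunk_len
```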

Feature Sequence Datasets

class hyperion.torch.data.feat_seq_dataset.FeatSeqDataset(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]

__init__(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]

property num_seqs

property seq_lengths

property total_length

property min_chunk_length

property max_chunk_length

property min_seq_length

property max_seq_length

property var_chunk_length

get_random_chunk_length()[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

class hyperion.torch.data.paired_feat_seq_dataset.PairedFeatSeqDataset(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]

__init__(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]

static add_argparse_args(parser, prefix=None)

static add_class_args(parser, prefix=None)

static filter_args(**kwargs)

get_random_chunk_length()

property max_chunk_length

property max_seq_length

property min_chunk_length

property min_seq_length

property num_seqs

property seq_lengths

property total_length

property var_chunk_length

Embedding Datasets

class hyperion.torch.data.embed_dataset.EmbedDataset(embeds=None, class_ids=None, class_weights=None, rspecifier=None, key_file=None, class_file=None, path_prefix=None, preload_embeds=False, return_class=True, is_val=False)[source]

__init__(embeds=None, class_ids=None, class_weights=None, rspecifier=None, key_file=None, class_file=None, path_prefix=None, preload_embeds=False, return_class=True, is_val=False)[source]

Samplers

class hyperion.torch.data.weighted_seq_sampler.ClassWeightedSeqSampler(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]

__init__(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)
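The parameters of ClassWeightedSeqSampler (batch_size, num_egs_per_class) point to class-balanced batch construction. As an illustration only, and not the sampler's real algorithm, the sketch below draws distinct classes uniformly and then num_egs_per_class utterances from each. The dictionary input and the function name are hypothetical, and num_egs_per_utt and var_batch_size are not modeled.

```python
import random
from typing import Dict, List, Optional


def sample_class_balanced_batch(
    class2utts: Dict[str, List[str]],
    batch_size: int,
    num_egs_per_class: int = 1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw batch_size // num_egs_per_class distinct classes, then utterances per class."""
    if rng is None:
        rng = random.Random()
    num_classes = batch_size // num_egs_per_class
    # sorted() makes the draw reproducible for a seeded rng
    classes = rng.sample(sorted(class2utts), num_classes)
    batch: List[str] = []
    for c in classes:
        # Sample with replacement so small classes can still fill their quota
        batch.extend(rng.choices(class2utts[c], k=num_egs_per_class))
    return batch
```

Sampling classes first, instead of utterances, gives rare classes the same chance of appearing in a batch as frequent ones.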

class hyperion.torch.data.weighted_embed_sampler.ClassWeightedEmbedSampler(dataset, batch_size=1, iters_per_epoch=1, num_egs_per_class=1)[source]

__init__(dataset, batch_size=1, iters_per_epoch=1, num_egs_per_class=1)[source]

Data Transformations

class hyperion.torch.transforms.reshape.Reshape(shape)[source]

__init__(shape)[source]

Optimizers

These are custom optimizers and a factory class to create optimizers from config params.

Custom Optimizers

class hyperion.torch.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]

Implements the Rectified Adam optimizer (RAdam) from

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. “On the Variance of the Adaptive Learning Rate and Beyond.” arXiv preprint arXiv:1908.03265 (2019).

Code taken from: https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]
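The key quantity in RAdam is the variance-rectification term from Liu et al. (2019). The pure-Python sketch below computes it per step for a given beta2, following the paper: it rectifies only when the SMA length rho_t exceeds 4. The linked reference code uses a threshold of 5 instead. When degenerated_to_sgd=True, that code falls back in the early steps to a bias-corrected momentum update without the adaptive denominator. The function name is hypothetical.

```python
import math
from typing import Optional


def radam_rectification(step: int, beta2: float = 0.999) -> Optional[float]:
    """Return the RAdam rectification factor r_t, or None while the variance is intractable.

    rho_inf = 2 / (1 - beta2) - 1
    rho_t   = rho_inf - 2 * t * beta2**t / (1 - beta2**t)
    r_t     = sqrt((rho_t - 4)(rho_t - 2) rho_inf / ((rho_inf - 4)(rho_inf - 2) rho_t))
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None  # adaptive step not used yet; a momentum-only update is applied instead
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
```

With beta2 = 0.999, rho_1 equals 1, so the first steps skip the adaptive term. After that, r_t grows towards 1, which acts as a built-in warmup.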

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters: closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

add_param_group(param_group)

Add a param group to the Optimizer’s param_groups.

This can be useful when fine tuning a pre-trained network as frozen layers can be made trainable and added to the Optimizer as training progresses.

Parameters

param_group (dict) – Specifies what Tensors should be optimized along with group specific optimization options.

load_state_dict(state_dict)

Loads the optimizer state.

Parameters: state_dict (dict) – optimizer state. Should be an object returned from a call to state_dict().

state_dict()

Returns the state of the optimizer as a dict.

It contains two entries:

state - a dict holding the current optimization state. Its content differs between optimizer classes.
param_groups - a list containing all parameter groups, where each parameter group is a dict

zero_grad(set_to_none: bool = False)

Sets the gradients of all optimized torch.Tensors to zero.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).

Optimizer Factory

class hyperion.torch.optim.factory.OptimizerFactory[source]

static create(params, opt_type, lr, momentum=0, beta1=0.9, beta2=0.99, rho=0.9, eps=1e-08, weight_decay=0, amsgrad=False, nesterov=False, lambd=0.0001, asgd_alpha=0.75, t0=1000000.0, rmsprop_alpha=0.99, centered=False, lr_decay=0, init_acc_val=0, max_iter=20, oss=False)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

Learning Rate Schedulers

These are custom learning rate schedulers and a factory class to create schedulers from config params.

Custom LR Schedulers

class hyperion.torch.lr_schedulers.red_lr_on_plateau.ReduceLROnPlateau(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, reduces the learning rate.

optimizer

Wrapped optimizer.

Type: Optimizer

mode

One of min, max. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’.

Type: str

factor

Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.

Type: float

patience

Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10.

Type: int

threshold

Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.

Type: float

threshold_mode

One of rel, abs. In rel mode, dynamic_threshold = best * ( 1 + threshold ) in ‘max’ mode or best * ( 1 - threshold ) in min mode. In abs mode, dynamic_threshold = best + threshold in max mode or best - threshold in min mode. Default: ‘rel’.

Type: str

cooldown

Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.

Type: int

min_lr

A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0.

Type: float or list

eps

Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8.

Type: float
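The interplay of mode, threshold, threshold_mode, patience, cooldown, min_lr and eps can be traced with a small pure-Python model of a single learning rate. It mirrors the semantics documented above, which match torch.optim.lr_scheduler.ReduceLROnPlateau. It ignores the warmup_steps and monitor arguments of this class, and the class name PlateauSketch is hypothetical.

```python
import math


class PlateauSketch:
    """Single-learning-rate model of reduce-on-plateau logic."""

    def __init__(self, lr, mode="min", factor=0.1, patience=10, threshold=1e-4,
                 threshold_mode="rel", cooldown=0, min_lr=0.0, eps=1e-8):
        self.lr, self.mode, self.factor, self.patience = lr, mode, factor, patience
        self.threshold, self.threshold_mode = threshold, threshold_mode
        self.cooldown, self.min_lr, self.eps = cooldown, min_lr, eps
        self.best = math.inf if mode == "min" else -math.inf
        self.num_bad_epochs = 0
        self.cooldown_counter = 0

    def is_better(self, a):
        # dynamic_threshold as described for threshold_mode above
        if self.mode == "min" and self.threshold_mode == "rel":
            return a < self.best * (1 - self.threshold)
        if self.mode == "min":
            return a < self.best - self.threshold
        if self.threshold_mode == "rel":
            return a > self.best * (1 + self.threshold)
        return a > self.best + self.threshold

    def step(self, metric):
        if self.is_better(metric):
            self.best, self.num_bad_epochs = metric, 0
        else:
            self.num_bad_epochs += 1
        if self.cooldown_counter > 0:
            self.cooldown_counter -= 1
            self.num_bad_epochs = 0  # bad epochs are ignored during cooldown
        if self.num_bad_epochs > self.patience:
            new_lr = max(self.lr * self.factor, self.min_lr)
            if self.lr - new_lr > self.eps:  # skip negligible updates
                self.lr = new_lr
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
        return self.lr
```

With patience=2, a flat metric keeps the rate unchanged for the first two non-improving epochs and reduces it after the third, as described for patience above.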

__init__(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]

_reset()[source]: Resets num_bad_epochs counter and cooldown counter.

on_opt_step()[source]

on_epoch_begin(epoch=None)[source]

on_epoch_end(metrics=None)[source]

property in_cooldown

load_state_dict(state_dict)[source]

Loads the scheduler’s state.

Parameters: state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_lr()

get_warmup_lr()

property in_warmup

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.exp_lr.ExponentialLR(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Exponential learning rate scheduler.
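The docstring gives no formula. One plausible reading of the parameter names is the following: keep the base rate for hold_steps, then decay it by a factor of decay_rate every decay_steps, never going below min_lr. This is stated as an assumption, not the class's confirmed behavior, and warmup is ignored. As a pure-Python sketch with a hypothetical function name:

```python
def exp_decay_lr(base_lr: float, step: int, decay_rate: float, decay_steps: int,
                 hold_steps: int = 0, min_lr: float = 0.0) -> float:
    # Assumed schedule: lr = base_lr * decay_rate ** ((step - hold_steps) / decay_steps)
    x = max(step - hold_steps, 0)  # constant rate during the hold phase
    return max(base_lr * decay_rate ** (x / decay_steps), min_lr)
```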

__init__(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

get_lr(step)[source]

load_state_dict(state_dict)[source]

Loads the scheduler’s state.

Parameters: state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_warmup_lr()

property in_warmup

on_epoch_begin(epoch=None, **kwargs)

on_epoch_end(metrics=None)

on_opt_step()

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.invpow_lr.InvPowLR(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Inverse power learning rate scheduler.

__init__(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
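An inverse power schedule is commonly written lr = base_lr * (1 + x) ** (-power), with x the number of steps past hold_steps. That exact form, the floor at min_lr and the function name below are assumptions, not taken from the class source. Warmup is not modeled.

```python
def inv_pow_lr(base_lr: float, step: int, power: float = 0.5,
               hold_steps: int = 0, min_lr: float = 0.0) -> float:
    # Assumed schedule: constant for hold_steps, then base_lr * (1 + x) ** -power
    x = max(step - hold_steps, 0)
    return max(base_lr * (1.0 + x) ** (-power), min_lr)
```

With the default power=0.5, the rate halves each time 1 + x quadruples.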

get_lr(step)[source]

load_state_dict(state_dict)[source]

Loads the scheduler’s state.

Parameters: state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_warmup_lr()

property in_warmup

on_epoch_begin(epoch=None, **kwargs)

on_epoch_end(metrics=None)

on_opt_step()

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.cos_lr.CosineLR(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR:

\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\frac{T_{cur}}{T_{max}}\pi))\]

When epoch=-1, sets initial lr as lr.

It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.

Parameters

optimizer (Optimizer) – Wrapped optimizer.
T (int) – Length of the cosine cycle (\(T_{max}\) in the formula above).
min_lr (float) – Minimum learning rate (\(\eta_{min}\) in the formula above). Default: 0.
epoch (int) – The index of the last epoch. Default: 0.

__init__(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
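The annealing formula above can be evaluated directly. The pure-Python sketch below adds SGDR-style warm restarts, in which the cycle length is multiplied by T_mul after each restart. Reading T and T_mul this way follows the SGDR paper and is an assumption about this class. Warmup and gamma are not modeled, and the function name is hypothetical.

```python
import math


def cosine_lr_with_restarts(t: int, T: int, max_lr: float, min_lr: float = 0.0,
                            T_mul: int = 1) -> float:
    """eta_t = min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * T_cur / T_i))."""
    if T <= 0 or T_mul < 1:
        raise ValueError("T must be positive and T_mul must be >= 1")
    T_i, T_cur = T, t
    # Walk forward through completed cycles; each new cycle is T_mul times longer
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mul
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * T_cur / T_i))
```

At the start of every cycle the rate jumps back to max_lr (the warm restart). Halfway through a cycle it sits at the midpoint between max_lr and min_lr.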

on_epoch_begin(epoch=None, epoch_updates=1, **kwargs)[source]

get_lr(step)[source]

get_warmup_lr()

property in_warmup

load_state_dict(state_dict)

Loads the scheduler’s state.

Parameters: state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

on_epoch_end(metrics=None)

on_opt_step()

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

LR Scheduler Factory

class hyperion.torch.lr_schedulers.factory.LRSchedulerFactory[source]

create(lrsch_type, decay_rate=0.01, decay_steps=100, power=0.5, hold_steps=10, t=10, t_mul=1, warm_restarts=False, gamma=1, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, eps=1e-08, min_lr=0, warmup_steps=0, update_lr_on_opt_step=False)[source]

static filter_args(**kwargs)[source]

static add_class_args(parser, prefix=None)[source]

static add_argparse_args(parser, prefix=None)

Metrics

These are metric classes and functions that cannot be used as loss functions.

Metric Classes

class hyperion.torch.metrics.metrics.TorchMetric(weight=None, reduction='mean')[source]

Base class for metrics that cannot be objective functions

__init__(weight=None, reduction='mean')[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

T_destination: alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks(): Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get(“version”, None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module.
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True): Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook): These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook): These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters

destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) → torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters: fn (Module -> None) – function to be applied to each submodule
Returns: self
Return type: Module

Example:

>>> import torch
>>> from torch import nn
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

bfloat16() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns: self
Return type: Module

buffers(recurse: bool = True) → Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields: torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

children() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields: Module – a child module

cpu() → torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns: self
Return type: Module

cuda(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

double() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns: self
Return type: Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number is saved in the attribute _metadata of the returned state dict, and is thus pickled. _metadata is a dictionary whose keys follow the naming convention of the state dict. See _load_from_state_dict for how to use this information when loading.

If new parameters/buffers are added to or removed from a module, this number should be bumped, and the module’s _load_from_state_dict method can compare the version number and make the appropriate changes if the state dict is from before the change.

eval() → torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns: self
Return type: Module
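eval() only switches module behavior (here, nn.Dropout). It does not turn off gradient tracking, so inference code usually pairs it with torch.no_grad(). The following is a small sketch with arbitrary layer sizes.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(8, 4)

net.eval()                      # same as net.train(False); dropout becomes a no-op
with torch.no_grad():           # eval() alone does not disable autograd
    out1 = net(x)
    out2 = net(x)
print(torch.equal(out1, out2))  # True: no random dropout masks in eval mode

net.train()                     # back to training mode; dropout is active again
print(net.training, net[1].training)  # True True
```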

extra_repr() → str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
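The following is a minimal sketch of overriding extra_repr(). Scale is a hypothetical layer. The returned string appears between the parentheses of the module's printed form.

```python
from torch import nn


class Scale(nn.Module):
    """Hypothetical layer that multiplies its input by a constant."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self) -> str:
        # Text placed between the parentheses in repr(module)
        return f"factor={self.factor}"


print(Scale(2.0))  # Scale(factor=2.0)
```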

float() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns: self
Return type: Module

forward(*input: Any) → None

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance (module(x)) rather than forward() directly. Calling the instance runs the registered hooks, while calling forward() silently skips them.
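A forward hook (see register_forward_hook()) makes it possible to observe the difference between calling the instance and calling forward() directly. The following is a small sketch.

```python
import torch
from torch import nn

calls = []
lin = nn.Linear(3, 2)
# list.append returns None, so the hook leaves the output unchanged
lin.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

x = torch.randn(1, 3)
lin(x)          # __call__: runs forward() and then the registered hooks
lin.forward(x)  # direct call: forward() only, hooks are skipped
print(calls)    # ['hook']
```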

half() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns: self
Return type: Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
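With strict=False, mismatched keys are reported instead of raising an error. This is convenient for partially loading a pretrained model. The following sketch uses two small nn.Sequential models that differ only in their last layer. The layer sizes are arbitrary.

```python
import torch
from torch import nn

src = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
dst = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 3))  # different head

# Keep only the shared first layer ("0.*" keys) and add one stray key
partial = {k: v for k, v in src.state_dict().items() if k.startswith("0.")}
partial["extra.weight"] = torch.zeros(1)

result = dst.load_state_dict(partial, strict=False)
print(result.missing_keys)     # ['2.weight', '2.bias']
print(result.unexpected_keys)  # ['extra.weight']
```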

modules() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())

named_children() → Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)

named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())

parameters(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

Example:

>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])

register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) → None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using the given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
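The following is a slightly fuller sketch using a hypothetical RunningMean module. It shows that persistent buffers appear in state_dict() while non-persistent ones do not, and that neither kind counts as a parameter.

```python
import torch
from torch import nn


class RunningMean(nn.Module):
    """Hypothetical module that tracks a running mean of its inputs."""

    def __init__(self, num_features: int, momentum: float = 0.1):
        super().__init__()
        self.momentum = momentum
        # Persistent buffer: saved in state_dict and moved by .to()/.cuda()
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent buffer: moved with the module but left out of state_dict
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                # running_mean <- (1 - momentum) * running_mean + momentum * batch mean
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * x.mean(dim=0))
        return x - self.running_mean


m = RunningMean(3)
m(torch.ones(4, 3))
print(list(m.state_dict().keys()))        # ['running_mean']
print([n for n, _ in m.named_buffers()])  # ['running_mean', 'scratch']
print(len(list(m.parameters())))          # 0
```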

register_forward_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the output. It can also modify the input in place, but this has no effect on forward() because the hook is called after forward() has run.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
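A common use of forward hooks is capturing intermediate activations without editing the model. In the following sketch, the activations dictionary and the save_output helper are illustrative names and are not part of PyTorch.

```python
import torch
from torch import nn

activations = {}


def save_output(name):
    # Build a hook with the (module, input, output) signature; returning None
    # leaves the module's output unchanged
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
handle = net[1].register_forward_hook(save_output("relu"))

net(torch.randn(5, 4))
relu_shape = activations["relu"].shape
print(relu_shape)  # torch.Size([5, 8])

handle.remove()     # detach the hook; later calls record nothing
activations.clear()
net(torch.randn(5, 4))
print(activations)  # {}
```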

register_forward_pre_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the input by returning either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
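The following is a minimal sketch of a pre-hook that rewrites the input before forward() runs. clamp_input is a hypothetical helper. nn.Identity is used so that the effect of the hook is visible directly in the output.

```python
import torch
from torch import nn


def clamp_input(module, inputs):
    # inputs is the tuple of positional arguments; returning a single tensor
    # is allowed and is wrapped into a 1-tuple before forward() is called
    (x,) = inputs
    return x.clamp(min=0.0)


ident = nn.Identity()
handle = ident.register_forward_pre_hook(clamp_input)
out = ident(torch.tensor([-1.0, 2.0]))
print(out)  # tensor([0., 2.])
handle.remove()
```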

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments, and all keyword arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs in place is not allowed when using backward hooks and will raise an error.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
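The following sketch shows a full backward hook that records the norm of the gradient flowing into a layer's output. record_grad is a hypothetical name. Because the loss is a plain sum, every entry of grad_output[0] is 1, so the recorded norm for a 4 x 1 output is 2.0.

```python
import torch
from torch import nn

grad_norms = []


def record_grad(module, grad_input, grad_output):
    # grad_output[0] is dL/d(output); returning None keeps grad_input unchanged
    grad_norms.append(grad_output[0].norm().item())


lin = nn.Linear(3, 1)
handle = lin.register_full_backward_hook(record_grad)

x = torch.randn(4, 3, requires_grad=True)
lin(x).sum().backward()
print(grad_norms)  # [2.0]
handle.remove()
```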

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) → None

Adds a parameter to the module.

The parameter can be accessed as an attribute using the given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) → torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in place.

This method is helpful for freezing part of the module for fine-tuning or for training parts of a model individually (e.g., GAN training).

Parameters: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns: self
Return type: Module
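Freezing a sub-network before fine-tuning might look like the following sketch. The backbone/head split and the layer sizes are assumptions made for illustration. Only parameters that still require gradients are handed to the optimizer.

```python
import torch
from torch import nn

backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
head = nn.Linear(16, 2)
model = nn.Sequential(backbone, head)

backbone.requires_grad_(False)  # in place; returns backbone itself

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
print(backbone[0].weight.grad is None, head.weight.grad is not None)  # True True
```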

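share_memory() → torch.nn.modules.module.T

Moves the underlying storage of all parameters and buffers to shared memory (see torch.Tensor.share_memory_()).

Returns: self
Return type: Module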

state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names.

Returns: a dictionary containing the whole state of the module
Return type: dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
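Per the description above, a dtype conversion leaves integral tensors alone. The following sketch uses a hypothetical Counter module that holds a float parameter and an integer buffer.

```python
import torch
from torch import nn


class Counter(nn.Module):
    """Hypothetical module with one float parameter and one integer buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(2))
        self.register_buffer("steps", torch.zeros((), dtype=torch.long))


m = Counter().to(torch.float64)
print(m.weight.dtype)  # torch.float64: floating point tensors are cast
print(m.steps.dtype)   # torch.int64: integral buffers keep their dtype
```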

train(mode: bool = True) → torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

type(dst_type: Union[torch.dtype, str]) → torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters: dst_type (type or string) – the desired type
Returns: self
Return type: Module

xpu(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

zero_grad(set_to_none: bool = False) → None

Sets the gradients of all model parameters to zero. See the similar function in torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
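The following short sketch contrasts the two modes. The default of set_to_none has changed between PyTorch releases, so the example passes it explicitly.

```python
import torch
from torch import nn

lin = nn.Linear(2, 1)
lin(torch.ones(3, 2)).sum().backward()

lin.zero_grad(set_to_none=False)  # keep the grad tensors, filled with zeros
zeroed = lin.weight.grad.clone()
print(zeroed)                     # tensor([[0., 0.]])

lin.zero_grad(set_to_none=True)   # release the grad tensors entirely
print(lin.weight.grad)            # None
```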

training: bool

class hyperion.torch.metrics.accuracy.CategoricalAccuracy(weight=None, reduction='mean')[source]

__init__(weight=None, reduction='mean')[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance (module(x)) rather than forward() directly. Calling the instance runs the registered hooks, while calling forward() silently skips them.

T_destination: alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks(): Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(), so it can be modified.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True): Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook): These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook): These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters

destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) → torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters: fn (Module -> None) – function to be applied to each submodule
Returns: self
Return type: Module

Example:

>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

bfloat16() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns: self
Return type: Module

buffers(recurse: bool = True) → Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields: torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

children() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields: Module – a child module

cpu() → torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns: self
Return type: Module

cuda(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module
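Following the advice above, the following device-agnostic sketch moves the model first and constructs the optimizer afterwards. It falls back to the CPU when no GPU is available. The model and the learning rate are arbitrary.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2)
model = model.to(device)  # same effect as model.cuda() when device is a GPU

# Build the optimizer only after the move, so it holds the final parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 8, device=device)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```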

double() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns: self
Return type: Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number is saved in the attribute _metadata of the returned state dict, and is thus pickled. _metadata is a dictionary whose keys follow the naming convention of the state dict. See _load_from_state_dict for how to use this information when loading.

If new parameters/buffers are added to or removed from a module, this number should be bumped, and the module’s _load_from_state_dict method can compare the version number and make the appropriate changes if the state dict is from before the change.

eval() → torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns: self
Return type: Module

extra_repr() → str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

float() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns: self
Return type: Module

half() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns: self
Return type: Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields

modules() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())

named_children() → Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)

named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
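The following sketch shows how prefix and recurse change the yielded names, using a small nn.Sequential with arbitrary layer sizes. named_buffers() is included because BatchNorm1d stores its running statistics as buffers.

```python
from torch import nn

model = nn.Sequential(nn.Linear(3, 4), nn.BatchNorm1d(4))

all_names = [n for n, _ in model.named_parameters()]
print(all_names)       # ['0.weight', '0.bias', '1.weight', '1.bias']

prefixed = [n for n, _ in model[0].named_parameters(prefix="encoder")]
print(prefixed)        # ['encoder.weight', 'encoder.bias']

direct = list(model.named_parameters(recurse=False))
print(direct)          # []: the Sequential container owns no parameters itself

buffer_names = [n for n, _ in model.named_buffers()]
print(buffer_names)    # ['1.running_mean', '1.running_var', '1.num_batches_tracked']
```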

parameters(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

Example:

>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])

register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) → None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using the given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))

register_forward_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the output. It can also modify the input in place, but this has no effect on forward() because the hook is called after forward() has run.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_forward_pre_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the input by returning either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments, and all keyword arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs in-place is not allowed when using backward hooks and will raise an error.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
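A minimal sketch of a full backward hook, assuming a plain nn.Linear layer and an input that requires gradients: the hook records the shape of grad_output and returns a replacement grad_input that halves the gradient with respect to the input. The hook name scale_input_grad and the 0.5 factor are illustrative only.

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 1)
seen = []


def scale_input_grad(module, grad_input, grad_output):
    # Both arguments are tuples; entries can be None (e.g., for non-Tensor inputs).
    seen.append(grad_output[0].shape)
    # Return a replacement for grad_input: here, halve the gradient w.r.t. x.
    return tuple(None if g is None else g * 0.5 for g in grad_input)


handle = lin.register_full_backward_hook(scale_input_grad)
x = torch.randn(4, 3, requires_grad=True)
lin(x).sum().backward()
handle.remove()

# Without the hook, d(sum)/dx would be the weight row repeated for each sample.
expected = 0.5 * lin.weight.detach().expand(4, 3)
print(torch.allclose(x.grad, expected))  # True
```

The returned tuple must have the same length as grad_input; returning None instead leaves the gradients untouched.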

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) → None

Adds a parameter to the module.

The parameter can be accessed as an attribute using the given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
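The short example below shows register_parameter() on a bare nn.Module: the registered parameter becomes an attribute, and a parameter registered as None occupies the name but is skipped by named_parameters() and state_dict(). The names scale and shift are arbitrary.

```python
import torch
import torch.nn as nn

m = nn.Module()
m.register_parameter("scale", nn.Parameter(torch.ones(3)))
m.register_parameter("shift", None)  # a registered slot with no value yet

print(m.scale.shape)  # torch.Size([3])
print(m.shift)  # None
print([name for name, _ in m.named_parameters()])  # ['scale']: None entries are skipped
```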

requires_grad_(requires_grad: bool = True) → torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns: self
Return type: Module

share_memory() → torch.nn.modules.module.T

state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns: a dictionary containing a whole state of the module
Return type: dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method only casts the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers are moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

train(mode: bool = True) → torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

type(dst_type: Union[torch.dtype, str]) → torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters: dst_type (type or string) – the desired type
Returns: self
Return type: Module

xpu(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

zero_grad(set_to_none: bool = False) → None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

training: bool

class hyperion.torch.metrics.accuracy.BinaryAccuracy(weight=None, reduction='mean', thr=0.5)[source]

__init__(weight=None, reduction='mean', thr=0.5)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
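Only the signature of BinaryAccuracy (weight, reduction, thr) and of forward(input, target) is documented here, not its internals. As a rough illustration of what a thresholded binary accuracy computes, the sketch below assumes that input holds probabilities, that a sample is predicted positive when its probability exceeds thr, and that reduction follows the usual 'mean'/'sum'/'none' convention of PyTorch losses; the weight argument is not modeled. binary_accuracy_sketch is a hypothetical helper, not hyperion's implementation.

```python
import torch


def binary_accuracy_sketch(input, target, thr=0.5, reduction="mean"):
    # Hypothetical helper for illustration; NOT hyperion's implementation.
    # Assumes `input` holds probabilities and positives are `input > thr`.
    pred = (input > thr).float()
    correct = (pred == target.float()).float()  # 1.0 where prediction matches label
    if reduction == "mean":
        return correct.mean()
    if reduction == "sum":
        return correct.sum()
    return correct  # reduction == "none": per-sample correctness


probs = torch.tensor([0.9, 0.2, 0.6, 0.4])
labels = torch.tensor([1, 0, 0, 0])
print(binary_accuracy_sketch(probs, labels))  # tensor(0.7500)
```

With hyperion installed, the documented class is called like any nn.Module, e.g. BinaryAccuracy(thr=0.5)(probs, labels); its exact reduction and weighting behavior should be checked against the hyperion source.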

T_destination: alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks(): Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in the input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward-compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(), so it can be modified.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module.
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True): Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook): These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook): These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters

destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) → torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).

Parameters: fn (Module -> None) – function to be applied to each submodule
Returns: self
Return type: Module

Example:

>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

bfloat16() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns: self
Return type: Module

buffers(recurse: bool = True) → Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields: torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

children() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields: Module – a child module

cpu() → torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns: self
Return type: Module

cuda(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

double() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns: self
Return type: Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number is saved in the attribute _metadata of the returned state dict, and is thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information when loading.

If parameters or buffers are added to or removed from a module, this number should be bumped, and the module’s _load_from_state_dict method can compare the version number and make appropriate changes if the state dict is from before the change.
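The version-based loading described above can be sketched as follows, assuming a hypothetical module Scale whose parameter was renamed from "w" (version 1) to "weight" (version 2). The module's version lives in the class attribute _version, and the override of _load_from_state_dict renames old keys before delegating to the default implementation; PyTorch's own BatchNorm uses the same private hook for a similar purpose, but Scale itself is purely illustrative.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    # Hypothetical module: version 1 stored the parameter as "w",
    # version 2 renamed it to "weight".
    _version = 2

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        version = local_metadata.get("version", None)
        old_key = prefix + "w"
        # Checkpoints without metadata, or from version 1, use the old key.
        if (version is None or version < 2) and old_key in state_dict:
            state_dict[prefix + "weight"] = state_dict.pop(old_key)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)


old_sd = {"w": torch.tensor([3.0])}  # an old checkpoint without version metadata
m = Scale()
m.load_state_dict(old_sd)
print(m.weight.data)  # tensor([3.])
```

A fresh m.state_dict() records version 2 in its _metadata, so checkpoints saved after the rename skip the conversion branch.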

eval() → torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns: self
Return type: Module
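Dropout is one of the modules whose behavior depends on the mode, which makes it a convenient way to see train() and eval() in action. In training mode each entry is zeroed with probability p and the survivors are scaled by 1 / (1 - p); in evaluation mode the layer is the identity. The tensor size and seed below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()  # training mode (also the default for a new module)
y_train = drop(x)  # entries are 0.0 or 1 / (1 - 0.5) = 2.0

drop.eval()  # equivalent to drop.train(False)
y_eval = drop(x)  # dropout is the identity in evaluation mode

print(drop.training, torch.equal(y_eval, x))  # False True
```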

extra_repr() → str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
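A minimal sketch of overriding extra_repr() in a hypothetical module Gain: the returned string is placed inside the parentheses of the module's printed representation.

```python
import torch.nn as nn


class Gain(nn.Module):
    # Hypothetical module used only to illustrate extra_repr().
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self):
        # Shown inside the parentheses when the module is printed.
        return f"factor={self.factor}"


print(Gain(0.5))  # Gain(factor=0.5)
```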

float() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns: self
Return type: Module

half() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns: self
Return type: Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
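The NamedTuple returned by load_state_dict() is most useful with strict=False. In the sketch below, the keys of a bare nn.Linear ("weight", "bias") do not match those of the same layer wrapped in nn.Sequential ("0.weight", "0.bias"), so non-strict loading reports them instead of raising; adding the prefix then allows a strict load. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

src = nn.Linear(2, 2)
dst = nn.Sequential(nn.Linear(2, 2))  # keys: '0.weight', '0.bias'

# The keys differ by the '0.' prefix; strict=True would raise a RuntimeError here.
result = dst.load_state_dict(src.state_dict(), strict=False)
print(sorted(result.missing_keys))     # ['0.bias', '0.weight']
print(sorted(result.unexpected_keys))  # ['bias', 'weight']

# Adding the prefix makes the keys match exactly, so strict loading succeeds.
renamed = {"0." + k: v for k, v in src.state_dict().items()}
dst.load_state_dict(renamed)
print(torch.equal(dst[0].weight, src.weight))  # True
```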

modules() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())

named_children() → Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)

named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())

parameters(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

Example:

>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])

register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) → None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using the given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
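The difference between persistent and non-persistent buffers can be checked directly. The sketch below uses a hypothetical module RunningStats with one buffer of each kind: both are listed by named_buffers(), only the persistent one appears in state_dict(), and neither is a parameter.

```python
import torch
import torch.nn as nn


class RunningStats(nn.Module):
    # Hypothetical module, used only to illustrate persistent vs. non-persistent buffers.
    def __init__(self, num_features):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(3)
print(list(m.state_dict().keys()))  # ['running_mean']
print(sorted(name for name, _ in m.named_buffers()))  # ['running_mean', 'scratch']
print(len(list(m.parameters())))  # 0: buffers are not parameters
```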

register_forward_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the output. It can also modify the input in-place, but this has no effect on forward() because the hook runs after forward() has been called.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
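A common use of a forward hook is capturing intermediate activations without changing forward(). The sketch below attaches a hook to the ReLU inside a small nn.Sequential, stores its output, and then removes the hook through the returned handle. The network shape and the dictionary name captured are arbitrary.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
captured = {}


def save_relu_output(module, inputs, output):
    # `inputs` is a tuple of the positional arguments passed to forward().
    captured["relu"] = output.detach()
    # Returning None keeps the module's output unchanged.


handle = net[1].register_forward_hook(save_relu_output)
y = net(torch.randn(5, 4))
relu_shape = captured["relu"].shape
print(relu_shape)  # torch.Size([5, 3])

handle.remove()
captured.clear()
net(torch.randn(2, 4))
print("relu" in captured)  # False: the hook no longer fires
```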

register_forward_pre_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the input: it can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless it is already a tuple).

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs, respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input only corresponds to the inputs given as positional arguments; keyword arguments are ignored. Entries in grad_input and grad_output are None for all non-Tensor arguments.

Warning

Modifying inputs or outputs in-place is not allowed when using backward hooks and will raise an error.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) → None

Adds a parameter to the module.

The parameter can be accessed as an attribute using the given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) → torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns: self
Return type: Module
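The freezing use case mentioned above can be sketched as follows: calling requires_grad_(False) on the first layer of a small network leaves only the last layer trainable, and only those parameters are handed to the optimizer. The architecture and learning rate are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model[0].requires_grad_(False)  # freeze the first Linear layer in-place

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)
```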

share_memory() → torch.nn.modules.module.T

state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns: a dictionary containing a whole state of the module
Return type: dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method only casts the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers are moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

train(mode: bool = True) → torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

type(dst_type: Union[torch.dtype, str]) → torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters: dst_type (type or string) – the desired type
Returns: self
Return type: Module

xpu(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

zero_grad(set_to_none: bool = False) → None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
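The two behaviors of zero_grad() can be compared on a single nn.Linear layer after one backward pass. set_to_none is passed explicitly in both calls, since its default value has changed across PyTorch releases.

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 1)
lin(torch.ones(1, 2)).sum().backward()  # populates lin.weight.grad and lin.bias.grad

lin.zero_grad(set_to_none=False)
zeroed = lin.weight.grad.clone()
print(zeroed)  # tensor([[0., 0.]])

lin.zero_grad(set_to_none=True)
print(lin.weight.grad)  # None
```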

training: bool

class hyperion.torch.metrics.accuracy.BinaryAccuracyWithLogits(weight=None, reduction='mean', thr=0.0)[source]

__init__(weight=None, reduction='mean', thr=0.0)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
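BinaryAccuracyWithLogits defaults to thr=0.0, which suggests (an assumption, since only the signature is documented here) that it thresholds raw logits rather than probabilities. Because sigmoid(0) = 0.5 and the sigmoid is strictly increasing, a logit threshold of 0.0 yields the same decisions as a probability threshold of 0.5; more generally, a logit threshold t corresponds to a probability threshold sigmoid(t). The sketch below checks this with plain PyTorch operations, not with the hyperion class.

```python
import torch

logits = torch.tensor([2.0, -1.5, 0.3, -0.2])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])

# Thresholding logits at 0.0 ...
pred_from_logits = (logits > 0.0).float()
# ... gives the same decisions as thresholding probabilities at 0.5,
# because sigmoid(0) == 0.5 and sigmoid is strictly increasing.
pred_from_probs = (torch.sigmoid(logits) > 0.5).float()
print(torch.equal(pred_from_logits, pred_from_probs))  # True

acc = (pred_from_logits == labels).float().mean()
print(acc)  # tensor(0.7500)
```

Working on logits avoids computing the sigmoid at all, which is presumably why a separate logit-based variant with a 0.0 default exists.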

T_destination: alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks(): Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in the input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward-compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(), so it can be modified.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module.
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True): Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook): These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook): These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters

destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) → torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).

Parameters: fn (Module -> None) – function to be applied to each submodule
Returns: self
Return type: Module

Example:

>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

bfloat16() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns: self
Return type: Module

buffers(recurse: bool = True) → Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields: torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

children() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields: Module – a child module

cpu() → torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns: self
Return type: Module

cuda(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

double() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns: self
Return type: Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added to or removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and make appropriate changes if the state dict is from before the change.

eval() → torch.nn.modules.module.T

Sets the module in evaluation mode.

This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns: self
Return type: Module
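As a quick illustration of what "an effect only on certain modules" means in practice, the sketch below uses nn.Dropout: in evaluation mode it passes its input through unchanged, while in training mode it zeroes random elements and scales the survivors by 1/(1 - p). The input size and seed are arbitrary choices for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()  # training mode: random elements are zeroed, the rest scaled by 1 / (1 - p)
y_train = dropout(x)

dropout.eval()  # evaluation mode, equivalent to dropout.train(False)
y_eval = dropout(x)

print(dropout.training)               # False once eval() has been called
print(torch.equal(y_eval, x))         # True: dropout is the identity in eval mode
print(sorted(set(y_train.tolist())))  # only 0.0 and 2.0 can appear
```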

extra_repr() → str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

float() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns: self
Return type: Module

half() → torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns: self
Return type: Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
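A minimal sketch of loading a partial checkpoint with strict=False and inspecting the returned named tuple. The checkpoint contents, including the extra.running_mean key, are made up for illustration; with strict=True the same call would raise a RuntimeError.

```python
import torch
from torch import nn

model = nn.Linear(3, 2)

# Illustrative checkpoint: it lacks "bias" and carries a key the model does not know.
checkpoint = {
    "weight": torch.zeros(2, 3),
    "extra.running_mean": torch.zeros(2),
}

# strict=False loads every matching key and reports the mismatches instead of raising.
result = model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)     # keys the model expected but the checkpoint lacked
print(result.unexpected_keys)  # keys in the checkpoint the model does not use
print(torch.count_nonzero(model.weight).item())  # the weight was overwritten with zeros
```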

modules() → Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())

named_children() → Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)

named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields: (string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True) → Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())

parameters(recurse: bool = True) → Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields: Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])

register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) → None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
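The sketch below, built around a hypothetical RunningStats module, contrasts a persistent buffer with a non-persistent one: both are registered as buffers, but only the persistent one appears in state_dict(), and neither is returned by parameters().

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Toy module with one persistent and one non-persistent buffer (illustrative only)."""

    def __init__(self, num_features):
        super().__init__()
        # Saved in state_dict() and moved by .to()/.cuda() along with the module.
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Also moved by .to(), but excluded from state_dict().
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(4)
print(list(m.state_dict().keys()))
print([name for name, _ in m.named_buffers()])
print(list(m.parameters()))
```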

register_forward_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the output. It can also modify the input in place, but this will have no effect on forward(), since the hook is called after forward() has run.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
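A common use of forward hooks is capturing intermediate activations without editing forward(). This sketch stores the ReLU output of a small nn.Sequential in a dictionary; the network shape and the save_output helper are illustrative, not part of any API.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
activations = {}


def save_output(name):
    # Builds a hook with the documented signature hook(module, input, output).
    def hook(module, inputs, output):
        activations[name] = output.detach()  # returning None leaves the output unchanged
    return hook


handle = net[1].register_forward_hook(save_output("relu"))
out = net(torch.randn(5, 4))
handle.remove()  # later forward passes no longer trigger the hook

print(activations["relu"].shape)
```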

register_forward_pre_hook(hook: Callable[[...], None]) → torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the input: it can return either a tuple or a single modified value. A single returned value will be wrapped into a tuple (unless that value is already a tuple).

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) → torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Returns: a handle that can be used to remove the added hook by calling handle.remove()
Return type: torch.utils.hooks.RemovableHandle
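The sketch below uses register_full_backward_hook() to double the gradient flowing back to a layer's input, following the rule above of returning a new tuple rather than modifying grad_input in place. The scaling factor and layer sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
x = torch.randn(4, 3, requires_grad=True)

# Reference gradient with respect to x, computed without any hook.
layer(x).sum().backward()
plain_grad = x.grad.clone()
x.grad = None


def double_grad_input(module, grad_input, grad_output):
    # Return a new tuple instead of editing grad_input in place;
    # entries are None for inputs that do not require gradients.
    return tuple(None if g is None else 2 * g for g in grad_input)


handle = layer.register_full_backward_hook(double_grad_input)
layer(x).sum().backward()
handle.remove()

print(torch.allclose(x.grad, 2 * plain_grad))
```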

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) → None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) → torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns: self
Return type: Module
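A minimal freezing sketch, assuming the first layer of a small nn.Sequential plays the role of a pretrained block: requires_grad_(False) stops autograd from recording gradients for its parameters, and only the remaining parameters are handed to the optimizer.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze the first layer in place: its parameters stop recording gradients.
model[0].requires_grad_(False)

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.randn(16, 8)).pow(2).mean()
loss.backward()
optimizer.step()

print(model[0].weight.grad is None)      # the frozen layer received no gradient
print(model[2].weight.grad is not None)  # the trainable layer did
```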

share_memory() → torch.nn.modules.module.T

state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns: a dictionary containing a whole state of the module
Return type: dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

train(mode: bool = True) → torch.nn.modules.module.T

Sets the module in training mode.

This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

type(dst_type: Union[torch.dtype, str]) → torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters: dst_type (type or string) – the desired type
Returns: self
Return type: Module

xpu(device: Optional[Union[int, torch.device]] = None) → torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters: device (int, optional) – if specified, all parameters will be copied to that device
Returns: self
Return type: Module

zero_grad(set_to_none: bool = False) → None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
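The sketch below shows the two modes side by side on an nn.Linear. Note that the signature documented here defaults to set_to_none=False, while PyTorch 2.0 and later default to True, so passing the argument explicitly keeps the behavior independent of the installed version.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)
x = torch.ones(3, 2)

model(x).sum().backward()
model.zero_grad(set_to_none=False)
# The gradient tensor is kept, but filled with zeros.
kept_as_zeros = model.weight.grad is not None and not model.weight.grad.any()

model(x).sum().backward()
model.zero_grad(set_to_none=True)
# The gradient tensor is dropped; the next backward() allocates a fresh one.
released = model.weight.grad is None

print(kept_as_zeros, released)
```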

training: bool

Metric Functions

hyperion.torch.metrics.accuracy_functional.categorical_accuracy(input, target, weight=None, reduction='mean')[source]

hyperion.torch.metrics.accuracy_functional.binary_accuracy(input, target, weight=None, reduction='mean', thr=0.5)[source]

hyperion.torch.metrics.accuracy_functional.binary_accuracy_with_logits(input, target, weight=None, reduction='mean', thr=0)[source]
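These functions are listed without descriptions. The sketch below shows one plausible reading of them and is not hyperion's implementation: categorical accuracy as the fraction of samples whose arg-max score equals the target, and binary accuracy as thresholding scores with thr. The defaults thr=0.5 for binary_accuracy and thr=0 for binary_accuracy_with_logits are consistent with each other because sigmoid(0) = 0.5, so thresholding logits at 0 gives the same predictions as thresholding probabilities at 0.5. The sketch_* names are hypothetical, the strict > comparison is an assumption, and the weight argument is omitted.

```python
import torch


def _reduce(values, reduction):
    # "mean" averages, "sum" adds up; anything else returns per-sample values.
    if reduction == "mean":
        return values.mean()
    if reduction == "sum":
        return values.sum()
    return values


def sketch_categorical_accuracy(input, target, reduction="mean"):
    # Assumes input holds class scores of shape (N, C) and target holds class ids of shape (N,).
    correct = (input.argmax(dim=-1) == target).float()
    return _reduce(correct, reduction)


def sketch_binary_accuracy(input, target, thr=0.5, reduction="mean"):
    # Assumes input holds probabilities; a sample is predicted positive when input > thr.
    correct = ((input > thr) == target.bool()).float()
    return _reduce(correct, reduction)


def sketch_binary_accuracy_with_logits(input, target, thr=0.0, reduction="mean"):
    # Thresholding logits at 0 matches thresholding sigmoid probabilities at 0.5.
    return sketch_binary_accuracy(input, target, thr=thr, reduction=reduction)


scores = torch.tensor([[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.9, 1.2, 0.0]])
labels = torch.tensor([0, 2, 0])
logits = torch.tensor([1.3, -0.7, 0.2, -2.0])
binary_labels = torch.tensor([1, 0, 0, 0])

acc = sketch_categorical_accuracy(scores, labels)  # 2 of 3 arg-max predictions match
acc_logits = sketch_binary_accuracy_with_logits(logits, binary_labels)
acc_probs = sketch_binary_accuracy(torch.sigmoid(logits), binary_labels)
print(acc.item(), acc_logits.item(), acc_probs.item())
```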

Loggers

The logger classes are used to write information to standard output, log files, TensorBoard or Weights & Biases (WandB). The LoggerList class contains a set of loggers: when something is logged to the LoggerList, it is written to all the loggers contained in it. The loggers support multi-GPU training with DistributedDataParallel.

Individual Loggers

class hyperion.torch.loggers.logger.Logger[source]

Base class for logger objects

params: training params dictionary

__init__()[source]

on_epoch_begin(epoch, logs, **kwargs)[source]

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs

on_epoch_end(logs, **kwargs)[source]

At the end of an epoch

Parameters: logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)[source]

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

on_batch_end(logs, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_train_begin(logs, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_train_end(logs, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs
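The hook names above suggest the usual callback order. The sketch below drives a hypothetical EventRecorder (with a SketchLogger base standing in for Logger) through a toy loop to show that order; the run_training driver is an assumption, and hyperion's trainers may call the hooks with different keys in logs.

```python
class SketchLogger:
    """Stand-in for the Logger base class: every hook is a no-op by default."""

    def on_train_begin(self, logs=None, **kwargs):
        pass

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        pass

    def on_batch_begin(self, batch, logs=None, **kwargs):
        pass

    def on_batch_end(self, logs=None, **kwargs):
        pass

    def on_epoch_end(self, logs=None, **kwargs):
        pass

    def on_train_end(self, logs=None, **kwargs):
        pass


class EventRecorder(SketchLogger):
    """Overrides a few hooks to record the order in which they are called."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None, **kwargs):
        self.events.append("train_begin")

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        self.events.append(f"epoch_begin:{epoch}")

    def on_batch_end(self, logs=None, **kwargs):
        self.events.append(f"batch_end:loss={logs['loss']}")

    def on_epoch_end(self, logs=None, **kwargs):
        self.events.append("epoch_end")

    def on_train_end(self, logs=None, **kwargs):
        self.events.append("train_end")


def run_training(logger, num_epochs, batches_per_epoch):
    """Hypothetical driver loop; a real trainer would compute the loss here."""
    logger.on_train_begin(logs={})
    for epoch in range(num_epochs):
        logger.on_epoch_begin(epoch, logs={})
        for batch in range(batches_per_epoch):
            logger.on_batch_begin(batch, logs={})
            logger.on_batch_end(logs={"loss": 1.0 / (batch + 1)})
        logger.on_epoch_end(logs={})
    logger.on_train_end(logs={})


recorder = EventRecorder()
run_training(recorder, num_epochs=2, batches_per_epoch=2)
print(recorder.events)
```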

class hyperion.torch.loggers.prog_logger.ProgLogger(metrics=None, interval=10)[source]

Logger that prints training progress to stdout

metrics: list of metrics

interval: number of batches between prints

__init__(metrics=None, interval=10)[source]

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs

on_batch_begin(batch, logs=None, **kwargs)[source]

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters: logs – dictionary of logs

estimate_epoch_time()[source]

static sec2str(t)[source]

on_train_end(logs, **kwargs)

At the end of training

Parameters

logs – dictionary of logs

class hyperion.torch.loggers.csv_logger.CSVLogger(file_path, sep=',', append=False)[source]

Logger that writes metrics to a CSV file at the end of each epoch

file_path: filenane of csv file.

sep: column separator for csv file

append: False, overwrite existing file, True, appends.

__init__(file_path, sep=',', append=False)[source]

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters: logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

on_batch_end(logs, **kwargs)

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_begin(epoch, logs, **kwargs)

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs
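To make the file_path, sep and append options concrete, here is a hypothetical SketchCSVLogger built on Python's csv module. It is an assumption-laden sketch rather than hyperion's code: it takes the column names from the first logs dictionary and skips the header when appending to a non-empty file.

```python
import csv
import os
import tempfile


class SketchCSVLogger:
    """Hypothetical CSV logger: writes one row per epoch from the logs dictionary."""

    def __init__(self, file_path, sep=",", append=False):
        self.file_path = file_path
        self.sep = sep
        self.append = append
        self._file = None
        self._writer = None
        self._write_header = True

    def on_train_begin(self, logs=None, **kwargs):
        # append=True keeps existing rows (and their header); append=False truncates the file.
        existing = os.path.isfile(self.file_path) and os.path.getsize(self.file_path) > 0
        self._write_header = not (self.append and existing)
        self._file = open(self.file_path, "a" if self.append else "w", newline="")

    def on_epoch_end(self, logs=None, **kwargs):
        logs = logs or {}
        if self._writer is None:
            # Column names come from the first epoch's logs.
            self._writer = csv.DictWriter(self._file, fieldnames=list(logs), delimiter=self.sep)
            if self._write_header:
                self._writer.writeheader()
        self._writer.writerow(logs)
        self._file.flush()

    def on_train_end(self, logs=None, **kwargs):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "train_log.csv")
for append in (False, True):  # the second run appends to the first
    logger = SketchCSVLogger(path, sep=";", append=append)
    logger.on_train_begin()
    for epoch in range(2):
        logger.on_epoch_end(logs={"epoch": epoch, "loss": 1.0 / (epoch + 1)})
    logger.on_train_end()

with open(path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```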

class hyperion.torch.loggers.tensorboard_logger.TensorBoardLogger(tb_path, interval=10)[source]

Logger that sends training progress to TensorBoard

tb_path: TensorBoard output directory

__init__(tb_path, interval=10)[source]

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters: logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

class hyperion.torch.loggers.wandb_logger.WAndBLogger(project=None, group=None, name=None, path=None, mode='online', interval=10)[source]

Logger that sends training progress to Weights & Biases (wandb)

__init__(project=None, group=None, name=None, path=None, mode='online', interval=10)[source]

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters: logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

Logger List

class hyperion.torch.loggers.logger_list.LoggerList(loggers=None)[source]

Container for a list of logger callbacks

loggers: list of Logger objects

__init__(loggers=None)[source]

append(logger)[source]

property tensorboard_logger

property tensorboard_writer

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters

epoch – index of the epoch
logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs=None, **kwargs)[source]

At the start of a batch

Parameters

batch – batch index within the epoch
logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters: logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs
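The fan-out behavior described for LoggerList amounts to the composite pattern. This hypothetical SketchLoggerList forwards only on_epoch_end to each contained logger; the other hooks would be forwarded the same way, and PrintLogger is an illustrative stand-in that records lines instead of printing them.

```python
class PrintLogger:
    """Hypothetical logger that keeps the lines it would print."""

    def __init__(self, name):
        self.name = name
        self.lines = []

    def on_epoch_end(self, logs=None, **kwargs):
        self.lines.append(f"[{self.name}] {logs}")


class SketchLoggerList:
    """Minimal composite in the spirit of LoggerList: each hook is forwarded to every logger."""

    def __init__(self, loggers=None):
        self.loggers = list(loggers) if loggers is not None else []

    def append(self, logger):
        self.loggers.append(logger)

    def on_epoch_end(self, logs=None, **kwargs):
        logs = logs or {}
        for logger in self.loggers:
            logger.on_epoch_end(logs, **kwargs)


stdout_like = PrintLogger("stdout")
file_like = PrintLogger("file")
loggers = SketchLoggerList([stdout_like])
loggers.append(file_like)
loggers.on_epoch_end(logs={"loss": 0.25, "acc": 0.9})
print(stdout_like.lines, file_like.lines)
```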

Utils

Device Handling Utils

Utilities to handle GPU devices, like finding a free GPU on a shared server.

hyperion.torch.utils.devices.open_device(num_gpus=1, gpu_ids=None, find_free_gpu=False)[source]

hyperion.torch.utils.devices.find_free_gpus(num_gpus)[source]
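The documentation does not say how find_free_gpus decides that a GPU is free. One possible approach, sketched below under that caveat, is to query nvidia-smi for per-GPU memory use and utilization and keep the devices below some thresholds. The helper names and the thresholds (500 MiB, 5 %) are hypothetical; parsing is kept separate from the nvidia-smi call so it can be tried on sample text without a GPU.

```python
import subprocess


def parse_gpu_usage(csv_text):
    """Parse lines like '0, 312, 0' (index, memory.used in MiB, utilization.gpu in %)."""
    usage = []
    for line in csv_text.strip().splitlines():
        index, mem_used, util = (int(field) for field in line.split(","))
        usage.append((index, mem_used, util))
    return usage


def pick_free_gpus(csv_text, num_gpus, max_mem_mib=500, max_util=5):
    """Return up to num_gpus indices whose memory use and utilization are below the thresholds."""
    free = [
        idx
        for idx, mem, util in parse_gpu_usage(csv_text)
        if mem <= max_mem_mib and util <= max_util
    ]
    return free[:num_gpus]


def query_nvidia_smi():
    # Requires an NVIDIA driver; returns the raw CSV text for pick_free_gpus().
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


sample = "0, 10231, 97\n1, 3, 0\n2, 11, 0\n3, 8000, 40\n"
print(pick_free_gpus(sample, num_gpus=1))
```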

Distributed Data Parallel Utils

These contain utilities to perform multi-GPU training with Distributed Data Parallel (DDP).

hyperion.torch.utils.ddp.add_ddp_args(parser)[source]

hyperion.torch.utils.ddp.filter_ddp_args(**kwargs)[source]

hyperion.torch.utils.ddp.ddp_init(gpu_id, num_gpus, node_id=0, num_nodes=1, master_addr='localhost', master_port=None)[source]

hyperion.torch.utils.ddp.ddp_cleanup()[source]

class hyperion.torch.utils.ddp.TorchDDP(*args: Any, **kwargs: Any)[source]

__init__(*args: Any, **kwargs: Any) → None

class hyperion.torch.utils.ddp.FairShardedDDP(*args: Any, **kwargs: Any)[source]

__init__(module: torch.nn.Module, sharded_optimizer: Union[fairscale.optim.oss.OSS, List[fairscale.optim.oss.OSS]], process_group: Optional[Any] = None, broadcast_buffers: bool = True, sync_models_at_startup: bool = True, reduce_buffer_size: int = 8388608, auto_refresh_trainable: bool = True, reduce_fp16: bool = False)

_clear_counters() → None: Reset all the grad reduce and call counters

_consume_work_handles() → None: Consume all the futures which are tied to this optimizer’s buckets. We start from the first/older ones, since they are the most likely to be ready and non-blocking

_get_reduce_fn(index: int, param: torch.Tensor, dst_rank: int) → Callable

Two possible backward hooks for a given parameter: either directly reduce to the appropriate rank, or contribute to a bucket and reduce when the bucket is full.

Either way a delayed action is necessary and is passed as a callback.

_passing_sync_batchnorm_handle(module: torch.nn.Module) → None: Passes handle required for torch.nn.modules.SyncBatchNorm. Adapted from torch.nn.parallel.DistributedDataParallel.

_setup_backward_hooks() → None: Attach a reduce function to each grad-requiring parameter. This makes the gradient reduction automatic whenever there’s a backward pass

_setup_bucket_strategy() → None: Devise a bucketing strategy on a per-rank ownership level. These buckets will not be sharded, since the gradients would be re-allocated during the backward in that case. This method can be slow for big models, but it is not typically called often (not for every forward pass, for instance).

_sync_params_and_buffers() → None: Sync the complete model states in between the ranks

_try_consume_work_handle() → None: Try to consume the oldest future. This is non-blocking: if the future is not ready, we skip it.

forward(*inputs: Any, **kwargs: Any) → Any: Module forward pass, handles any DDP-specific work in the background. Primes the backward pass for gradient reduction to the proper ranks.

no_sync() → Generator: A context manager to disable gradient synchronization.

reduce() → None: This does not need to be called, since gradient reduction is done automatically during the backward pass. Use this method to reduce the gradients manually.

refresh_trainable() → None: If the module trainability has changed, update all the assumptions

sync_buffers(blocking: bool = False) → None

Sync all the param buffers in between ranks (including for instance batch norm statistics).

Parameters: blocking (bool) – wait for the operation to conclude.

to(device: Optional[torch.device], dtype: Optional[torch.dtype] = None, non_blocking: bool = False) → fairscale.nn.data_parallel.sharded_ddp.ShardedDataParallel

Moves and/or casts the parameters and buffers.

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module.
dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers.
non_blocking (bool) – make it an asynchronous call.

Returns

self.

Return type

Module

zero_grad(set_to_none: bool = False) → None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters: set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

class hyperion.torch.utils.ddp.FairFullyShardedDDP(*args: Any, **kwargs: Any)[source]

__getstate__() → Dict[str, str]

Serialize the state of the current FullyShardedDataParallel instance.

Some properties are not serializable (e.g., process groups, streams), so we remove them and try to reconstruct them in __setstate__().

__init__(module: torch.nn.Module, process_group: Optional[torch.distributed.ProcessGroup] = None, reshard_after_forward: bool = True, mixed_precision: bool = False, fp32_reduce_scatter: bool = False, flatten_parameters: bool = True, move_params_to_cpu: bool = False, compute_dtype: Optional[torch.dtype] = None, buffer_dtype: Optional[torch.dtype] = None, move_grads_to_cpu: Optional[bool] = None, bucket_cap_mb: int = 25, compute_device: Optional[torch.device] = None, no_broadcast_optim_state: Optional[bool] = False, state_dict_device: Optional[torch.device] = None, clear_autocast_cache: bool = False, force_input_to_fp32: bool = False, verbose: bool = False, cpu_offload: bool = False)

__setstate__(state: Dict[str, Any]) → None: Intercept state setting and perform needed changes on params.

_broadcast_pad_info_to_r0() → List[List[List[int]]]: Collect [x.numel_padded_per_param for x in self._fsdp_instances] from each rank.

_cast_buffers(device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = None, memo: Optional[Set] = None) → None

Move all buffers to the given device and dtype.

If device or dtype are not given, then they will default to self.compute_device and self.buffer_dtype, respectively. In the case of nested FSDP instances, we will respect the child instance’s compute_device and buffer_dtype configuration.

Parameters

device (torch.device, Optional) – device to cast buffers to (defaults to compute_device)
dtype (torch.dtype, Optional) – dtype to cast buffers to (defaults to buffer_dtype)
memo (Set, Optional) – set of modules that have already been processed

_cast_fp32_param_shards_to_fp16(params: Optional[List[torch.nn.Parameter]] = None) → None: Cast FP32 param shard to FP16 for a list of params.

_free_fp16_param_shard(params: Optional[List[torch.nn.Parameter]] = None) → None: Free storage for FP16 shards for a list of params.

_free_full_params(params: Optional[List[torch.nn.Parameter]] = None) → None: Free up storage for full parameters.

_gather_optim_state(sd_state: Dict[int, Dict[str, Any]]) → Tuple[Dict[int, Dict[str, List]], Dict[int, Dict[str, List]]]: For each value in state[i], if the value is a tensor, collect it from the world. Else use rank 0’s entry.

_get_shard(tensor: torch.Tensor) → Tuple[torch.Tensor, int]: Return the local shard of a full tensor.
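Sharding requires every rank to hold a shard of the same size, which is why the full parameter is padded to be evenly divisible by world_size. The hypothetical sketch_get_shard below illustrates the idea on a plain list: it splits the flattened values into world_size chunks of ceil(n / world_size) elements, zero-pads at the end, and returns the local shard together with how many of its elements are padding. Reading the returned int as the padding count is an assumption about _get_shard.

```python
import math


def sketch_get_shard(flat_values, rank, world_size):
    """Return (local_shard, num_padding) for `rank`, zero-padding at the end so all shards match in size."""
    chunk = math.ceil(len(flat_values) / world_size)
    padded = flat_values + [0] * (chunk * world_size - len(flat_values))
    start, end = rank * chunk, (rank + 1) * chunk
    # Only positions at or beyond len(flat_values) are padding.
    num_padding = min(chunk, max(0, end - len(flat_values)))
    return padded[start:end], num_padding


values = [1, 2, 3, 4, 5]
shards = [sketch_get_shard(values, rank, 2) for rank in range(2)]
print(shards)
```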

_init_param_attributes(p: torch.nn.Parameter) → None

We manage several attributes on each Parameter instance. The first two are set by _shard_parameters_():

_is_sharded: True if the Parameter is sharded, or False if the Parameter is intentionally not sharded (in which case we will all-reduce grads for this param).

_orig_size: the size of the original Parameter (before sharding)

The remaining attributes are set here:

_fp32_shard: a single shard of the parameters in full precision (typically FP32, but this is dependent on the dtype of the model as it’s passed in by the user). This can be on CPU or GPU depending on the value of cpu_offload.
_fp16_shard: if mixed_precision is True, this will be a single shard of the parameters in FP16, used for all-gather.
_full_param_padded: the full weight (padded to be evenly divisible by world_size), used for computation in the forward and backward pass. This will be resized in place and only materialized (via all-gather) as needed.

_lazy_init() → None: Initialization steps that should happen lazily, typically right before the first forward pass.

_load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) → NamedTuple: Load a whole (unsharded) state_dict.

Warning

This needs to be called on all ranks, since synchronization primitives will be used.

_post_backward_hook(param: torch.nn.Parameter, *unused: Any) → None

At the start of _post_backward_hook(), param.grad contains the full gradient for the local batch. The reduce-scatter op will replace param.grad with a single shard of the summed gradient across all GPUs. This shard will align with the current GPU rank. For example:

before reduce_scatter:
    param.grad (GPU #0): [1, 2, 3, 4]
    param.grad (GPU #1): [5, 6, 7, 8]

after reduce_scatter:
    param.grad (GPU #0): [6, 8]    # 1+5, 2+6
    param.grad (GPU #1): [10, 12]  # 3+7, 4+8

The local GPU’s optim.step is responsible for updating a single shard of params, also corresponding to the current GPU’s rank. This alignment is created by _shard_parameters_(), which ensures that the local optimizer only sees the relevant parameter shard.
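The reduce-scatter example above can be reproduced with a single-process simulation. This is a minimal sketch: the hypothetical `reduce_scatter` helper stands in for the real collective, ignores any pre/post divide factors, and assumes the gradient length is already a multiple of the world size.

```python
from typing import List


def reduce_scatter(grads_per_rank: List[List[float]]) -> List[List[float]]:
    """Single-process simulation of a sum reduce-scatter.

    grads_per_rank[r] is the full local gradient on rank r. Entry r of the
    result is rank r's shard of the element-wise sum over all ranks.
    """
    world_size = len(grads_per_rank)
    numel = len(grads_per_rank[0])
    assert numel % world_size == 0, "pad the gradient to a multiple of world_size first"
    shard_size = numel // world_size
    summed = [sum(values) for values in zip(*grads_per_rank)]
    return [summed[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


print(reduce_scatter([[1, 2, 3, 4], [5, 6, 7, 8]]))  # [[6, 8], [10, 12]]
```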

_post_reduction_hook(param: torch.nn.Parameter, reduced_grad: torch.Tensor) → None: Hook to call on each param after the reduce-scatter.

_prep_grads_for_backward() → None: Make sure p.grad has the correct size/device, otherwise set it to None.

_print_r0(msg: str, restart: bool = False) → None: Debugging utility to print memory usage stats nicely on rank 0.

_queue_wait_for_post_backward() → None

Try to queue a wait_for_post_backward callback.

Only called on the root instance, and only one callback is queued. It can, however, also be called by child FSDP instances via a closure in case the root instance doesn't own any params.

_rebuild_full_params(force_full_precision: bool = False) → Optional[List[Tuple[torch.Tensor, bool]]]

Gather all shards of params.

Parameters: force_full_precision (bool, Optional) – by default params will be gathered in compute_dtype (e.g., FP16), unless force_full_precision is True, in which case they will be gathered in full precision (e.g., FP32), possibly in fresh storage. The parameter that’s being rebuilt will end up in full precision as well.
Returns: A list of tuples, where the first element is the full-sized param and the second element is a bool indicating if it’s safe for the caller to free the full-sized param. This will be None if force_full_precision=False and the full params are already gathered.
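Gathering is the inverse of sharding: concatenating the shards in rank order yields the padded full parameter, and dropping the trailing padding recovers the original number of elements. The sketch below assumes each shard is an equal-sized slice of the flattened parameter with zero padding at the end; `all_gather_full_param` is a hypothetical single-process stand-in for the all-gather collective and ignores precision casting.

```python
from typing import List


def all_gather_full_param(shards: List[List[float]], orig_numel: int) -> List[float]:
    """Hypothetical helper: concatenate per-rank shards and drop the padding."""
    full_padded = [value for shard in shards for value in shard]
    # Only the first orig_numel elements belong to the real parameter;
    # the result would then be viewed with the parameter's original size.
    return full_padded[:orig_numel]


shards = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 0.0, 0.0]]
print(all_gather_full_param(shards, orig_numel=10))
```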

_register_post_backward_hooks() → None

Register backward hooks to reshard params and reduce-scatter grads.

This is called during the forward pass. The goal is to attach a hook to each parameter's gradient-generating function (grad_acc below) so that the hook is called after all gradients for that param are computed.

Goals:

1. The hook must fire once and only once after all gradients are accumulated for a param.
2. If it fires more than once, we end up incorrectly sharding the grad multiple times (which could make its dimension too small).
3. If it fires once but too early, or does not fire at all, we leave gradients unsharded (which could make its dimension too large).

Due to multiple-pass forward, this function can be called on the same parameter multiple times in a single forward pass. If we register the hook multiple times, it ends up being called multiple times. We could try to get a new hook every time and delete the previously registered one. However, for an unknown reason (despite extensive debugging), in mixed precision mode we get two different grad_acc objects below during different calls of this function within the same forward pass. If we keep the last one, the hook ends up firing too early. In full precision mode, we luckily get the same grad_acc object, so deleting and re-registering still ensures that the hook fires once after all gradients are generated.

Empirically, keeping the first hook registered in each forward pass seems to work best. We do need to remove the hook at the end of the backward pass; otherwise, the next forward pass would not register the new hook it needs.
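The "keep the first hook per forward pass" strategy amounts to a guard flag that is set on registration and cleared when the backward pass finishes. The plain-Python sketch below models only that bookkeeping; `HookedParam`, `register_post_backward_hook` and `finish_backward` are hypothetical names, and the real implementation attaches the hook to the parameter's gradient accumulator (grad_acc) rather than to a list.

```python
from typing import Callable, List


class HookedParam:
    """Hypothetical stand-in for a parameter whose gradient accumulator holds hooks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: List[Callable[[], None]] = []
        self.hook_registered = False  # guard flag, reset at the end of backward


def register_post_backward_hook(p: HookedParam, hook: Callable[[], None]) -> None:
    # Keep only the first registration in a forward pass; later calls
    # (e.g. when the same parameter is used several times) are ignored.
    if p.hook_registered:
        return
    p.hooks.append(hook)
    p.hook_registered = True


def finish_backward(p: HookedParam) -> None:
    # Remove the hook so the next forward pass can register a fresh one.
    p.hooks.clear()
    p.hook_registered = False


calls: List[str] = []
p = HookedParam("weight")
for _ in range(3):  # the same parameter used three times in one forward pass
    register_post_backward_hook(p, lambda: calls.append("reshard"))
for hook in p.hooks:  # gradients fully accumulated: fire the hooks
    hook()
finish_backward(p)
print(calls)  # ['reshard']
```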

_register_pre_backward_hooks(outputs: Any) → Any

Register pre-backward hook to run before the wrapped module’s backward. Hooks should be attached to all outputs from the forward.

Returns: new outputs with hooks registered if they require gradient.
Return type: outputs

_reset_lazy_init() → None: Reset instance so _lazy_init() will run on the next forward.

_set_is_root() → None: Sets the root flag of this instance; if True, it implies that no other FullyShardedDataParallel instance wraps this one. Called once by _lazy_init(). Also sets self.children_share_process_group = True if all child instances share the same process group. If some child instances use a different process group, self.clip_grad_norm_ will raise an error.

_setup_streams() → None: Create streams to overlap data transfer and computation.

_shard_parameters_() → None

At initialization we wrap a module with full parameters and shard the parameters in-place. Sharding is implemented by viewing each parameter as a 1D Tensor and retaining only a single slice, where the slice size is determined by the number of data parallel workers.

Wrapping modules with many small parameters (or with a very large data parallel world size) will result in many small parameter shards and slow performance. In this case it’s better to set ``flatten_parameters`` to True, so that all of the small parameters in the module are combined into a single contiguous Tensor and sharded once.
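The effect of ``flatten_parameters`` can be sketched as concatenating the small parameters into one flat vector, sharding that vector once, and keeping (name, numel) metadata so the parameters can be unflattened later. `flatten_and_shard` is a hypothetical helper working on plain lists, not fairscale's actual flattening code.

```python
import math
from typing import Dict, List, Tuple


def flatten_and_shard(
    params: Dict[str, List[float]], rank: int, world_size: int
) -> Tuple[List[float], List[Tuple[str, int]]]:
    """Hypothetical helper: flatten several small params into one vector and shard it once.

    Returns this rank's shard plus (name, numel) metadata for unflattening later.
    """
    flat: List[float] = []
    metadata: List[Tuple[str, int]] = []
    for name, values in params.items():
        flat.extend(values)
        metadata.append((name, len(values)))
    shard_size = math.ceil(len(flat) / world_size)
    # Pad once so the single flat vector divides evenly across ranks.
    flat.extend([0.0] * (shard_size * world_size - len(flat)))
    return flat[rank * shard_size:(rank + 1) * shard_size], metadata


params = {"bias1": [1.0, 2.0], "bias2": [3.0], "scale": [4.0, 5.0, 6.0]}
# One shard of 2 elements per rank instead of three tiny per-parameter shards.
shard, meta = flatten_and_shard(params, rank=1, world_size=4)
print(shard, meta)  # [3.0, 4.0] [('bias1', 2), ('bias2', 1), ('scale', 3)]
```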

After this initial sharding is complete, the user can initialize a torch.optim.Optimizer in the usual way, i.e.:

.. code-block:: python

    import torch

    # sharded_module is the module after wrapping it with FullyShardedDataParallel
    optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)

The optimizer will see only a single slice of parameters and will thus allocate less memory for optimizer state, avoiding redundancy across data parallel workers.

_use_fp32_param_shard(params: Optional[List[torch.nn.Parameter]] = None) → None: Use FP32 shard for a list of params.

_use_full_params() → None

Switch p.data pointers to use the full params.

Note: this assumes full params are already gathered.

_wait_for_post_backward() → None: Wait for post-backward to finish. Only called on root instance.

_wait_for_previous_optim_step() → None: The outer-most FullyShardedDataParallel instance (i.e., the root instance) needs to synchronize with the default stream to ensure the previous optimizer step is done.

apply(fn: Callable[[torch.nn.Module], None]) → fairscale.nn.data_parallel.fully_sharded_data_parallel.FullyShardedDataParallel

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model.

Compared to torch.nn.Module.apply, this version additionally gathers the full parameters before applying fn. It should not be called from within another summon_full_params context.

Parameters: fn (Callable[[nn.Module], None]) – function to be applied to each submodule
Returns: self
Return type: Module

assert_state(state: Union[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState, List[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState]]) → None: Assert we are in the given state.

clip_grad_norm_(max_norm: Union[float, int], norm_type: Union[float, int] = 2.0) → torch.Tensor

Clip all gradients at this point in time. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

Parameters

max_norm (float or int) – max norm of the gradients
norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.

Returns

Total norm of the parameters (viewed as a single vector).

Note

This is analogous to torch.nn.utils.clip_grad_norm_ but handles the partitioning and multiple devices per rank under the hood. The default torch util is not applicable here, because each rank only has a partial view of all the grads in the model, so calling it in a sharded (FSDP) context would lead to different scaling being applied per subset of model parameters.

Warning

This needs to be called on all ranks, since synchronization primitives will be used.
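Why the stock utility cannot be used directly becomes clear in a sketch: the total norm must be assembled from per-rank partial results before the clipping coefficient is computed, so that every rank scales its shard by the same factor. This single-process sketch uses hypothetical helpers and takes norm_type as a float (float('inf') for the infinity norm); the clipping coefficient follows the max_norm / (total_norm + eps) convention of torch.nn.utils.clip_grad_norm_.

```python
import math
from typing import List


def global_grad_norm(local_grad_shards: List[List[float]], norm_type: float = 2.0) -> float:
    """Combine per-rank gradient shards into one p-norm (single-process sketch).

    local_grad_shards[r] holds the gradient values owned by rank r. Each rank
    produces a partial result; the partials are then combined, which FSDP
    does with a collective across ranks.
    """
    if math.isinf(norm_type):
        # Infinity norm: the largest local max over all ranks.
        return max(max((abs(g) for g in shard), default=0.0) for shard in local_grad_shards)
    # p-norm: sum |g|^p locally, add the partial sums, then take the p-th root.
    partial_sums = [sum(abs(g) ** norm_type for g in shard) for shard in local_grad_shards]
    return sum(partial_sums) ** (1.0 / norm_type)


def clip_coefficient(total_norm: float, max_norm: float, eps: float = 1e-6) -> float:
    # Every rank uses the same factor, so all shards are scaled consistently.
    return min(1.0, max_norm / (total_norm + eps))


shards = [[3.0], [4.0]]  # rank 0 owns one gradient value, rank 1 another
total = global_grad_norm(shards)  # 5.0, not the per-rank norms 3.0 and 4.0
coef = clip_coefficient(total, max_norm=1.0)
clipped = [[g * coef for g in shard] for shard in shards]
```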

static consolidate_shard_weights(shard_weights: List[Dict[str, torch.Tensor]], shard_metadata: List[Dict[str, Any]], with_module_buffers: bool = True) → Dict[str, torch.Tensor]

Given a list of weights and metadata associated with N shards, reconstruct the weights of an equivalent consolidated (non-sharded) model.

Module parameters are consolidated using the shard metadata.

Module buffers are taken from shard 0: this assumes that module buffers are either synchronized or that the shard 0 value is valid for all shards. If this behavior is not correct for your module (for instance, if buffers need to be reduced instead), you can disable it with with_module_buffers=False.

This method is used to re-assemble checkpoints of shards without having to instantiate FSDP wrappers with the world size originally used to save the shards.

property cpu_offload: bool

extra_repr() → str

forward(*args: Any, **kwargs: Any) → torch.Tensor

gather_full_optim_state_dict(optim: torch.optim.Optimizer, **ignored: Dict) → Optional[Dict[str, Any]]

Return the last known global optimizer state. The returned state is compatible with PyTorch, in that the sharded properties are not exposed. Multiple parameter groups are not yet supported.

This should be called only on the root FSDP instance. Nested FSDP instances are supported as long as they have the same world_size as the parent or world_size=1.

Parameters

optim (Optimizer) – an optimizer instance for this FSDP rank. Its state_dict is used in the consolidation. However, its state is not modified.

Returns

On rank 0, a dict with four entries (other ranks return None):
- state - a dict holding gathered optimization state, 1 entry per unflat parameter
- param_groups - a dict containing the 1 parameter group
- param_id_map - global (unflat) to local (flat) id mapping
- uncollected_local_ids - keys in the state dict that were not broadcast

get_shard_from_optim_state_dict(full_optim_state_dict: Dict[str, Any]) → Dict[str, Any]

Get the portion of the optimizer state dict associated with the shard.

This can be used to get the right sharded optimizer state to be loaded into the sharded optimizer for this FSDP rank.

Parameters: full_optim_state_dict (dict) – consolidated optimizer state returned by gather_full_optim_state_dict, or loaded from a checkpoint.
Returns: a shard of the optimizer state.
Return type: (dict)

load_local_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) → NamedTuple: Load a local (sharded) state_dict.

load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) → NamedTuple

local_metadata_dict() → Dict[str, Any]: Get the information needed to reconstruct the model from shards offline.

local_state_dict(*args: Any, **kwargs: Any) → Any: Returns the local (sharded) state of the module. Parameters are sharded, so the resulting state_dict can only be loaded after the Module has been wrapped with FullyShardedDataParallel.

property module: torch.nn.Module

no_sync() → Generator: A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass after exiting the context.

Note

This may result in higher memory usage because we will accumulate the full model gradients (instead of gradient shards) until the eventual sync.
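A common use of no_sync() is gradient accumulation over several micro-batches, synchronizing only on the last backward. The sketch below assumes only that the model exposes no_sync() as a context manager; `accumulate_gradients`, `backward_fn` and `_DemoModel` are hypothetical, and the demo model merely records whether each backward would have synchronized.

```python
from contextlib import contextmanager, nullcontext
from typing import Any, Callable, Iterator, List, Sequence


def accumulate_gradients(
    model: Any, micro_batches: Sequence[Any], backward_fn: Callable[[Any, Any], None]
) -> None:
    """Run backward on each micro-batch, letting gradients synchronize only on the last one.

    `model` must expose a no_sync() context manager; `backward_fn(model, batch)`
    is a hypothetical callable running forward + backward. Call
    optimizer.step() after this function returns.
    """
    last = len(micro_batches) - 1
    for i, batch in enumerate(micro_batches):
        # Every micro-batch except the last runs inside no_sync(): no gradient reduction.
        ctx = model.no_sync() if i < last else nullcontext()
        with ctx:
            backward_fn(model, batch)


class _DemoModel:
    """Hypothetical stand-in that records whether each backward would synchronize."""

    def __init__(self) -> None:
        self.in_no_sync = False
        self.synced: List[bool] = []

    @contextmanager
    def no_sync(self) -> Iterator[None]:
        self.in_no_sync = True
        try:
            yield
        finally:
            self.in_no_sync = False


demo = _DemoModel()
accumulate_gradients(demo, [0, 1, 2], lambda m, _batch: m.synced.append(not m.in_no_sync))
print(demo.synced)  # [False, False, True]
```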

property params_with_grad: List[torch.nn.Parameter]: [p for p in self.parameters() if p.grad is not None]

set_gradient_divide_factors(pre: float, post: float, recursive: bool) → None

Allows the user to override the pre- and post-divide factors.

Parameters

pre (float) – divide factor before the reduction.
post (float) – divide factor after the reduction.
recursive (bool) – recursively set it for all child FSDP instances or not.
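With a sum reduction across world_size ranks, dividing by pre before the reduction and by post after it produces the mean gradient whenever pre * post == world_size; splitting the division this way is usually motivated by keeping low-precision intermediate values away from overflow and underflow. The helper below is a hypothetical single-process sketch in which a plain sum stands in for the collective.

```python
from typing import List


def reduce_with_divide_factors(local_grads: List[float], pre: float, post: float) -> float:
    """Sketch of gradient averaging with pre- and post-divide factors.

    Each rank divides its gradient by `pre` before the sum reduction, and the
    reduced result is divided by `post`.
    """
    reduced = sum(g / pre for g in local_grads)  # stands in for the collective sum
    return reduced / post


grads = [2.0, 4.0, 6.0, 8.0]  # one scalar gradient per rank, world_size = 4
print(reduce_with_divide_factors(grads, pre=2.0, post=2.0))  # 5.0, the mean
```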

state_dict(*args: Any, **kwargs: Any) → Any: Returns the whole (unsharded) state of the module. Parameters are not sharded, so the resulting state_dict can be loaded directly by the wrapped Module without any sharding-specific logic. Returned tensors will be full precision (e.g., FP32).

Warning

This needs to be called on all ranks, since synchronization primitives will be used.

summon_full_params(recurse: bool = True, volatile: bool = False) → Generator

A context manager to expose full params for the current FSDP instance. Can be useful after forward/backward for a model to get the params for additional processing or checking. Parameters will be gathered in full precision (e.g., FP32).

Note

This can be used on inner FSDPs.

Note

This cannot be used within a forward or backward pass, nor can forward and backward be started from within this context.

Note

The full parameters will be freed after the context manager exits; it is up to the caller to clone them if needed.

Note

The full parameters can be modified, but only the portion corresponding to the local param shard will persist after the context manager exits (unless volatile=True, in which case there are no guarantees about persistence).

Parameters

recurse (bool, Optional) – recursively summon all params for nested FSDP instances (default: True)
volatile (bool, Optional) – if True, modifications to params are not guaranteed to persist after the context manager exits; enabling this can be slightly more efficient (default: False)

Metric Accumulators

Tools to combine the metrics computed on multiple GPUs into a single metric.

class hyperion.torch.utils.metric_acc.MetricAcc(device=None)[source]

Class to accumulate metrics during an epoch.

__init__(device=None)[source]

reset()[source]: Resets the accumulators.

update(metrics, num_samples=1)[source]

Updates the values of the metrics.

It uses a recursive formula, which may be more numerically stable than accumulating raw sums:

$$m^{(i)} = m^{(i-1)} + \frac{n^{(i)}}{\sum_{j=1}^{i} n^{(j)}} \left(x^{(i)} - m^{(i-1)}\right)$$

where i is the batch index, m^{(i)} is the accumulated average of the metric up to batch i, x^{(i)} is the average of the metric in batch i, and n^{(i)} is the batch size at batch i.

Parameters

metrics – dictionary with metrics for current batch
num_samples – number of samples in current batch (batch_size)
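The recursive update can be checked against a direct weighted mean with a short sketch. `RunningMetricAverage` is hypothetical: it mirrors the recursive formula for plain floats on a single process, whereas MetricAcc takes a device argument and is meant to combine metrics across GPUs.

```python
from typing import Dict


class RunningMetricAverage:
    """Hypothetical single-process version of the recursive weighted-mean update.

    m_i = m_{i-1} + n_i / N_i * (x_i - m_{i-1}), with N_i = n_1 + ... + n_i.
    """

    def __init__(self) -> None:
        self.count = 0
        self.metrics: Dict[str, float] = {}

    def update(self, metrics: Dict[str, float], num_samples: int = 1) -> None:
        self.count += num_samples     # N_i
        w = num_samples / self.count  # n_i / N_i
        for name, value in metrics.items():
            prev = self.metrics.get(name, 0.0)
            self.metrics[name] = prev + w * (value - prev)


acc = RunningMetricAverage()
acc.update({"loss": 1.0}, num_samples=2)
acc.update({"loss": 4.0}, num_samples=1)
print(acc.metrics)  # {'loss': 2.0}, i.e. (2 * 1.0 + 1 * 4.0) / 3
```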

property metrics: Returns metrics dictionary

Evaluation Utils

Functions that can be useful when evaluating neural networks, for example, when a signal is too long to fit in memory and needs to be split into chunks.

hyperion.torch.utils.eval_utils.eval_nnet_by_chunks(x, nnet, chunk_length=0, detach_chunks=True, time_dim=-1)[source]

hyperion.torch.utils.eval_utils.eval_nnet_overlap_add(x, nnet, chunk_length=0, chunk_overlap=None, detach_chunks=True, time_dim=-1)[source]

Math Functions

hyperion.torch.utils.math.invert_trimat(A, lower=False, right_inv=False, return_logdet=False, return_inv=False)[source]

Inversion of triangular matrices. Returns a lambda function f that multiplies the inverse of A by a vector.

Parameters

A – Triangular matrix.
lower – if True A is lower triangular, else A is upper triangular.
right_inv – If False, f(v)=A^{-1}v; if True f(v)=v’ A^{-1}
return_logdet – If True, it also returns the log determinant of A.
return_inv – If True, it also returns A^{-1}

Returns

Lambda function f that multiplies A^{-1} by a vector; also the log determinant of A if return_logdet is True, and A^{-1} if return_inv is True.
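For the lower-triangular case with right_inv=False, applying f(v) = A^{-1}v amounts to forward substitution, and the log determinant of a triangular matrix is the sum of the logs of its diagonal entries. The plain-Python sketch below illustrates both; `solve_lower_triangular` and `logdet_triangular` are hypothetical helpers, and hyperion's function operates on torch tensors and may handle signs or the upper-triangular case differently.

```python
import math
from typing import List


def solve_lower_triangular(L: List[List[float]], b: List[float]) -> List[float]:
    """Return x = L^{-1} b by forward substitution (L lower triangular, nonzero diagonal)."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        # Subtract contributions of already-solved unknowns, then divide by the pivot.
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x


def logdet_triangular(A: List[List[float]]) -> float:
    """log|det A| of a triangular matrix: sum of log|A[i][i]|."""
    return sum(math.log(abs(A[i][i])) for i in range(len(A)))


A = [[2.0, 0.0], [1.0, 4.0]]


def f(v: List[float]) -> List[float]:
    # Plays the role of f(v) = A^{-1} v for lower=True, right_inv=False.
    return solve_lower_triangular(A, v)


print(f([2.0, 9.0]))         # [1.0, 2.0]
print(logdet_triangular(A))  # ≈ 2.0794 (= log 8)
```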

Miscellaneous Functions

hyperion.torch.utils.misc.l2_norm(x, axis=-1)[source]

hyperion.torch.utils.misc.compute_snr(x, n, axis=-1)[source]

hyperion.torch.utils.misc.compute_stats_adv_attack(x, x_adv)[source]

hyperion.torch.utils.misc.get_selfsim_tarnon(y, return_mask=False)[source]