PyTorch Models and Tools

The module hyperion.torch provides utilities, dataloaders, neural architectures and models based on PyTorch.

Layers

These include several custom neural network layers.

Activation Function Layers

These include a factory class that creates activation layers from config parameters, and custom activation layers.

class hyperion.torch.layers.activation_factory.ActivationFactory[source]
static create(activation, **kwargs)[source]

Creates a non-linear activation object

Parameters
  • activation – str with the activation type; a dictionary whose name field indicates the activation type and whose remaining fields are extra activation arguments; an activation constructor; or None, in which case it returns None

  • **kwargs – extra arguments for activation constructor

Returns

Non-linear activation object
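The dispatch rules listed above can be sketched in plain Python. This is a hypothetical illustration, not hyperion's code: the registry ACTIVATIONS, the stand-in classes ReLU and LeakyReLU (playing the role of torch.nn modules) and the function create_activation are assumptions, as is the choice to let explicit kwargs override values taken from the dictionary.

```python
from typing import Any, Callable, Dict, Optional, Union


class ReLU:  # hypothetical stand-in for torch.nn.ReLU
    def __init__(self, inplace: bool = False):
        self.inplace = inplace


class LeakyReLU:  # hypothetical stand-in for torch.nn.LeakyReLU
    def __init__(self, negative_slope: float = 0.01):
        self.negative_slope = negative_slope


# Hypothetical registry mapping activation names to constructors
ACTIVATIONS: Dict[str, Callable[..., Any]] = {"relu": ReLU, "leaky_relu": LeakyReLU}


def create_activation(activation: Union[None, str, dict, Callable[..., Any]],
                      **kwargs) -> Optional[Any]:
    """Builds an activation from None, a name, a config dict or a constructor."""
    if activation is None:
        return None
    if isinstance(activation, str):
        return ACTIVATIONS[activation](**kwargs)
    if isinstance(activation, dict):
        cfg = dict(activation)  # copy so the caller's config is not mutated
        name = cfg.pop("name")
        cfg.update(kwargs)  # assumption: explicit kwargs win over config values
        return ACTIVATIONS[name](**cfg)
    if callable(activation):
        return activation(**kwargs)
    raise ValueError(f"unsupported activation spec: {activation!r}")
```

For example, create_activation({"name": "leaky_relu", "negative_slope": 0.2}) returns a LeakyReLU stand-in with slope 0.2, and passing a class directly simply calls it with the kwargs.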

static create_from_str(activation_name, **kwargs)[source]

Creates a non-linear activation object from string

Parameters
  • activation_name – str with the activation type

  • **kwargs – extra arguments for activation constructor

Returns

Non-linear activation object

static get_config(activation)[source]
class hyperion.torch.layers.swish.Swish(*args: Any, **kwargs: Any)[source]
forward(x)[source]
__init__(*args: Any, **kwargs: Any) None
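The entry above does not state the formula. Assuming the common definition swish(x) = x·sigmoid(x) (that is, beta = 1), a scalar sketch with a numerically stable sigmoid looks like this; in PyTorch the same function is available as torch.nn.SiLU / torch.nn.functional.silu.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow in exp for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish with beta = 1 (assumption): x * sigmoid(x)."""
    return x * sigmoid(x)
```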

Normalization Layers

These include a factory class that creates normalization layers from config parameters.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.norm_layer_factory.NormLayer2dFactory[source]
static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]

Creates a callable constructor for normalization layers

Parameters
  • norm_name – str with the normalization layer name, one of [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]

  • num_groups – number of groups for group-norm

  • momentum – default momentum

  • eps – default epsilon for numerical stability

Returns

Callable constructor to create normalization layers

class hyperion.torch.layers.norm_layer_factory.NormLayer1dFactory[source]
static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]

Creates a callable constructor for normalization layers

Parameters
  • norm_name – str with the normalization layer name, one of [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]

  • num_groups – number of groups for group-norm

  • momentum – default momentum

  • eps – default epsilon for numerical stability

Returns

Callable constructor to create normalization layers

Dropout Layers

These include custom dropout and drop-connect layers.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.dropout.Dropout1d(*args: Any, **kwargs: Any)[source]
forward(inputs)[source]
__init__(*args: Any, **kwargs: Any) None
class hyperion.torch.layers.dropout.DropConnect2d(*args: Any, **kwargs: Any)[source]
__init__(p=0.2)[source]
forward(inputs)[source]
class hyperion.torch.layers.dropout.DropConnect1d(*args: Any, **kwargs: Any)[source]
__init__(p=0.2)[source]
forward(inputs)[source]

Attention Layers

Attention layers like the ones used in Transformers and Conformers.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.attention.ScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]

Scaled dot product multihead attention layer

in_feats

input feature dimension

out_feats

output feature dimension

num_heads

number of heads

d_k

key/query projection dimension

d_v

value projection dimension

dropout_rate

dropout rate

time_dim

time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, dropout_rate=0, time_dim=1)[source]
property in_feats
property out_feats
forward(query, key, value, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the value with size=(batch, time1, out_feats)
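The core computation can be sketched for a single head with plain Python lists. This is an illustration only: the input projections to d_k/d_v, the multiple heads, dropout and the output projection to out_feats are omitted, and the boolean mask convention (True means "may attend") is an assumption rather than hyperion's documented convention.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_prod_att(q: Matrix, k: Matrix, v: Matrix,
                        mask: Optional[List[List[bool]]] = None) -> Matrix:
    """q: (time1, d_k), k: (time2, d_k), v: (time2, d_v) -> (time1, d_v).

    mask[i][j] is True where query i may attend to key j; every row needs
    at least one True entry.
    """
    d_k, d_v = len(q[0]), len(v[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        if mask is not None:
            # Blocked keys get -inf, so their softmax weight is exactly 0
            scores = [s if mask[i][j] else -math.inf for j, s in enumerate(scores)]
        probs = softmax(scores)
        out.append([sum(p * vj[c] for p, vj in zip(probs, v)) for c in range(d_v)])
    return out
```

With q = [[1.0, 0.0]], k = [[1.0, 0.0], [0.0, 1.0]] and v = [[1.0], [3.0]], masking out the first key returns [[3.0]], while the unmasked result lies between 1 and 3, closer to 1 because the first key matches the query better.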

class hyperion.torch.layers.attention.LocalScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]
Local Scaled dot product multihead attention layer

It calculates self-attention between time steps within a window of ‘context’ frames.

in_feats

input feature dimension

out_feats

output feature dimension

num_heads

number of heads

d_k

key/query projection dimension

d_v

value projection dimension

context

maximum attention temporal context.

dropout_rate

dropout rate

time_dim

time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, dropout_rate=0, time_dim=1)[source]

Constructs a MultiHeadedAttention object.

static _softmax(scores1, scores2, shift1, shift2, t1, t2)[source]

Computes softmax for block diagonal attention maps

Parameters
  • scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)

  • scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)

  • shift1 – shift, in time steps along time dimension 1, of the diagonal blocks of scores2 with respect to scores1

  • shift2 – shift, in time steps along time dimension 2, of the diagonal blocks of scores2 with respect to scores1; with self-attention shift1=shift2

  • t1 – length of time dimension 1 (output time dimension)

  • t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.

Returns
probs1: posterior attention scores for the block-diagonal att. matrix, with size=(batch, heads, blocks, t1, t2)

probs2: posterior attention scores for the shifted block-diagonal att. matrix, with size=(batch, heads, blocks-1, t1, t2)
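The _softmax helper above suggests that this layer scores block-diagonal and shifted block-diagonal sub-matrices instead of the full time1 x time2 matrix. As a simpler, swapped-in illustration of the receptive field only, the sketch below builds a dense banded mask; it assumes a symmetric window of context // 2 frames on each side, which may not match the layer's exact block boundaries. Applying such a mask to full attention still costs O(T^2) memory, which is presumably what the block-diagonal scheme avoids.

```python
from typing import List


def banded_mask(t: int, context: int) -> List[List[bool]]:
    """True where frame j lies within context // 2 frames of frame i.

    Illustrative dense band only; not the block-diagonal scheme used by the layer.
    """
    half = context // 2
    return [[abs(i - j) <= half for j in range(t)] for i in range(t)]
```

For t=5 and context=2, frame 2 may attend to frames 1, 2 and 3 only.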

forward1(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the values with size=(batch, time1, out_feats)

forward2(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the values with size=(batch, time1, out_feats)

forward(query, key, value, mask)[source]

Computes ‘Local Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the values with size=(batch, time1, out_feats)

property in_feats
property out_feats
class hyperion.torch.layers.attention.ScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]
Scaled dot product multihead attention layer with relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf

in_feats

input feature dimension

out_feats

output feature dimension

num_heads

number of heads

d_k

key/query projection dimension

d_v

value projection dimension

causal_pos_enc

if True, the positional encoding is 0 when attending to future frames.

dropout_rate

dropout rate

time_dim

time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]
_apply_tril(x)[source]
Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep the causal attention points, i.e., i-j >= 0.

E.g., if t1=3 and t2=4, this applies the mask [1 1 0 0; 1 1 1 0; 1 1 1 1]

_apply_triu(x)[source]
Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep the non-causal attention points, i.e., i-j < 0.

E.g., if t1=3 and t2=4, this applies the mask [0 0 1 1; 0 0 0 1; 0 0 0 0]
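Both examples are consistent with triangular masks whose diagonal is offset by t2 - t1, so that the last query aligns with the last key. The sketch below reproduces them with plain lists; the function names are hypothetical. In PyTorch, torch.tril(torch.ones(t1, t2), diagonal=t2 - t1) and torch.triu(torch.ones(t1, t2), diagonal=t2 - t1 + 1) give the same masks, though the layer may build them differently.

```python
from typing import List


def causal_mask(t1: int, t2: int) -> List[List[int]]:
    """1 where key j is not in the future of query i, i.e. i + (t2 - t1) - j >= 0."""
    offset = t2 - t1  # aligns the last query with the last key
    return [[int(i + offset - j >= 0) for j in range(t2)] for i in range(t1)]


def non_causal_mask(t1: int, t2: int) -> List[List[int]]:
    """Complement of causal_mask: 1 where key j lies in the future of query i."""
    return [[1 - m for m in row] for row in causal_mask(t1, t2)]
```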

_left_shift(x)[source]
Applies left shifts to the rows of x to get the scores with relative positional encodings R_{i-j}, i-j >= 0 (causal attention).

E.g.
[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]

becomes:
[q0 R1, q0 R0, 0, 0;
 q1 R2, q1 R1, q1 R0, 0;
 q2 R3, q2 R2, q2 R1, q2 R0]
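The example can be reproduced with per-row index arithmetic: row i is shifted left by t1 - 1 - i positions and zero-filled on the right. This is a readability sketch on nested lists with a hypothetical function name; a tensor implementation would more likely use padding and reshaping than Python loops.

```python
from typing import List, Union

Cell = Union[str, float]


def left_shift(x: List[List[Cell]], t1: int) -> List[List[Cell]]:
    """Shifts row i of x left by (t1 - 1 - i) positions, zero-filling on the right.

    Assumes column k of x holds q_i R_{t2-1-k}, as in the example above, so the
    result holds q_i R_{i + t2 - t1 - j} in column j wherever that index is >= 0.
    """
    t2 = len(x[0])
    out = []
    for i, row in enumerate(x):
        shift = t1 - 1 - i
        out.append([row[j + shift] if j + shift < t2 else 0 for j in range(t2)])
    return out
```

Building x as [[f"q{i} R{r}" for r in (3, 2, 1, 0)] for i in range(3)] and calling left_shift(x, 3) yields exactly the "becomes" matrix above.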

_right_shift(x)[source]
Applies right shifts to the rows of x to get the scores with relative positional encodings R_{i-j}, i-j < 0 (non-causal attention).

E.g.
[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]

becomes:
[0, q0 R_{-1}, q0 R_{-2};
 0, 0, q1 R_{-1};
 0, 0, 0]
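The right-shift example follows the same pattern in the other direction: row i is shifted right by i positions and only the strictly future columns j > i are kept. The sketch below covers only the square case time1 = time2 shown in the example; the function name is hypothetical.

```python
from typing import List, Union

Cell = Union[str, float]


def right_shift(x: List[List[Cell]]) -> List[List[Cell]]:
    """Shifts row i of x right by i positions and zeroes every column j <= i.

    Assumes time1 == time2 and that column k of x holds q_i R_{-k}, as in the
    example above, so the result holds q_i R_{i-j} wherever j > i.
    """
    t = len(x[0])
    return [[row[j - i] if j > i else 0 for j in range(t)] for i, row in enumerate(x)]
```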

forward(query, key, value, pos_emb=None, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • pos_emb – positional embedding with size=(batch, time2, in_feats), ordered as R_{L-1}, …, R_0

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the value with size=(batch, time1, out_feats)

property in_feats
property out_feats
class hyperion.torch.layers.attention.LocalScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]
Local Scaled dot product multihead attention layer

It calculates self-attention between time steps within a window of ‘context’ frames.

It uses relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf

in_feats

input feature dimension

out_feats

output feature dimension

num_heads

number of heads

d_k

key/query projection dimension

d_v

value projection dimension

context

maximum attention temporal context.

causal_pos_enc

if True, the positional encoding is 0 when attending to future frames.

dropout_rate

dropout rate

time_dim

time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)

__init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]

Constructs a MultiHeadedAttention object.

_apply_tril(x)[source]
Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep the causal attention points, i.e., i-j >= 0.

E.g., if t1=3 and t2=4, this applies the mask [1 1 0 0; 1 1 1 0; 1 1 1 1]

_apply_triu(x)[source]
Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep the non-causal attention points, i.e., i-j < 0.

E.g., if t1=3 and t2=4, this applies the mask [0 0 1 1; 0 0 0 1; 0 0 0 0]

_left_shift(x, context, left_shift)[source]
Applies left shifts to the rows of x to get the scores with relative positional encodings R_{i-j}, i-j >= 0 (causal attention).

E.g.
[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]

becomes:
[q0 R1, q0 R0, 0, 0;
 q1 R2, q1 R1, q1 R0, 0;
 q2 R3, q2 R2, q2 R1, q2 R0]

_right_shift(x, context, left_shift)[source]
Applies right shifts to the rows of x to get the scores with relative positional encodings R_{i-j}, i-j < 0 (non-causal attention).

E.g.
[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]

becomes:
[0, q0 R_{-1}, q0 R_{-2};
 0, 0, q1 R_{-1};
 0, 0, 0]

forward(query, key, value, pos_emb=None, mask=None)[source]

Computes ‘Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • pos_emb – positional embedding with size=(batch, time2, in_feats), ordered as R_{L-1}, …, R_0

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the value with size=(batch, time1, out_feats)

static _softmax(scores1, scores2, shift1, shift2, t1, t2)

Computes softmax for block diagonal attention maps

Parameters
  • scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)

  • scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)

  • shift1 – shift, in time steps along time dimension 1, of the diagonal blocks of scores2 with respect to scores1

  • shift2 – shift, in time steps along time dimension 2, of the diagonal blocks of scores2 with respect to scores1; with self-attention shift1=shift2

  • t1 – length of time dimension 1 (output time dimension)

  • t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.

Returns
probs1: posterior attention scores for the block-diagonal att. matrix, with size=(batch, heads, blocks, t1, t2)

probs2: posterior attention scores for the shifted block-diagonal att. matrix, with size=(batch, heads, blocks-1, t1, t2)

forward1(query, key, value, mask)

Computes ‘Local Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the values with size=(batch, time1, out_feats)

forward2(query, key, value, mask)

Computes ‘Local Scaled Dot Product Attention’.

Parameters
  • query – query with size=(batch, time1, in_feats), where time1 is the output time dimension

  • key – key with size=(batch, time2, in_feats), where time2 is the input time dimension

  • value – value with size=(batch, time2, in_feats)

  • mask – optional mask with size=(batch, time1, time2) to zero attention between some time steps, or size=(batch, time) if time1=time2

Returns

Attention weighted average of the values with size=(batch, time1, out_feats)

property in_feats
property out_feats

Pooling Layers

These include custom pooling layers and a factory class to create pooling layers from config parameters.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.pool_factory.GlobalPool1dFactory[source]
static create(pool_type, in_feats=None, inner_feats=128, num_comp=64, dist_pow=2, use_bias=False, num_heads=8, d_k=256, d_v=256, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False, **kwargs)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip=[])[source]
static get_config(layer)[source]
static add_argparse_args(parser, prefix=None, skip=[])

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layers.global_pool._conv1(in_channels, out_channels, bias=False)[source]

Point-wise convolution.

class hyperion.torch.layers.global_pool.GlobalAvgPool1d(*args: Any, **kwargs: Any)[source]

Global average pooling in 1d

dim

pooling dimension

keepdim

if True, keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]
forward(x, weights=None)[source]
forward_slidwin(x, win_length, win_shift, snip_edges=False)[source]
get_config()
class hyperion.torch.layers.global_pool.GlobalMeanStdPool1d(*args: Any, **kwargs: Any)[source]

Global mean + standard deviation pooling in 1d

dim

pooling dimension

keepdim

if True, keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]
forward(x, weights=None)[source]
forward_slidwin(x, win_length, win_shift, snip_edges=False)[source]
get_config()
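A minimal sketch of mean + standard deviation pooling for one example, assuming the input is laid out as (feats, time), pooled over the last dimension (the default dim=-1), with the means and standard deviations concatenated. The optional weights argument and keepdim are omitted, and the population (divide-by-T) standard deviation is an assumption; real implementations typically also clamp the variance before the square root for numerical safety.

```python
import math
from typing import List


def mean_std_pool(x: List[List[float]]) -> List[float]:
    """x: (feats, time) -> [mean_0, ..., mean_{F-1}, std_0, ..., std_{F-1}]."""
    means, stds = [], []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)  # population variance
        means.append(mu)
        stds.append(math.sqrt(var))
    return means + stds
```

GlobalMeanLogVarPool1d presumably differs only in returning log-variances instead of standard deviations.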
class hyperion.torch.layers.global_pool.GlobalMeanLogVarPool1d(*args: Any, **kwargs: Any)[source]

Global mean + log-variance pooling in 1d

dim

pooling dimension

keepdim

if True, keeps the same number of dimensions after pooling

__init__(dim=-1, keepdim=False)[source]
forward(x, weights=None)[source]
forward_slidwin(x, win_length, win_shift)
get_config()
class hyperion.torch.layers.global_pool.LDEPool1d(*args: Any, **kwargs: Any)[source]

Learnable dictionary encoder pooling in 1d

in_feats

input feature dimension

num_comp

number of cluster components

dist_pow

power for distance metric

use_bias

use bias parameter when computing posterior responsibility

dim

pooling dimension

keepdim

if True, keeps the same number of dimensions after pooling

__init__(in_feats, num_comp=64, dist_pow=2, use_bias=False, dim=-1, keepdim=False)[source]
property num_comp
property in_feats
forward(x, weights=None)[source]
get_config()[source]
forward_slidwin(x, win_length, win_shift)
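The entry above does not give the LDE equations. The sketch below follows the learnable dictionary encoding layer as proposed by Cai et al. for end-to-end language identification: each frame gets soft responsibilities over C centres from a softmax of negative scaled distances, and each centre outputs the responsibility-weighted mean residual. The argument names mu, scale and bias are hypothetical, and mapping dist_pow to the distance exponent and use_bias to the bias term is a guess from the parameter names. A pooling layer would then flatten the C x D output into one vector.

```python
import math
from typing import List, Optional

Vec = List[float]


def lde_pool(x: List[Vec], mu: List[Vec], scale: Vec,
             bias: Optional[Vec] = None, dist_pow: float = 2) -> List[Vec]:
    """x: T frames of dim D; mu: C centres of dim D -> C residual vectors of dim D."""
    num_comp, dim = len(mu), len(mu[0])
    if bias is None:
        bias = [0.0] * num_comp  # corresponds to use_bias=False
    # Soft responsibilities: w[t][c] = softmax_c(-scale_c * ||x_t - mu_c||^p + bias_c)
    w = []
    for xt in x:
        logits = [-scale[c] * math.dist(xt, mu[c]) ** dist_pow + bias[c]
                  for c in range(num_comp)]
        m = max(logits)
        e = [math.exp(v - m) for v in logits]
        total = sum(e)
        w.append([v / total for v in e])
    # Responsibility-weighted mean of the residuals x_t - mu_c for each centre
    out = []
    for c in range(num_comp):
        wc = sum(wt[c] for wt in w)
        out.append([sum(wt[c] * (xt[d] - mu[c][d]) for wt, xt in zip(w, x)) / wc
                    for d in range(dim)])
    return out
```

With two well-separated frames and one centre near each, every centre outputs roughly the residual of its own frame.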
class hyperion.torch.layers.global_pool.ScaledDotProdAttV1Pool1d(*args: Any, **kwargs: Any)[source]
__init__(in_feats, num_heads, d_k, d_v, bin_attn=False, dim=-1, keepdim=False)[source]
property in_feats
forward(x, weights=None)[source]
get_config()[source]
forward_slidwin(x, win_length, win_shift)
class hyperion.torch.layers.global_pool.GlobalChWiseAttMeanStdPool1d(*args: Any, **kwargs: Any)[source]

Attentive mean + stddev pooling for each channel

__init__(in_feats, inner_feats=128, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False)[source]
forward(x, weights=None)[source]
forward_slidwin(x, win_length, win_shift)
get_config()[source]

Acoustic Feature Extraction Layers

These define several feature extraction layers that take waveforms as input and produce spectrograms, filter banks, MFCCs, etc. It also includes a factory class to create feature extraction layers from config parameters.

class hyperion.torch.layers.audio_feats_factory.AudioFeatsFactory[source]
static create(audio_feat, sample_frequency=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemphasis_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]
static filter_args(**kwargs)[source]

Filters MFCC args from arguments dictionary.

Parameters

kwargs – Arguments dictionary.

Returns

Dictionary with MFCC options.

static add_class_args(parser, prefix=None)[source]

Adds MFCC options to parser.

Parameters
  • parser – Arguments parser

  • prefix – Options prefix.

static add_argparse_args(parser, prefix=None)

Adds MFCC options to parser.

Parameters
  • parser – Arguments parser

  • prefix – Options prefix.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layers.audio_feats._get_feature_window_function(window_type, window_size, blackman_coeff=0.42)[source]

Returns a window function with the given type and size

hyperion.torch.layers.audio_feats._get_strided_batch(waveform, window_length, window_shift, snip_edges, center=False)[source]

Given a waveform (1D tensor of size num_samples), it returns a 2D tensor (m, window_length) representing how the window is shifted along the waveform. Each row is a frame.

Parameters
  • waveform (torch.Tensor) – Tensor of size num_samples

  • window_length (int) – Frame length

  • window_shift (int) – Frame shift

  • snip_edges (bool) – If True, end effects are handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and the data are reflected at the ends.

  • center (bool) – If True, puts the center of the frame at t*window_shift, starting at t=0. It overrides snip_edges and sets it to False.

Returns

2D tensor of size (m, window_length) where each row is a frame

Return type

torch.Tensor
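For snip_edges=True, the number of frames that completely fit is 1 + (num_samples - window_length) // window_shift (or 0 for signals shorter than one window), the usual Kaldi convention that the description above matches. The sketch covers only that case on a plain list; the center option and the reflection used for snip_edges=False are omitted, and the function name is hypothetical.

```python
from typing import List


def strided_frames(waveform: List[float], window_length: int,
                   window_shift: int) -> List[List[float]]:
    """snip_edges=True framing: keep only frames that fit entirely in the signal."""
    n = len(waveform)
    if n < window_length:
        return []
    num_frames = 1 + (n - window_length) // window_shift
    return [waveform[m * window_shift: m * window_shift + window_length]
            for m in range(num_frames)]
```

With the defaults frame_length=25 ms and frame_shift=10 ms at 16 kHz, i.e. 400 and 160 samples, one second of audio gives 1 + 15600 // 160 = 98 frames.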

hyperion.torch.layers.audio_feats._get_log_energy(x, energy_floor)[source]

Returns the log energy of size (m) for a strided_input (m,*)

class hyperion.torch.layers.audio_feats.Wav2Win(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, pad_length=None, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, return_log_energy=False)[source]
forward(x)[source]
class hyperion.torch.layers.audio_feats.Wav2FFT(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
property fs
property frame_length
property frame_shift
property remove_dc_offset
property preemph_coeff
property window_type
property dither
forward(x)[source]
class hyperion.torch.layers.audio_feats.Wav2Spec(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
forward(x)[source]
property dither
property frame_length
property frame_shift
property fs
property preemph_coeff
property remove_dc_offset
property window_type
class hyperion.torch.layers.audio_feats.Wav2LogSpec(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
forward(x)[source]
property dither
property frame_length
property frame_shift
property fs
property preemph_coeff
property remove_dc_offset
property window_type
class hyperion.torch.layers.audio_feats.Wav2LogFilterBank(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
forward(x)[source]
property dither
property frame_length
property frame_shift
property fs
property preemph_coeff
property remove_dc_offset
property window_type
class hyperion.torch.layers.audio_feats.Wav2MFCC(*args: Any, **kwargs: Any)[source]
__init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]
static make_lifter(N, Q)[source]

Makes the liftering function

Parameters
  • N – Number of cepstral coefficients.

  • Q – Liftering parameter

Returns

Liftering vector.
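Assuming make_lifter follows Kaldi's sinusoidal cepstral lifter (the default cepstral_lifter=22 matches Kaldi's default), coefficient n is 1 + 0.5·Q·sin(pi·n/Q) for n = 0, …, N-1. A plain-Python sketch:

```python
import math
from typing import List


def make_lifter(N: int, Q: float) -> List[float]:
    """Kaldi-style cepstral lifter: 1 + 0.5 * Q * sin(pi * n / Q), n = 0..N-1."""
    return [1.0 + 0.5 * Q * math.sin(math.pi * n / Q) for n in range(N)]
```

The liftered cepstra are then the element-wise products [c * l for c, l in zip(ceps, lifter)]; with N=13 and Q=22 the weights grow from 1.0 at c0 to 12.0 at c11.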

static make_dct_matrix(num_ceps, num_filters)[source]
forward(x)[source]
property dither
property frame_length
property frame_shift
property fs
property preemph_coeff
property remove_dc_offset
property window_type
class hyperion.torch.layers.audio_feats.Wav2KanBayashiLogFilterBank(*args: Any, **kwargs: Any)[source]

Class to replicate log-filter-banks used in Kan Bayashi’s ParallelWaveGAN repository: https://github.com/kan-bayashi/ParallelWaveGAN

__init__(fs=16000, frame_length=64, frame_shift=16, fft_length=1024, remove_dc_offset=True, window_type='hanning', low_freq=80, high_freq=7600, num_filters=80, snip_edges=False, center=True)[source]
forward(x)[source]
property dither
property frame_length
property frame_shift
property fs
property preemph_coeff
property remove_dc_offset
property window_type
class hyperion.torch.layers.audio_feats.Spec2LogFilterBank(fs=16000, fft_length=512, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False)[source]
__init__(fs=16000, fft_length=512, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False)[source]
forward(x)[source]

Feature Normalization Layers

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.mvn.MeanVarianceNorm(*args: Any, **kwargs: Any)[source]
__init__(norm_mean=True, norm_var=False, left_context=0, right_context=0, dim=1)[source]
forward(x)[source]
normalize_global(x)[source]
normalize_cumsum(x)[source]
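A minimal sketch of what a global (utterance-level) mean and variance normalization, in the spirit of normalize_global, might compute for a single utterance laid out as (time, feats). The sliding-window variant controlled by left_context/right_context is omitted, and the eps added to the standard deviation is an assumption to avoid division by zero.

```python
import math
from typing import List


def mean_var_norm(x: List[List[float]], norm_var: bool = False,
                  eps: float = 1e-5) -> List[List[float]]:
    """x: (time, feats). Subtract the per-feature mean over time; optionally divide by std."""
    T, F = len(x), len(x[0])
    mu = [sum(x[t][f] for t in range(T)) / T for f in range(F)]
    y = [[x[t][f] - mu[f] for f in range(F)] for t in range(T)]
    if norm_var:
        # eps (assumption) keeps constant features from dividing by zero
        std = [math.sqrt(sum(y[t][f] ** 2 for t in range(T)) / T) + eps for f in range(F)]
        y = [[y[t][f] / std[f] for f in range(F)] for t in range(T)]
    return y
```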
static filter_args(**kwargs)[source]

Filters ST-CMVN args from arguments dictionary.

Parameters

kwargs – Arguments dictionary.

Returns

Dictionary with ST-CMVN options.

static add_class_args(parser, prefix=None)[source]

Adds ST-CMVN options to parser.

Parameters
  • parser – Arguments parser

  • prefix – Options prefix.

static add_argparse_args(parser, prefix=None)

Adds ST-CMVN options to parser.

Parameters
  • parser – Arguments parser

  • prefix – Options prefix.

Feature Augmentation Layers

Copyright 2021 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.spec_augment.AxisMasker(*args: Any, **kwargs: Any)[source]

Applies a mask to the spectrogram along the time or frequency dimension. Implementation based on ESPnet.

mask_width_range

range for the width of the masks

mask_num_range

range for the number of masks

dim

axis where we apply the mask

fill_value

masking value

__init__(min_width=0, max_width=30, min_num_masks=1, max_num_masks=2, dim=-1, fill_value=0)[source]
forward(x)[source]

Applies a mask along the time or frequency dimension

Parameters

x – spectrogram (batch, *, time, freq)

Returns

Masked spectrogram (batch, *, time, freq)
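A minimal sketch of axis masking for a single (time, freq) spectrogram stored as nested lists, masking along the last (frequency) axis as with the default dim=-1. The number of masks and each width are drawn uniformly from [min_num_masks, max_num_masks] and [min_width, max_width]; treating both ranges as inclusive, and copying rather than masking in place, are assumptions.

```python
import random
from typing import List, Optional


def mask_along_freq(spec: List[List[float]], min_width: int = 0, max_width: int = 30,
                    min_num_masks: int = 1, max_num_masks: int = 2,
                    fill_value: float = 0.0,
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """spec: (time, freq). Returns a copy with random frequency bands set to fill_value."""
    if rng is None:
        rng = random.Random()
    num_freq = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for _ in range(rng.randint(min_num_masks, max_num_masks)):
        # Width and start are drawn uniformly; both ranges are inclusive here
        width = rng.randint(min_width, min(max_width, num_freq))
        start = rng.randint(0, num_freq - width)
        for row in out:
            row[start:start + width] = [fill_value] * width
    return out
```

The same frequency band is masked in every frame, which is what distinguishes frequency masking from time masking.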

class hyperion.torch.layers.spec_augment.SpecWarper(*args: Any, **kwargs: Any)[source]

Warps the spectrogram along the time or frequency dimension. Implementation based on ESPnet.

window

time warp parameter

__init__(window=80, mode='bicubic', dim=-2)[source]
forward(x, lengths=None)[source]

Warps x along the time or frequency dimension

Parameters
  • x – spectrogram (batch, *, time, freq)

  • lengths – length ratios

Returns

warped spectrogram (batch, *, time, freq)

class hyperion.torch.layers.spec_augment.SpecAugment(*args: Any, **kwargs: Any)[source]

Implementation of SpecAugment.

Reference:

Daniel S. Park et al., “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”

__init__(time_warp_prob=0, time_warp_window=5, time_warp_mode='bicubic', time_mask_prob=0, time_mask_min_width=0, time_mask_max_width=100, time_mask_min_num_masks=1, time_mask_max_num_masks=2, freq_mask_prob=0, freq_mask_min_width=0, freq_mask_max_width=20, freq_mask_min_num_masks=1, freq_mask_max_num_masks=2, fill_value=0)[source]
forward(x, lengths=None)[source]
filter_args()[source]

Filters SpecAugment args from arguments dictionary.

Parameters

kwargs – Arguments dictionary.

Returns

Dictionary with SpecAugment options.

static add_class_args(parser, prefix=None)[source]

Adds SpecAugment options to parser.

Parameters
  • parser – Arguments parser

  • prefix – Options prefix.

Large Margin Losses Layers

These are output layers used to create large-margin cross-entropy losses.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.margin_losses.ArcLossOutput(*args: Any, **kwargs: Any)[source]
__init__(in_feats, num_classes, s=64, margin=0.3, margin_warmup_epochs=0)[source]
update_margin(epoch)[source]
forward(x, y=None)[source]
class hyperion.torch.layers.margin_losses.CosLossOutput(*args: Any, **kwargs: Any)[source]
__init__(in_feats, num_classes, s=64, margin=0.3, margin_warmup_epochs=0)[source]
update_margin(epoch)[source]
forward(x, y=None)[source]
class hyperion.torch.layers.margin_losses.SubCenterArcLossOutput(*args: Any, **kwargs: Any)[source]
__init__(in_feats, num_classes, num_subcenters=2, s=64, margin=0.3, margin_warmup_epochs=0)[source]
update_margin(epoch)
forward(x, y=None)[source]
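The standard margin formulations behind these names can be sketched on the cosine similarities between a normalized embedding and each normalized class weight: ArcFace replaces the target cosine cos(theta) with cos(theta + m), CosFace with cos(theta) - m, and all logits are scaled by s. This is a sketch of the published formulas, not necessarily hyperion's exact code; margin warm-up, sub-centres and the usual guard for theta + m > pi are omitted.

```python
import math
from typing import List


def margin_logits(cos_theta: List[float], target: int, s: float = 64.0,
                  margin: float = 0.3, kind: str = "arc") -> List[float]:
    """cos_theta: cosines between the normalized embedding and each class weight."""
    if kind not in ("arc", "cos"):
        raise ValueError(f"unknown margin kind: {kind}")
    logits = []
    for c, cos_t in enumerate(cos_theta):
        if c == target:
            if kind == "arc":
                theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp for acos
                cos_t = math.cos(theta + margin)  # additive angular margin (ArcFace)
            else:
                cos_t = cos_t - margin  # additive cosine margin (CosFace)
        logits.append(s * cos_t)
    return logits
```

The penalized logits are then fed to an ordinary cross-entropy loss; the margin forces the target class to win by a gap rather than by any amount.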

Probability Density Function Layers

These are layers related to probability density functions used in VAEs.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.pdf_storage.StdNormal(*args: Any, **kwargs: Any)[source]

Storage for Standard Normal distribution

__init__(shape)[source]
property pdf
forward()[source]

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.tensor2pdf.Tensor2PDF(*args: Any, **kwargs: Any)[source]

Base class for layers that create a prob distribution from an input tensor

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
class hyperion.torch.layers.tensor2pdf.Tensor2NormalICov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution with identity variance

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]
class hyperion.torch.layers.tensor2pdf.Tensor2NormalGlobDiagCov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into Normal distribution

Input tensor will be the mean of the distribution and the standard deviation is a global trainable parameter.

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]
class hyperion.torch.layers.tensor2pdf.Tensor2NormalDiagCov(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into Normal distribution

Applies two linear transformations to the tensors to obtain the mean and the log-variance.

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]
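Once a mean and a log-variance are available, a VAE typically samples with the reparameterization trick and penalizes the KL divergence to a standard normal prior such as StdNormal. The sketch shows both operations for one diagonal Gaussian in plain Python; it illustrates the standard formulas and is not hyperion's forward. In PyTorch, torch.distributions.Normal(...).rsample() and torch.distributions.kl_divergence provide the same operations.

```python
import math
import random
from typing import List, Optional


def sample_diag_normal(mu: List[float], logvar: List[float],
                       rng: Optional[random.Random] = None) -> List[float]:
    """Reparameterization trick: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, 1)."""
    if rng is None:
        rng = random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_std_normal(mu: List[float], logvar: List[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
```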
class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalICovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into a Normal distribution with identity variance

Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]
class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalGlobDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a Tensor into Normal distribution

Input tensor will be the ML mean of the distribution and the ML standard deviation is a global trainable parameter.

Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]
class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]

Transforms a tensor into a Normal distribution.

Applies two linear transformations to the tensor to obtain the maximum-likelihood mean and log-variance.

Uses Bayesian interpolation between the Gaussian prior and the maximum-likelihood estimate.

__init__(pdf_feats, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, prior=None, squeeze_dim=None)[source]

Vector Quantization Layers

These are vector quantization layers like the ones used in VQ-VAEs

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.vq.VectorQuantizer(*args: Any, **kwargs: Any)[source]
__init__(num_embed, embed_feats, project=True, in_feats=None, in_dim=None)[source]
class hyperion.torch.layers.vq.KMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
__init__(num_embed, embed_feats, commitment_cost=0.25, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, return_r=False)[source]
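The `commitment_cost` argument plays the role of the commitment weight in the VQ-VAE objective (van den Oord et al., 2017). A sketch of that standard formulation (nearest-codebook lookup, codebook and commitment losses, and a straight-through gradient), assuming inputs already flattened to shape (N, D); the function `vq_quantize` is hypothetical and is not the library's `forward`:

```python
import torch
import torch.nn.functional as F


def vq_quantize(z: torch.Tensor, codebook: torch.Tensor, commitment_cost: float = 0.25):
    """Hypothetical sketch. z: (N, D) encoder outputs; codebook: (K, D) code vectors."""
    # Squared Euclidean distance from every input to every code vector: (N, K)
    dist = torch.cdist(z, codebook).pow(2)
    idx = dist.argmin(dim=1)   # index of the nearest code for each input
    z_q = codebook[idx]        # quantized vectors: (N, D)
    # Codebook loss moves codes toward the (frozen) encoder outputs;
    # commitment loss keeps the encoder outputs close to the (frozen) codes
    codebook_loss = F.mse_loss(z_q, z.detach())
    commit_loss = F.mse_loss(z, z_q.detach())
    loss = codebook_loss + commitment_cost * commit_loss
    # Straight-through estimator: forward pass uses z_q, gradient is copied to z
    z_q = z + (z_q - z).detach()
    return z_q, loss, idx


codebook = torch.tensor([[0.0, 0.0], [10.0, 10.0]])
z = torch.tensor([[1.0, 0.0], [9.0, 11.0]], requires_grad=True)
z_q, loss, idx = vq_quantize(z, codebook)
print(idx.tolist())  # [0, 1]
print(loss.item())   # 0.9375
```

Because the straight-through step copies gradients from `z_q` to `z` unchanged, the encoder still receives a training signal even though `argmin` is not differentiable.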
class hyperion.torch.layers.vq.MultiKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
__init__(num_groups, num_embed, embed_feats, commitment_cost=0.25, project=True, in_feats=None, in_dim=None)[source]
property commitment_cost
forward(inputs, return_r=False)[source]
class hyperion.torch.layers.vq.EMAKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
__init__(num_embed, embed_feats, commitment_cost=0.25, gamma=0.99, eps=1e-05, project=True, in_feats=None, in_dim=None)[source]
forward(inputs, return_r=False)[source]
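`EMAKMeansVectorQuantizer` adds `gamma` and `eps`. In the EMA variant of VQ-VAE the codebook is not learned by gradient descent; each code is instead a running average of the inputs assigned to it, with `gamma` as the decay and a small `eps` for Laplace smoothing of the cluster counts. A sketch of that update under those assumptions (the function and buffer names are hypothetical):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_codebook_update(z, idx, cluster_size, embed_sum, gamma=0.99, eps=1e-5):
    """Hypothetical sketch. z: (N, D) inputs, idx: (N,) assigned codes,
    cluster_size: (K,) and embed_sum: (K, D) running buffers, updated in place."""
    num_embed = cluster_size.shape[0]
    onehot = F.one_hot(idx, num_embed).to(z.dtype)                  # (N, K)
    # Exponential moving averages of the per-code counts and per-code sums
    cluster_size.mul_(gamma).add_(onehot.sum(0), alpha=1 - gamma)
    embed_sum.mul_(gamma).add_(onehot.t() @ z, alpha=1 - gamma)
    # Laplace smoothing avoids dividing by zero for codes that receive no inputs
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + num_embed * eps) * n
    return embed_sum / smoothed.unsqueeze(1)                         # new codebook (K, D)


z = torch.tensor([[1.0], [3.0], [10.0]])
idx = torch.tensor([0, 0, 1])
cluster_size, embed_sum = torch.zeros(2), torch.zeros(2, 1)
codebook = ema_codebook_update(z, idx, cluster_size, embed_sum, gamma=0.0)
print(codebook.squeeze(1))  # approximately [2., 10.]
```

With `gamma=0.0` each code jumps straight to the mean of its assigned inputs; with `gamma` close to 1 (the documented default is 0.99) it drifts slowly toward that mean.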
class hyperion.torch.layers.vq.MultiEMAKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
__init__(num_groups, num_embed, embed_feats, commitment_cost=0.25, gamma=0.99, eps=1e-05, project=True, in_feats=None, in_dim=None)[source]
property commitment_cost
property gamma
property eps
forward(inputs, return_r=False)[source]

Upsampling Layers

These include layers related to upsampling operations.

Copyright 2021 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.interpolate.Interpolate(*args: Any, **kwargs: Any)[source]
__init__(scale_factor, mode='nearest')[source]
forward(x)[source]

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.subpixel_convs.SubPixelConv1d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')[source]
forward(x)[source]
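A 1d subpixel convolution can be built from an ordinary `Conv1d` with `stride * out_channels` output channels followed by a 1d analogue of `torch.nn.PixelShuffle`, which trades channels for time resolution. A sketch of that shuffle step, assuming the same channel grouping as `PixelShuffle` (channels `c*r ... c*r + r - 1` become the `r` sub-samples of output channel `c`); the library may order channels differently, and `pixel_shuffle_1d` is a hypothetical helper:

```python
import torch


def pixel_shuffle_1d(x: torch.Tensor, r: int) -> torch.Tensor:
    """Hypothetical sketch: (B, C*r, T) -> (B, C, T*r)."""
    b, cr, t = x.shape
    c = cr // r
    # Split channels into (C, r), move r next to time, then merge (T, r) into T*r
    return x.reshape(b, c, r, t).permute(0, 1, 3, 2).reshape(b, c, t * r)


x = torch.arange(8.0).reshape(1, 4, 2)   # B=1, C*r=4, T=2
y = pixel_shuffle_1d(x, r=2)
print(y.tolist())  # [[[0.0, 2.0, 1.0, 3.0], [4.0, 6.0, 5.0, 7.0]]]
```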
class hyperion.torch.layers.subpixel_convs.SubPixelConv2d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')[source]
forward(x)[source]
hyperion.torch.layers.subpixel_convs.ICNR2d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]

Initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions, described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”.

Parameters
  • tensor – torch.Tensor containing the conv weights

  • stride – subpixel conv stride

  • initializer – initializer used to initialize the sub-kernel

Examples

>>> conv = SubPixelConv2d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR2d(conv.conv.weight, stride=upscale)
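The idea behind ICNR is to give the `stride**2` sub-kernels that feed each output channel of the pixel shuffle identical weights, so that at initialization the subpixel convolution behaves like a convolution followed by nearest-neighbour upsampling and produces no checkerboard pattern. A sketch of that initialization for a plain `nn.Conv2d` followed by `nn.PixelShuffle`, assuming PixelShuffle's channel ordering; `icnr_2d_` is a hypothetical helper, not `ICNR2d` itself:

```python
import torch
import torch.nn as nn


def icnr_2d_(weight: torch.Tensor, stride: int = 2, initializer=nn.init.kaiming_normal_):
    """Hypothetical sketch: ICNR-initialize a (C*stride**2, C_in, k, k) conv weight in place."""
    out_ch, in_ch, kh, kw = weight.shape
    sub_kernel = torch.empty(out_ch // stride**2, in_ch, kh, kw)
    initializer(sub_kernel)
    # PixelShuffle builds output channel c from input channels c*stride**2 ... (c+1)*stride**2 - 1,
    # so repeating each sub-kernel stride**2 times in a row makes those channels identical
    with torch.no_grad():
        weight.copy_(sub_kernel.repeat_interleave(stride**2, dim=0))
    return weight


torch.manual_seed(0)
upscale = 2
conv = nn.Conv2d(3, 8 * upscale**2, kernel_size=3, padding=1, bias=False)
icnr_2d_(conv.weight, stride=upscale)
y = nn.PixelShuffle(upscale)(conv(torch.randn(1, 3, 5, 5)))   # (1, 8, 10, 10)
# Every 2x2 output patch is constant, i.e. nearest-neighbour upsampling
print(torch.allclose(y[..., 0::2, 0::2], y[..., 1::2, 1::2]))  # True
```

Only the weights are ICNR-initialized here; the bias is disabled so that all sub-pixel positions of a patch match exactly.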
hyperion.torch.layers.subpixel_convs.ICNR1d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]

1d version of the initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions, described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”.

Parameters
  • tensor – torch.Tensor containing the conv weights

  • stride – subpixel conv stride

  • initializer – initializer used to initialize the sub-kernel

Examples

>>> conv = SubPixelConv1d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR1d(conv.conv.weight, stride=upscale)

Positional Encoders

These include layers that implement positional encoders used in transformers.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.pos_encoder.PosEncoder(*args: Any, **kwargs: Any)[source]

Positional encoding.

num_feats

embedding dim

dropout_rate

dropout rate

__init__(num_feats, dropout_rate=0)[source]
_pe(x, relative=False)[source]

Reset the positional encodings.

forward(x)[source]

Add positional encoding.

Parameters

x – Input with shape=(batch, time, C)

Returns

Scaled x plus the positional encoding
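Positional encoders of this kind usually follow the sinusoidal encoding of Vaswani et al. (2017): even features use sin(pos / 10000^(2i/d)) and odd features use the cosine of the same argument. A sketch assuming that formulation, and assuming that "scaled x" means x multiplied by sqrt(num_feats); the function names are hypothetical:

```python
import math
import torch


def sinusoidal_pe(max_len: int, num_feats: int) -> torch.Tensor:
    """Hypothetical sketch: (max_len, num_feats) table, num_feats assumed even."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)          # (T, 1)
    # 1 / 10000^(2i/d) for i = 0 .. d/2 - 1
    div = torch.exp(torch.arange(0, num_feats, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / num_feats))                    # (d/2,)
    pe = torch.zeros(max_len, num_feats)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


def add_pos_encoding(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, time, C) -> scaled x plus positional encoding (assumed scale sqrt(C))."""
    _, t, c = x.shape
    return x * math.sqrt(c) + sinusoidal_pe(t, c).unsqueeze(0)


y = add_pos_encoding(torch.zeros(2, 5, 8))
print(y.shape)  # torch.Size([2, 5, 8])
print(y[0, 0])  # position 0: tensor([0., 1., 0., 1., 0., 1., 0., 1.])
```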

class hyperion.torch.layers.pos_encoder.RelPosEncoder(*args: Any, **kwargs: Any)[source]
Relative positional encoding as defined in

https://arxiv.org/pdf/1901.02860.pdf

It returns the input and the positional encoding separately, so that they can be mixed later in the attention block.

num_feats

embedding dim

dropout_rate

dropout rate

__init__(num_feats, dropout_rate=0)[source]
forward(x)[source]

Add positional encoding.

Parameters

x – Input with shape=(batch, time, C)

Returns

Scaled x and the positional encoding

_pe(x, relative=False)

Reset the positional encodings.

class hyperion.torch.layers.pos_encoder.NoPosEncoder(*args: Any, **kwargs: Any)[source]

This is a dummy class for the case where we deactivate the positional encoder

__init__()[source]
forward(x)[source]

Identity map

Parameters

x – Input with shape=(batch, time, C)

Returns

x

Calibration

These are layers that are used to simulate the calibration block after the speaker recognition back-end

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layers.calibrators.LinBinCalibrator(*args: Any, **kwargs: Any)[source]
__init__(a, b)[source]
forward(x)[source]
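The constructor takes two scalars, which suggests an affine calibration that maps raw back-end scores s to calibrated scores a * s + b, as in linear logistic-regression calibration. A sketch under that assumption (the class name is hypothetical, and the parameters are kept fixed here):

```python
import torch
import torch.nn as nn


class LinearScoreCalibrator(nn.Module):
    """Hypothetical sketch: affine calibration y = a * x + b with fixed a, b."""

    def __init__(self, a: float, b: float):
        super().__init__()
        # Non-trainable parameters, so they still follow .to(device) and state_dict
        self.a = nn.Parameter(torch.tensor(float(a)), requires_grad=False)
        self.b = nn.Parameter(torch.tensor(float(b)), requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.a * x + self.b


cal = LinearScoreCalibrator(a=2.0, b=-1.0)
y = cal(torch.tensor([0.0, 1.0, 2.0]))
print(y.tolist())  # [-1.0, 1.0, 3.0]
```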

Layer Blocks

These are Torch modules that combine several layers. They are the building blocks used to create more complex architectures like ResNets, Transformers or EfficientNets.

Fully Connected Blocks

These are fully connected blocks used to create simple feed forward networks, classification heads, etc.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.fc_blocks.FCBlock(*args: Any, **kwargs: Any)[source]

Fully connected block

in_feats

input feature dimension

out_feats

output feature dimension

activation

str/dict indicating the type of activation function

norm_layer

normalization layer constructor, if None it uses batch-norm

use_norm

if True, it applies the normalization layer, if False no normalization is applied

norm_before

if True normalization layer is applied before the activation function, if False after

__init__(in_feats, out_feats, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]
forward(x)[source]

Forward function

forward_linear(x)[source]

Forward function without activation function
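Putting the attributes above together, the forward pass is a linear layer followed by the activation, with the normalization layer either before the activation (norm_before=True) or after it, and optional dropout. A sketch of that ordering, assuming ReLU activation, BatchNorm1d as the default norm, and dropout applied last; the class name is hypothetical:

```python
import torch
import torch.nn as nn


class FCBlockSketch(nn.Module):
    """Hypothetical sketch: linear -> [norm] -> activation -> [norm] -> [dropout]."""

    def __init__(self, in_feats, out_feats, dropout_rate=0.0, use_norm=True, norm_before=False):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)
        self.act = nn.ReLU(inplace=True)
        self.norm = nn.BatchNorm1d(out_feats) if use_norm else None
        self.norm_before = norm_before
        self.dropout = nn.Dropout(dropout_rate) if dropout_rate > 0 else None

    def forward(self, x):
        x = self.linear(x)
        if self.norm is not None and self.norm_before:
            x = self.norm(x)
        x = self.act(x)
        if self.norm is not None and not self.norm_before:
            x = self.norm(x)
        if self.dropout is not None:
            x = self.dropout(x)
        return x


torch.manual_seed(0)
block = FCBlockSketch(10, 32)   # norm after activation, as with norm_before=False
y = block(torch.randn(4, 10))
print(y.shape)  # torch.Size([4, 32])
```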

Deep Convolutional Blocks

Deep Convolutional 1d Blocks

These are blocks to create 1d deep convolutional networks without residual connections.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.dc1d_blocks.DC1dEncBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
freeze()[source]
unfreeze()[source]
forward(x)[source]
class hyperion.torch.layer_blocks.dc1d_blocks.DC1dDecBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
freeze()[source]
unfreeze()[source]
forward(x)[source]

Deep Convolutional 2d Blocks

These are blocks to create 2d deep convolutional networks without residual connections.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.dc2d_blocks.DC2dEncBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
freeze()[source]
unfreeze()[source]
forward(x)[source]
class hyperion.torch.layer_blocks.dc2d_blocks.DC2dDecBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, stride=1, dilation=1, activation='relu', dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
freeze()[source]
unfreeze()[source]
forward(x)[source]

TDNN Blocks

TDNN Blocks

TDNN blocks used to create TDNN x-vectors

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.tdnn_blocks.TDNNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]
freeze()[source]
unfreeze()[source]
forward(x)[source]
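A TDNN layer is a 1d convolution over time whose dilation widens the temporal context without adding parameters. A sketch of such a block in plain PyTorch, assuming input of shape (batch, channels, time), "same" padding of dilation * (kernel_size - 1) // 2, and activation before normalization (the documented default norm_before=False); `tdnn_block` is hypothetical:

```python
import torch
import torch.nn as nn


def tdnn_block(in_channels, out_channels, kernel_size, dilation=1):
    """Hypothetical sketch: dilated Conv1d -> ReLU -> BatchNorm1d, preserving length."""
    padding = dilation * (kernel_size - 1) // 2   # keeps T unchanged for odd kernel sizes
    return nn.Sequential(
        nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation, padding=padding),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(out_channels),
    )


# Frame-level layers of an x-vector-style network: receptive field grows to 5, 9, 15 frames
net = nn.Sequential(
    tdnn_block(40, 512, kernel_size=5),
    tdnn_block(512, 512, kernel_size=3, dilation=2),
    tdnn_block(512, 512, kernel_size=3, dilation=3),
)
y = net(torch.randn(2, 40, 200))
print(y.shape)  # torch.Size([2, 512, 200])
```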

Extended TDNN Blocks

Extended TDNN blocks used to create E-TDNN x-vectors

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.etdnn_blocks.ETDNNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]
forward(x)[source]

Residual Extended TDNN Blocks

Extended TDNN blocks with residual connections

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.resetdnn_blocks.ResETDNNBlock(*args: Any, **kwargs: Any)[source]
__init__(num_channels, kernel_size, dilation=1, activation={'inplace': True, 'name': 'relu'}, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=False)[source]
forward(x)[source]

Squeeze-Excitation Blocks

1d and 2d Squeeze-Excitation blocks, which are added at the output of ResNet blocks and other blocks to create squeeze-excitation networks.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.se_blocks.SEBlock2D(*args: Any, **kwargs: Any)[source]

From https://arxiv.org/abs/1709.01507

__init__(num_channels, r=16, activation={'inplace': True, 'name': 'relu'})[source]
forward(x)[source]
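Squeeze-excitation (Hu et al., arXiv:1709.01507) squeezes each channel to a scalar by global average pooling, passes the result through a bottleneck of `num_channels // r` units, and rescales the channels with sigmoid gates. A sketch of that mechanism for (batch, channels, height, width) inputs, assuming ReLU in the bottleneck; the class name is hypothetical:

```python
import torch
import torch.nn as nn


class SqueezeExcitation2d(nn.Module):
    """Hypothetical sketch of a 2d squeeze-excitation block."""

    def __init__(self, num_channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(num_channels, num_channels // r)  # bottleneck (reduction by r)
        self.fc2 = nn.Linear(num_channels // r, num_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=(2, 3))                              # squeeze: (B, C)
        g = torch.sigmoid(self.fc2(self.act(self.fc1(s))))  # excitation gates in (0, 1)
        return x * g[:, :, None, None]                      # rescale each channel


torch.manual_seed(0)
se = SqueezeExcitation2d(num_channels=32, r=16)
x = torch.rand(2, 32, 8, 8)
y = se(x)
print(y.shape)  # torch.Size([2, 32, 8, 8])
```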
class hyperion.torch.layer_blocks.se_blocks.TSEBlock2D(*args: Any, **kwargs: Any)[source]

From https://arxiv.org/abs/1709.01507, modified to pool only along the time dimension.

__init__(num_channels, num_feats, r=16, activation={'inplace': True, 'name': 'relu'})[source]
forward(x)[source]
class hyperion.torch.layer_blocks.se_blocks.SEBlock1d(*args: Any, **kwargs: Any)[source]

1d Squeeze Excitation version of https://arxiv.org/abs/1709.01507

__init__(num_channels, r=16, activation={'inplace': True, 'name': 'relu'})[source]
forward(x)[source]
hyperion.torch.layer_blocks.se_blocks.SEBlock2d

alias of hyperion.torch.layer_blocks.se_blocks.SEBlock2D

hyperion.torch.layer_blocks.se_blocks.TSEBlock2d

alias of hyperion.torch.layer_blocks.se_blocks.TSEBlock2D

Canonical ResNet Blocks

These are blocks used to create canonical ResNets, SE-ResNets, Res2Nets, etc.

ResNet Blocks

These blocks are used to create canonical ResNets.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.resnet_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]

3x3 convolution with padding

hyperion.torch.layer_blocks.resnet_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

1x1 convolution

class hyperion.torch.layer_blocks.resnet_blocks.ResNetInputBlock(*args: Any, **kwargs: Any)[source]

Input block for ResNet architecture

Parameters
  • in_channels – input channels

  • out_channels – output channels

  • kernel_size – kernel size for conv

  • stride – stride for conv

  • activation – str/dict indicating activation type and arguments

  • norm_layer – norm_layer object constructor, if None it uses BatchNorm2d

  • norm_before – if True it applies the norm_layer before the activation, if False, after the activation

  • do_maxpool – apply maxpooling 2x2 at the output

__init__(in_channels, out_channels, kernel_size=7, stride=2, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True, do_maxpool=True)[source]
forward(x)[source]
class hyperion.torch.layer_blocks.resnet_blocks.ResNetBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
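The canonical basic block (He et al., 2016) applies two 3x3 convolutions with normalization and adds a shortcut; when the stride or channel count changes, the shortcut becomes a strided 1x1 convolution. With `expansion = 1`, `out_channels` equals `channels`. A sketch assuming BatchNorm2d, ReLU, and normalization before the activation (the documented default `norm_before=True`); the class name is hypothetical:

```python
import torch
import torch.nn as nn


class BasicBlockSketch(nn.Module):
    """Hypothetical sketch of a canonical ResNet basic block (expansion = 1)."""

    expansion = 1

    def __init__(self, in_channels: int, channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)
        out_channels = channels * self.expansion
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            # 1x1 strided conv so the shortcut matches the residual branch's shape
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x if self.downsample is None else self.downsample(x)
        x = self.act(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        return self.act(x + residual)


block = BasicBlockSketch(64, 128, stride=2)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```

The bottleneck variant (`expansion = 4`) follows the same pattern with a 1x1, 3x3, 1x1 sequence whose last convolution outputs `4 * channels`.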
class hyperion.torch.layer_blocks.resnet_blocks.ResNetBNBlock(*args: Any, **kwargs: Any)[source]
expansion = 4
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet_blocks.Interpolate(*args: Any, **kwargs: Any)[source]
__init__(scale_factor, mode='nearest')[source]
forward(x)[source]
class hyperion.torch.layer_blocks.resnet_blocks.ResNetEndpointBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, scale, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]
forward(x)[source]

SE-ResNet Blocks

These blocks are used to create canonical Squeeze-Excitation ResNets.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBasicBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]
forward(x)[source]
expansion = 1
property out_channels
class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]
expansion = 4
property out_channels
forward(x)[source]

Res2Net Blocks

These blocks are used to create canonical Res2Nets.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.res2net_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]

3x3 convolution with padding

hyperion.torch.layer_blocks.res2net_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

1x1 convolution

class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBNBlock(*args: Any, **kwargs: Any)[source]
expansion = 4
__init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
property out_channels
forward(x)[source]
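Res2Net (Gao et al., 2019) replaces the middle 3x3 convolution of a bottleneck with `scale` channel groups processed hierarchically: the first group passes through unchanged, the second is convolved directly, and each later group is convolved after adding the output of the previous group, which enlarges the receptive field inside a single block. A sketch of just that split-and-cascade step, assuming the channel count is divisible by `scale`; the class name is hypothetical and normalization and activation are omitted:

```python
import torch
import torch.nn as nn


class Res2NetSplit2d(nn.Module):
    """Hypothetical sketch of the Res2Net hierarchical split (no norm/activation)."""

    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must be divisible by scale"
        self.scale = scale
        width = channels // scale
        # One 3x3 conv per group except the first, which is passed through
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1, bias=False) for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xs = torch.chunk(x, self.scale, dim=1)
        ys = [xs[0]]   # y_1 = x_1
        prev = None
        for i, conv in enumerate(self.convs, start=1):
            # y_2 = K_2(x_2); y_i = K_i(x_i + y_{i-1}) for i > 2
            inp = xs[i] if prev is None else xs[i] + prev
            prev = conv(inp)
            ys.append(prev)
        return torch.cat(ys, dim=1)


x = torch.randn(2, 64, 10, 10)
y = Res2NetSplit2d(64, scale=4)(x)
print(y.shape)  # torch.Size([2, 64, 10, 10])
```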

SpineNet Blocks

These are some extra blocks needed to build SpineNet and Spine2Net.

Copyright 2020 Magdalena Rybicka Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.spine_blocks.Interpolate(*args: Any, **kwargs: Any)[source]
__init__(scale_factor, mode='nearest')[source]
forward(x)[source]
hyperion.torch.layer_blocks.spine_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]

3x3 convolution with padding

hyperion.torch.layer_blocks.spine_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

1x1 convolution

hyperion.torch.layer_blocks.spine_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise subpixel convolution

class hyperion.torch.layer_blocks.spine_blocks.SpineConv(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, stride=1, dropout_rate=0, groups=1, dilation=1, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]

Class that connects the outputs of the SpineNet to the rest of the network

forward(x)[source]
class hyperion.torch.layer_blocks.spine_blocks.BlockSpec(level, block_fn, input_offsets, is_output)[source]

A container class that specifies the block configuration for SpineNet.

__init__(level, block_fn, input_offsets, is_output)[source]
static build_block_specs(block_specs=None)[source]

Builds the list of BlockSpec objects for SpineNet.

class hyperion.torch.layer_blocks.spine_blocks.SpineEndpoints(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, level, target_level, upsampling_type='nearest', stride=1, activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True, do_endpoint_conv=True)[source]

Class that connects the outputs of the SpineNet to the rest of the network

forward(x)[source]
class hyperion.torch.layer_blocks.spine_blocks.SpineResample(*args: Any, **kwargs: Any)[source]
__init__(spec, in_channels, out_channels, scale, alpha, upsampling_type='nearest', activation={'inplace': True, 'name': 'relu'}, norm_layer=None, norm_before=True)[source]

Class that builds a resampling connection between single SpineNet blocks.

forward(x)[source]

MobileNet Blocks

These are blocks needed to build EfficientNet networks.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.mbconv_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

1x1 convolution

hyperion.torch.layer_blocks.mbconv_blocks._dwconvkxk(channels, kernel_size=3, stride=1, bias=False)[source]

kxk depth-wise convolution with padding

class hyperion.torch.layer_blocks.mbconv_blocks.MBConvBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, expansion=6, kernel_size=3, stride=1, activation='swish', drop_connect_rate=0, norm_layer=None, se_r=None, time_se=False, num_feats=None)[source]
forward(x)[source]
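An MBConv block (the inverted residual used in MobileNetV2 and EfficientNet) expands the channels by `expansion` with a 1x1 convolution, applies a kxk depth-wise convolution (`groups` equal to the channel count, the role of `_dwconvkxk`), projects back down with a linear 1x1 convolution, and adds a residual when input and output shapes match. A sketch assuming BatchNorm2d, SiLU (swish) activation, and no squeeze-excitation or drop-connect; the class name is hypothetical:

```python
import torch
import torch.nn as nn


class MBConvSketch(nn.Module):
    """Hypothetical sketch of an inverted-residual (MBConv) block without SE or drop-connect."""

    def __init__(self, in_channels, out_channels, expansion=6, kernel_size=3, stride=1):
        super().__init__()
        hid = in_channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, hid, 1, bias=False),                   # 1x1 expansion
            nn.BatchNorm2d(hid),
            nn.SiLU(),                                                    # swish
            nn.Conv2d(hid, hid, kernel_size, stride=stride,
                      padding=kernel_size // 2, groups=hid, bias=False),  # depth-wise kxk
            nn.BatchNorm2d(hid),
            nn.SiLU(),
            nn.Conv2d(hid, out_channels, 1, bias=False),                  # linear 1x1 projection
            nn.BatchNorm2d(out_channels),
        )
        self.use_residual = stride == 1 and in_channels == out_channels

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y


x = torch.randn(2, 16, 20, 20)
print(MBConvSketch(16, 16)(x).shape)            # torch.Size([2, 16, 20, 20])
print(MBConvSketch(16, 24, stride=2)(x).shape)  # torch.Size([2, 24, 10, 10])
```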
class hyperion.torch.layer_blocks.mbconv_blocks.MBConvInOutBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, out_channels, kernel_size=3, stride=2, activation='swish', norm_layer=None)[source]
forward(x)[source]

Generic ResNet Blocks

ResNet 1d Blocks

These are blocks used to build flexible ResNets based on 1d convs.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.resnet1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kernel k convolution with padding

hyperion.torch.layer_blocks.resnet1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise convolution

hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_conv1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise subpixel convolution

hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kernel k subpixel convolution with padding

class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
forward(x)[source]
property out_channels
class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNBlock(*args: Any, **kwargs: Any)[source]
property out_channels
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]
property out_channels
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
forward(x)[source]
class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dEndpoint(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, in_scale, scale, upsampling_mode='nearest', activation={'inplace': True, 'name': 'relu6'}, norm_layer=None, norm_before=True)[source]

Class that connects the outputs of the ResNet1d to the rest of the network when using multilevel feature aggregation

It converts the features of all the levels that we are going to aggregate to the same temporal scale

forward(x)[source]

Res2Net 1d Blocks

These are blocks used to build flexible Res2Nets based on 1d convs.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.res2net1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kernel k convolution with padding

hyperion.torch.layer_blocks.res2net1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise convolution

class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, num_feats=None)[source]
property out_channels
property expansion
forward(x)[source]

ResNet 2d Blocks

These are blocks used to build flexible ResNets based on 2d convs.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.resnet2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kxk convolution with padding

hyperion.torch.layer_blocks.resnet2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise convolution

hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

point-wise subpixel convolution

hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kxk subpixel convolution with padding

class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNBlock(*args: Any, **kwargs: Any)[source]
property out_channels
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
forward(x)[source]
class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]
property out_channels
__init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
forward(x)[source]

Res2Net 2d Blocks

These are blocks used to build flexible Res2Nets based on 2d convs.

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.res2net2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]

kxk convolution with padding

hyperion.torch.layer_blocks.res2net2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]

1x1 convolution

class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBasicBlock(*args: Any, **kwargs: Any)[source]
expansion = 1
__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
property out_channels
forward(x)[source]
class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBNBlock(*args: Any, **kwargs: Any)[source]
__init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
property out_channels
property expansion
forward(x)[source]

Transformer Blocks

These are blocks used to build Transformers.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.transformer_conv2d_subsampler.TransformerConv2dSubsampler(*args: Any, **kwargs: Any)[source]

Convolutional 2D subsampling (to 1/4 length) for Transformers

in_feats

input feature dimension

out_feats

Transformer d_model

hid_act

activation layer object

pos_enc

positional encoder layer

time_dim

indicates which is the time dimension in the input tensor

__init__(in_feats, out_feats, hid_act, pos_enc, time_dim=1)[source]
forward(x, mask)[source]

Forward function.

Parameters
  • x – input tensor with size=(batch, time, num_feats)

  • mask – mask to indicate valid time steps for x (batch, time1, time2)

Returns

Tensor with output features and tensor with subsampled mask
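Subsampling to 1/4 length is commonly done with two 3x3 convolutions of stride 2 over the (time, feature) plane, followed by a linear projection of the flattened channels x features to d_model, with the mask subsampled the same way. A sketch of that ESPnet-style recipe, assuming input (batch, time, num_feats), ReLU as the hidden activation, no positional encoder, and a mask of shape (batch, 1, time) rather than (batch, time1, time2); the class name is hypothetical:

```python
import torch
import torch.nn as nn


class Conv2dSubsampler4(nn.Module):
    """Hypothetical sketch: (B, T, F) -> (B, ~T/4, d_model) with matching mask."""

    def __init__(self, in_feats: int, out_feats: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, out_feats, 3, stride=2), nn.ReLU(),
            nn.Conv2d(out_feats, out_feats, 3, stride=2), nn.ReLU(),
        )
        # Each unpadded stride-2, kernel-3 conv maps a length L to (L - 1) // 2
        sub_feats = ((in_feats - 1) // 2 - 1) // 2
        self.out = nn.Linear(out_feats * sub_feats, out_feats)

    def forward(self, x, mask):
        x = self.conv(x.unsqueeze(1))                       # (B, C, T', F')
        b, c, t, f = x.shape
        x = self.out(x.transpose(1, 2).reshape(b, t, c * f))
        # Subsample the mask with the same length rule, once per conv
        mask = mask[:, :, :-2:2][:, :, :-2:2]
        return x, mask


sub = Conv2dSubsampler4(in_feats=80, out_feats=256)
y, m = sub(torch.randn(2, 100, 80), torch.ones(2, 1, 100, dtype=torch.bool))
print(y.shape, m.shape)  # torch.Size([2, 24, 256]) torch.Size([2, 1, 24])
```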

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.transformer_encoder_v1.TransformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]

Building block for transformer encoder.

num_feats

input/output feat. dimension (aka d_model)

self_attn

attention nn.Module or string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

num_heads

number of heads

feed_forward

position-wise feed-forward nn.Module or string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff

dimension of middle layer in feed_forward block

ff_kernel_size

kernel size for convolutional versions of ff block

ff_act

ff block hidden activation

ff_dropout_rate

dropout rate for ff block

att_context

maximum context range for local attention

att_dropout_rate

dropout rate for attention block

rel_pos_enc

if True, use relative positional encodings, absolute encodings otherwise.

causal_pos_enc

if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i

norm_before

if True, use layer norm before layers, otherwise after

concat_after
if True, it concatenates the attention input and output and applies a linear transform, i.e.,

y = x + linear(concat(x, att(x)))

if False, y = x + att(x)
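att_context and causal_pos_enc both restrict which keys a query may attend to. The sketch below builds a boolean attention mask in which entry [i][j] is True when query frame i may attend to key frame j. It makes two assumptions. First, local attention is taken to be a symmetric window of att_context frames on each side; Hyperion's local attention may be organized differently, for example block-wise. Second, "causal" follows the causal_pos_enc description above and means j <= i. The function name is hypothetical.

```python
from typing import List, Optional


def attention_mask(num_frames: int, context: Optional[int] = None,
                   causal: bool = False) -> List[List[bool]]:
    """mask[i][j] is True when query frame i may attend to key frame j."""
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            in_window = context is None or abs(i - j) <= context  # local attention
            not_future = not causal or j <= i  # causal: no attention to future keys
            row.append(in_window and not_future)
        mask.append(row)
    return mask


for row in attention_mask(5, context=1, causal=True):
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# .xx..
# ..xx.
# ...xx
```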

__init__(num_feats, self_attn, num_heads, feed_forward, d_ff, ff_kernel_size, ff_act='relu6', ff_dropout_rate=0, att_context=25, att_dropout_rate=0, rel_pos_enc=False, causal_pos_enc=False, norm_before=True, concat_after=False)[source]
static _make_att(att_type, num_feats, num_heads, context, dropout_rate, rel_pos_enc, causal_pos_enc)[source]

Creates multihead attention block from att_type string

Parameters
  • att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

  • num_feats – input/output feat. dimension (aka d_model)

  • num_heads – number of heads

  • dropout_rate – dropout rate for attention block

  • rel_pos_enc – if True, use relative positional encodings, absolute encodings otherwise.

  • causal_pos_enc – if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i

Returns

Attention nn.Module

static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]

Creates position-wise feed forward block from ff_type string

Parameters
  • ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

  • num_feats – input/output feat. dimension (aka d_model)

  • hid_feats – dimension of middle layer in feed_forward block

  • kernel_size – kernel size for convolutional versions of ff block

  • dropout_rate – dropout rate for ff block

  • activation – activation function for ff block

Returns

Position-wise feed-forward nn.Module

forward(x, pos_emb=None, mask=None)[source]

Forward pass function

Parameters
  • x – input tensor with size=(batch, time, num_feats)

  • pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None

  • mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.transformer_feedforward.PositionwiseFeedForward(*args: Any, **kwargs: Any)[source]

Position-wise feed-forward layer for Transformer.

num_feats

input/output dimension

hid_feats

number of hidden units

activation

activation function for hidden layers

dropout_rate

dropout rate

time_dim

time dimension in the input tensor

__init__(num_feats, hid_feats, activation='relu6', dropout_rate=0, time_dim=1)[source]
forward(x)[source]

Forward function.

Parameters

x – input size=(batch, time, num_feats)

Returns

tensor size=(batch, time, num_feats)
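"Position-wise" means that the same two linear layers, with the hidden activation (relu6 by default) between them, are applied to every time step independently. The pure-Python sketch below omits dropout. `positionwise_ff` and its weight arguments are hypothetical; they only mirror torch.nn.Linear's (out_features, in_features) weight layout.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def relu6(v: float) -> float:
    return min(max(v, 0.0), 6.0)


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """y = W x + b with W of shape (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def positionwise_ff(frames: Sequence[Sequence[float]], w1: Matrix, b1: Sequence[float],
                    w2: Matrix, b2: Sequence[float]) -> List[List[float]]:
    """frames has shape (time, num_feats); every frame uses the same weights."""
    out = []
    for x in frames:
        hidden = [relu6(h) for h in linear(x, w1, b1)]  # num_feats -> hid_feats
        out.append(linear(hidden, w2, b2))  # hid_feats -> num_feats
    return out


w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hid_feats=3, num_feats=2
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
frames = [[1.0, 2.0], [-1.0, 5.0], [4.0, 4.0]]
print(positionwise_ff(frames, w1, [0.0] * 3, w2, [0.0] * 2))
# -> [[1.0, 5.0], [0.0, 9.0], [4.0, 10.0]]
```

Since frames never interact, reordering the time steps simply reorders the outputs. Mixing across time is left to the attention block.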

class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dx2(*args: Any, **kwargs: Any)[source]

Two layer Conv1d for transformer feed-forward block

Introduced in FastSpeech: Fast, Robust and Controllable Text to Speech.

num_channels

input/output channels.

hid_channels

hidden channels

kernel_size

conv kernel size

activation

activation function for hidden layers

dropout_rate

dropout rate

time_dim

index of the time dimension in the input tensor.

__init__(num_channels, hid_channels, kernel_size, dropout_rate=0, time_dim=-1)[source]
forward(x)[source]

Calculates forward propagation.

Parameters

x – input tensor with size=(batch, time, num_channels) or size=(batch, num_channels, time).

Returns

output tensor same size as input
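Conv1dx2 returns a tensor of the same size as its input. For stride-1 convolutions with an odd kernel_size, zero padding of (kernel_size - 1) // 2 on both sides gives exactly that, and stacking two such layers keeps the length too. That padding is an assumption about the implementation. The sketch checks it with the standard output-length formula and a pure-Python single-channel convolution, which is a cross-correlation as in torch.nn.Conv1d. Both helpers are hypothetical.

```python
from typing import List, Sequence


def conv_out_len(length: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    return (length + 2 * padding - kernel_size) // stride + 1


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Single-channel, stride-1 convolution, zero-padded to keep the length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("odd kernel_size required for symmetric 'same' padding")
    pad = (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[m] * padded[t + m] for m in range(k)) for t in range(len(signal))]


print(conv1d_same([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))  # -> [3.0, 6.0, 9.0, 7.0]
```

torch.nn.Conv1d expects inputs shaped (batch, channels, time). An input laid out as (batch, time, num_channels) must therefore be transposed before the convolutions and back afterwards, which is presumably what time_dim controls.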

class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dLinear(*args: Any, **kwargs: Any)[source]

Conv1D + Linear for Transformer block.

num_channels

input/output channels.

hid_channels

hidden channels

kernel_size

conv kernel size

activation

activation function for hidden layers

dropout_rate

dropout rate

time_dim

index of the time dimension in the input tensor.

__init__(num_channels, hid_channels, kernel_size, dropout_rate=0, time_dim=-1)[source]
forward(x)[source]

Calculates forward propagation.

Parameters

x – input tensor with size=(batch, time, num_channels) or size=(batch, num_channels, time).

Returns

output tensor same size as input

Conformer Blocks

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.layer_blocks.conformer_encoder_v1.ConformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]
Building block for conformer encoder introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

  • Choose local-attention (attending only to close frames instead of all the frames in the sequence)

  • Choose number of conv blocks

  • Squeeze-Excitation after depthwise-conv

  • Allows downsampling in time dimension

  • Allows choosing activation and layer normalization type

We call this Conformer+

num_feats

input/output feat. dimension (aka d_model)

self_attn

attention module in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

num_heads

number of heads

conv_repeats

number of conv blocks

conv_kernel_size

kernel size for conv blocks

conv_stride

stride for depth-wise conv in first conv block

feed_forward

position-wise feed-forward string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff

dimension of middle layer in feed_forward block

ff_kernel_size

kernel size for convolutional versions of ff block

hid_act

ff and conv block hidden activation

dropout_rate

dropout rate for ff and conv blocks

att_context

maximum context range for local attention

att_dropout_rate

dropout rate for attention block

causal_pos_enc

if True, use causal positional encodings (when using relative positional encodings); it assumes that query q_i only attends to key k_j when j<=i

conv_norm_layer

norm layer constructor for conv block, if None it uses BatchNorm

se_r

Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

ff_macaron

if True, it uses macaron-net style ff layers, otherwise transformer style.

out_lnorm

if True, applies a LayerNorm at the output, as in the Conformer paper; we think this layer is redundant, so it is False by default

concat_after
if True, it concatenates the attention input and output and applies a linear transform, i.e.,

y = x + linear(concat(x, att(x)))

if False, y = x + att(x)
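With ff_macaron=True, the Conformer paper (https://arxiv.org/pdf/2005.08100.pdf) places the attention and convolution modules between two half-step feed-forward modules, applied in this order:

- x = x + FF1(x)/2
- x = x + MHSA(x)
- x = x + Conv(x)
- x = x + FF2(x)/2
- a final LayerNorm, applied here only when out_lnorm=True

The sketch shows that ordering. Plain floats stand in for tensors, and callables stand in for the pre-normalized sub-modules. The transformer-style branch is an assumption about what ff_macaron=False does: a single full-step FF after the convolution module. The function is hypothetical.

```python
from typing import Callable, Optional

Module = Callable[[float], float]  # stand-in for a pre-normalized sub-module


def conformer_block(x: float, ff1: Module, mhsa: Module, conv: Module, ff2: Module,
                    out_norm: Optional[Module] = None, macaron: bool = True) -> float:
    if macaron:
        x = x + 0.5 * ff1(x)  # first half-step feed-forward
    x = x + mhsa(x)  # self-attention with residual
    x = x + conv(x)  # convolution module with residual
    # second feed-forward: half step in macaron style, full step otherwise (assumed)
    x = x + (0.5 if macaron else 1.0) * ff2(x)
    if out_norm is not None:  # final LayerNorm, only when out_lnorm=True
        x = out_norm(x)
    return x


def double(v: float) -> float:
    return 2.0 * v


def identity(v: float) -> float:
    return v


def const_one(v: float) -> float:
    return 1.0


print(conformer_block(1.0, double, identity, const_one, double))  # -> 10.0
```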

__init__(num_feats, self_attn, num_heads, conv_repeats=1, conv_kernel_size=31, conv_stride=1, feed_forward='linear', d_ff=2048, ff_kernel_size=3, hid_act='swish', dropout_rate=0, att_context=25, att_dropout_rate=0, pos_enc_type='rel', causal_pos_enc=False, conv_norm_layer=None, se_r=None, ff_macaron=True, out_lnorm=False, concat_after=False)[source]
static _make_att(att_type, num_feats, num_heads, context, dropout_rate, pos_enc_type, causal_pos_enc)[source]

Creates multihead attention block from att_type string

Parameters
  • att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

  • num_feats – input/output feat. dimension (aka d_model)

  • num_heads – number of heads

  • dropout_rate – dropout rate for attention block

  • pos_enc_type – type of positional encoder

  • causal_pos_enc – if True, use causal positional encodings (when using relative positional encodings); it assumes that query q_i only attends to key k_j when j<=i

Returns

Attention nn.Module

static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]

Creates position-wise feed forward block from ff_type string

Parameters
  • ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

  • num_feats – input/output feat. dimension (aka d_model)

  • hid_feats – dimension of middle layer in feed_forward block

  • kernel_size – kernel size for convolutional versions of ff block

  • dropout_rate – dropout rate for ff block

  • activation – activation function for ff block

Returns

Position-wise feed-forward nn.Module

forward(x, pos_emb=None, mask=None)[source]

Forward pass function

Parameters
  • x – input tensor with size=(batch, time, num_feats)

  • pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None

  • mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features and tensor with mask

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.layer_blocks.conformer_conv._conv1(in_channels, out_channels, bias=False)[source]

1x1 convolution

hyperion.torch.layer_blocks.conformer_conv._dwconvk(channels, kernel_size, stride=1, bias=False)[source]

kxk depth-wise convolution with padding

class hyperion.torch.layer_blocks.conformer_conv.ConformerConvBlock(*args: Any, **kwargs: Any)[source]
Convolutional block for conformer introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

  • Squeeze-Excitation after depthwise-conv

  • Allows downsampling in time dimension

  • Allows choosing activation and layer normalization type

num_channels

number of input/output channels

kernel_size

kernel_size for depth-wise conv

stride

stride for depth-wise conv

activation

activation function str or object

norm_layer

norm layer constructor, if None it uses BatchNorm

dropout_rate

dropout rate

se_r

Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

__init__(num_channels, kernel_size, stride=1, activation='swish', norm_layer=None, dropout_rate=0, se_r=None)[source]
forward(x)[source]

Forward function

Parameters

x – input size = (batch, num_channels, time)

Returns

torch.Tensor size = (batch, num_channels, (time-1)//stride+1)
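The output length (time-1)//stride+1 is what the standard convolution length formula gives when the depth-wise convolution uses an odd kernel_size with padding (kernel_size - 1) // 2. That padding is the usual reading of "kxk depth-wise convolution with padding" and is assumed here. The 1x1 convolutions do not change the length. Below is a quick check over a range of lengths, kernels and strides; the helper names are hypothetical.

```python
def conv_out_len(length: int, kernel_size: int, stride: int, padding: int) -> int:
    return (length + 2 * padding - kernel_size) // stride + 1


def conformer_conv_out_len(length: int, stride: int) -> int:
    """Output length documented for ConformerConvBlock.forward."""
    return (length - 1) // stride + 1


def check_formula(max_len: int = 200) -> bool:
    for length in range(1, max_len):
        for kernel_size in (3, 15, 31):
            for stride in (1, 2, 3):
                padding = (kernel_size - 1) // 2
                if conv_out_len(length, kernel_size, stride, padding) != conformer_conv_out_len(length, stride):
                    return False
    return True


print(check_formula(), conformer_conv_out_len(100, 2))  # -> True 50
```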

Torch Models and Model Loader

All PyTorch neural architectures and models in Hyperion derive from the same base class, TorchModel.

class hyperion.torch.TorchModel(*args: Any, **kwargs: Any)[source]
get_config()[source]
copy()[source]
save(file_path)[source]
freeze()[source]
unfreeze()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
get_reg_loss()[source]
get_loss()[source]
property device
__init__(*args: Any, **kwargs: Any) None

The TorchModelLoader can load any model or network architecture from file.

Neural Architectures

All neural architectures derive from the NetArch class.

class hyperion.torch.narchs.net_arch.NetArch(*args: Any, **kwargs: Any)[source]
in_context()[source]
in_dim()[source]
out_dim()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
__init__(*args: Any, **kwargs: Any) None
copy()
property device
freeze()
get_config()
get_loss()
get_reg_loss()
classmethod load(file_path=None, cfg=None, state_dict=None)
save(file_path)
unfreeze()

The TorchNALoader can load any network architecture from file.

class hyperion.torch.narchs.torch_na_loader.TorchNALoader[source]
static load(file_path, extra_objs={})[source]
static load_from_cfg(cfg, state_dict=None, extra_objs={})[source]

Acoustic Features

class hyperion.torch.narchs.audio_feats_mvn.AudioFeatsMVN(*args: Any, **kwargs: Any)[source]

Acoustic feature extractor followed by short-time mean and variance normalization (ST-MVN), with optional SpecAugment.

__init__(audio_feats, mvn=None, spec_augment=None, trans=False, aug_after_mvn=False)[source]
property fs
property frame_length
property frame_shift
copy()
property device
forward(x, lengths=None)[source]
freeze()
get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)
save(file_path)
unfreeze()
get_config()[source]
static filter_args(**kwargs)[source]
add_class_args(prefix=None)[source]
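Short-time MVN normalizes each frame with statistics computed over a sliding window around it, rather than over the whole utterance. Below is a minimal pure-Python sketch for one feature dimension; in practice the same operation is applied to every dimension. Two details are assumptions, not Hyperion's MVN options: the left_context/right_context names, and truncating the window at the utterance edges.

```python
import math
from typing import List, Sequence


def st_mvn(frames: Sequence[float], left_context: int, right_context: int,
           norm_var: bool = False, eps: float = 1e-10) -> List[float]:
    """Normalize each frame with the mean (and optionally std) of its surrounding window."""
    n = len(frames)
    out = []
    for t in range(n):
        # window [t - left_context, t + right_context], truncated at the utterance edges
        window = frames[max(0, t - left_context):min(n, t + right_context + 1)]
        mean = sum(window) / len(window)
        value = frames[t] - mean
        if norm_var:
            var = sum((v - mean) ** 2 for v in window) / len(window)
            value /= math.sqrt(var + eps)
        out.append(value)
    return out


print(st_mvn([1.0, 2.0, 3.0, 4.0, 5.0], left_context=1, right_context=1))
# -> [-0.5, 0.0, 0.0, 0.0, 0.5]
```

When the window covers the whole utterance, this reduces to ordinary utterance-level mean (and variance) normalization.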

Fully Connected Network

Classification Head

class hyperion.torch.narchs.classif_head.ClassifHead(*args: Any, **kwargs: Any)[source]

Classification Head for x-vector style networks

in_feats

input feature dimension

num_classes

number of output classes

embed_dim

dimension of embedding layer

num_embed_layers

number of hidden layers

hid_act

str or dict hidden activation type in [‘relu’, ‘relu6’, ‘swish’, … ]

loss_type

type of loss function that will be used with the x-vector, in [‘softmax’, ‘cos-softmax’, ‘arc-softmax’], corresponding to standard cross-entropy, additive margin softmax or additive angular margin softmax.

s

scale parameter for cos-softmax and arc-softmax

margin

margin parameter for cos-softmax and arc-softmax

margin_warmup_epochs

number of epochs to anneal the margin from 0 to margin

num_subcenters

number of subcenters in subcenter losses

norm_layer

norm_layer object or str indicating the type of norm layer; if None, it uses BatchNorm1d

use_norm

if True, it uses layer/batch-normalization

norm_before

if True, the normalization layer is applied before the activation function
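For cos-softmax (additive margin softmax) and arc-softmax (additive angular margin softmax), the logits are the cosines between the embedding and each class weight vector, scaled by s. The margin is applied only to the target class: s*(cos(theta) - margin) for cos-softmax and s*cos(theta + margin) for arc-softmax. The sketch below computes those logits from precomputed cosines. It omits the extra handling that some ArcFace implementations add for theta + margin > pi, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def margin_logits(cosines: Sequence[float], target: int, s: float = 64.0,
                  margin: float = 0.3, loss_type: str = "arc-softmax") -> List[float]:
    """cosines[k] is cos(theta_k) between the embedding and class k's weight vector."""
    if loss_type not in ("cos-softmax", "arc-softmax"):
        raise ValueError("only the margin-based loss types are sketched here")
    logits = []
    for k, c in enumerate(cosines):
        if k == target:
            if loss_type == "cos-softmax":
                c = c - margin  # additive margin: cos(theta) - m
            else:
                theta = math.acos(max(-1.0, min(1.0, c)))
                c = math.cos(theta + margin)  # additive angular margin: cos(theta + m)
        logits.append(s * c)  # non-target classes keep s * cos(theta)
    return logits


print(margin_logits([0.5, 0.2], target=0, s=10.0, margin=0.1, loss_type="cos-softmax"))
```

The margin lowers the target logit during training. The network must therefore push same-class embeddings closer to their class weight vector than plain softmax would require.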

__init__(in_feats, num_classes, embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0)[source]
rebuild_output_layer(num_classes, loss_type, s, margin, margin_warmup_epochs, num_subcenters=2)[source]
set_margin(margin)[source]
set_margin_warmup_epochs(margin_warmup_epochs)[source]
set_s(s)[source]
update_margin(epoch)[source]
freeze_layers(layer_list)[source]
put_layers_in_eval_mode(layer_list)[source]
forward(x, y=None)[source]
forward_hid_feats(x, y=None, layers=None, return_output=False)[source]
extract_embed(x, embed_layer=0)[source]
copy()
property device
freeze()
get_config()[source]
get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)
save(file_path)
unfreeze()
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
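margin_warmup_epochs and update_margin(epoch) anneal the margin from 0 to its final value during the first epochs. The exact schedule is not documented here. The sketch assumes a linear ramp, and the function name is hypothetical.

```python
def annealed_margin(epoch: float, margin: float, margin_warmup_epochs: float) -> float:
    """Linearly ramp the margin from 0 to `margin` over `margin_warmup_epochs` (assumed)."""
    if margin_warmup_epochs <= 0:
        return margin  # no warm-up: use the full margin from the start
    return margin * min(1.0, epoch / margin_warmup_epochs)


for epoch in (0, 2, 5, 10, 20):
    print(epoch, round(annealed_margin(epoch, margin=0.3, margin_warmup_epochs=10), 3))
```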

Deep Convolutional Encoder/Decoders

These are encoders/decoders based on 1d and 2d deep convolutional networks.

DC Encoder 1d

class hyperion.torch.narchs.dc1d_encoder.DC1dEncoder(*args: Any, **kwargs: Any)[source]
__init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x)[source]
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, head_channels=False, in_feats=False)[source]
static add_argparse_args(parser, prefix=None, head_channels=False, in_feats=False)
copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
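With the default arguments, DC1dEncoder has three stages (conv_channels=[128, 64, 32]) and conv_strides=2, so the time axis should shrink by roughly a factor of 8. The sketch estimates the output (channels, time) under three assumptions about the implementation:

- padding is "same"-style;
- each stage applies one strided convolution;
- head_channels > 0 replaces the output channel count.

The helper names are hypothetical; the length formula is the one documented for torch.nn.Conv1d.

```python
from typing import Sequence, Tuple, Union


def conv_out_len(length: int, kernel_size: int = 3, stride: int = 1, dilation: int = 1) -> int:
    padding = dilation * (kernel_size - 1) // 2  # assumed 'same'-style padding
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def dc1d_encoder_out_shape(time: int, in_stride: int = 1,
                           conv_channels: Sequence[int] = (128, 64, 32),
                           conv_strides: Union[int, Sequence[int]] = 2,
                           kernel_size: int = 3, head_channels: int = 0) -> Tuple[int, int]:
    time = conv_out_len(time, kernel_size, in_stride)  # input conv
    if isinstance(conv_strides, int):
        conv_strides = [conv_strides] * len(conv_channels)
    for stride in conv_strides:  # one strided conv per stage (assumed)
        time = conv_out_len(time, kernel_size, stride)
    channels = head_channels if head_channels > 0 else conv_channels[-1]
    return channels, time


print(dc1d_encoder_out_shape(100))  # -> (32, 13)
```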

DC Decoder 1d

class hyperion.torch.narchs.dc1d_decoder.DC1dDecoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x, target_shape=None)[source]
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, head_channels=False)[source]
static add_argparse_args(parser, prefix=None, head_channels=False)
copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()

DC Encoder 2d

class hyperion.torch.narchs.dc2d_encoder.DC2dEncoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=1, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x)[source]
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, head_channels=False)[source]
static add_argparse_args(parser, prefix=None, head_channels=False)
copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()

DC Decoder 2d

class hyperion.torch.narchs.dc2d_decoder.DC2dDecoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x, target_shape=None)[source]
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, head_channels=False)[source]
static add_argparse_args(parser, prefix=None, head_channels=False)
copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()

TDNN Variants

These are variants of TDNNs. There is a factory class that creates TDNN networks from config params.

class hyperion.torch.narchs.tdnn_factory.TDNNFactory[source]
static create(tdnn_type, num_enc_blocks, in_feats, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu6'}, out_units=0, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

TDNN

class hyperion.torch.narchs.tdnn.TDNNV1(*args: Any, **kwargs: Any)[source]
__init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
property in_context
forward(x, use_amp=False)[source]
copy()
property device
freeze()
get_config()[source]
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
in_shape()[source]
out_shape(in_shape=None)[source]
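A TDNN layer with odd kernel size k and dilation d sees d*(k-1)/2 frames on each side, and stacked layers add these up. The total is the (past, future) context that in_context reports. The sketch computes it from explicit per-layer kernel sizes and dilations. The second usage line applies a geometric schedule, dilation * dilation_factor**i for block i. That is only a hypothetical reading of dilation_factor, not confirmed by this page.

```python
from typing import Sequence, Tuple


def tdnn_context(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> Tuple[int, int]:
    """Total (past, future) context of a stack of symmetric dilated 1-D convolutions."""
    past = future = 0
    for k, d in zip(kernel_sizes, dilations):
        if k % 2 == 0:
            raise ValueError("odd kernel sizes give symmetric context")
        half = d * (k - 1) // 2
        past += half
        future += half
    return past, future


# Frame-level layers of the original x-vector TDNN: (k=5, d=1), (k=3, d=2), (k=3, d=3)
print(tdnn_context([5, 3, 3], [1, 2, 3]))  # -> (7, 7)

# Hypothetical geometric schedule: dilation=1, dilation_factor=2, num_blocks=3
print(tdnn_context([3] * 3, [1 * 2 ** i for i in range(3)]))  # -> (7, 7)
```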

E-TDNN

class hyperion.torch.narchs.etdnn.ETDNNV1(*args: Any, **kwargs: Any)[source]
__init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
property in_context
forward(x)[source]
copy()
property device
freeze()
get_config()[source]
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
in_shape()[source]
out_shape(in_shape=None)[source]

Residual E-TDNN

class hyperion.torch.narchs.resetdnn.ResETDNNV1(*args: Any, **kwargs: Any)[source]
__init__(num_blocks, in_units, hid_units, expand_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
property in_context
forward(x)[source]
copy()
property device
freeze()
get_config()[source]
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
in_shape()[source]
out_shape(in_shape=None)[source]

Canonical ResNets/SE-ResNets/Res2Nets

These classes can be used to build canonical ResNets, SE-ResNets and Res2Nets. There is a factory class that creates ResNets from config params.

class hyperion.torch.narchs.resnet_factory.ResNetFactory[source]
static create(resnet_type, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.narchs.resnet.ResNet(*args: Any, **kwargs: Any)[source]

ResNet2D base class

block

resnet basic block type in [‘basic’, ‘bn’, ‘sebasic’, ‘sebn’], meaning basic resnet block, bottleneck resnet block, basic block with squeeze-excitation, and bottleneck block with squeeze-excitation

num_layers

list with the number of layers in each of the 4 layer blocks found in ResNets; after each layer block, feature maps are downsampled by a factor of 2 in each dimension and the number of channels is doubled.

in_channels

number of input channels

conv_channels

number of output channels in first conv layer (stem)

base_channels

number of channels in the first layer block

out_units

number of logits in the output layer, if 0 there is no output layer and resnet is used just as feature extractor, for example for x-vector encoder.

in_kernel_size

kernel size of the first conv layer

hid_act

str or dictionary describing hidden activations.

out_act

output activation

zero_init_residual

initializes batchnorm weights to zero so that each residual block behaves as an identity at the beginning. We observed worse results when using this option in x-vectors

groups

number of groups in convolutions

replace_stride_with_dilation

use dilated convolutions instead of downsampling; we never tested this.

dropout_rate

dropout rate

norm_layer

norm_layer object or str indicating the type of normalization layer; if None, it uses BatchNorm2d

do_maxpool

if False, removes the maxpooling layer at the stem of the network.

in_norm

if True, adds another batch norm layer in the input

se_r

squeeze-excitation compression ratio

time_se

if True, the squeeze-excitation embedding is obtained by averaging only over the time dimension, instead of over the time-freq (HxW) dimensions

in_feats

input feature size (number of components in dimension 2 of the input tensor); this is only required when time_se=True, to calculate the size of the squeeze-excitation matrices.
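The squeeze step of squeeze-excitation averages each channel's feature map into a descriptor. By default the average runs over both frequency and time, giving one value per channel. With time_se=True it runs over time only, giving one value per (channel, frequency) pair. The descriptor size then depends on the frequency dimension, which is why in_feats is needed. The pure-Python sketch below works on a nested list of shape (C, F, T). The function names are hypothetical, and the excitation step (two linear layers with compression ratio se_r and a sigmoid) is omitted.

```python
from typing import List, Sequence

FeatMap = Sequence[Sequence[Sequence[float]]]  # (channels, freq, time)


def squeeze_time_freq(x: FeatMap) -> List[float]:
    """Average over frequency and time: one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def squeeze_time(x: FeatMap) -> List[List[float]]:
    """Average over time only: one value per (channel, frequency) pair."""
    return [[sum(row) / len(row) for row in ch] for ch in x]


x = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 0.0], [2.0, 2.0]]]  # C=2, F=2, T=2
print(squeeze_time_freq(x))  # -> [4.0, 1.0]
print(squeeze_time(x))  # -> [[2.0, 6.0], [0.0, 2.0]]
```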

__init__(block, num_layers, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, multilevel=False, endpoint_channels=64, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, time_se=False, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
_compute_out_size(in_size)[source]
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size
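The downsampling steps can be traced with the standard convolution length formula. The sketch assumes the torchvision ResNet layout:

- a stem convolution with kernel in_kernel_size, stride in_stride and padding in_kernel_size // 2;
- an optional 3x3, stride-2 max-pooling (do_maxpool);
- 3x3 convolutions with strides 1, 2, 2, 2 in the four layer blocks.

This page does not state precisely whether Hyperion's first layer block also downsamples, so block_strides is left as a parameter. The helper is hypothetical.

```python
from typing import Sequence


def conv_out(size: int, kernel_size: int, stride: int, padding: int) -> int:
    return (size + 2 * padding - kernel_size) // stride + 1


def resnet_out_size(in_size: int, in_kernel_size: int = 7, in_stride: int = 2,
                    do_maxpool: bool = True,
                    block_strides: Sequence[int] = (1, 2, 2, 2)) -> int:
    size = conv_out(in_size, in_kernel_size, in_stride, in_kernel_size // 2)  # stem
    if do_maxpool:
        size = conv_out(size, 3, 2, 1)  # 3x3 max-pooling, stride 2
    for stride in block_strides:  # strided 3x3 conv at the start of each layer block
        size = conv_out(size, 3, stride, 1)
    return size


print(resnet_out_size(224), resnet_out_size(80), resnet_out_size(80, do_maxpool=False))
# -> 7 3 5
```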

in_context()[source]
Returns

Tuple (past, future) context required to predict one frame.

in_shape()[source]
Returns

Tuple describing input shape for the network

out_shape(in_shape=None)[source]

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

forward(x, use_amp=False)[source]
_forward(x)[source]

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

forward_hid_feats(x, layers=None, return_output=False)[source]

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

get_config()[source]

Gets network config

Returns

Dictionary with config params

copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
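The ResNet18 to ResNet152 subclasses below presumably fix block and num_layers to the standard configurations from the original ResNet paper (He et al., 2016); they are listed here as an assumption. The numeric suffix counts three things: the stem convolution, the convolutions inside the residual blocks (2 per basic block, 3 per bottleneck block) and the final fully connected layer. The dictionary and helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Standard configurations from the ResNet paper (assumed to match the subclasses)
STANDARD_RESNETS: Dict[str, Tuple[str, List[int]]] = {
    "ResNet18": ("basic", [2, 2, 2, 2]),
    "ResNet34": ("basic", [3, 4, 6, 3]),
    "ResNet50": ("bn", [3, 4, 6, 3]),
    "ResNet101": ("bn", [3, 4, 23, 3]),
    "ResNet152": ("bn", [3, 8, 36, 3]),
}
CONVS_PER_BLOCK = {"basic": 2, "bn": 3}


def resnet_depth(block: str, num_layers: List[int]) -> int:
    """Stem conv + convs inside the residual blocks + final fully connected layer."""
    return 1 + CONVS_PER_BLOCK[block] * sum(num_layers) + 1


for name, (block, num_layers) in STANDARD_RESNETS.items():
    print(name, resnet_depth(block, num_layers))
```

When out_units=0, the network is used as a feature extractor without the output layer. The name still refers to the standard depth.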
class hyperion.torch.narchs.resnet.ResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets network config

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.ResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets network config

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.ResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets network config

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.ResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets network config

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.ResNet152(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
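get_config(), save(file_path) and the classmethod load(file_path=None, cfg=None, state_dict=None) suggest a config-driven serialization pattern: the config dict holds the constructor arguments, so a model can be rebuilt from it. The sketch below shows that round trip with a hypothetical TinyNet class, using JSON in place of PyTorch checkpoints. The file format that hyperion's save actually writes is not described here and may differ.

```python
import json
import os
import tempfile


class TinyNet:
    """Hypothetical stand-in for a network with config-based save/load."""

    def __init__(self, in_channels, base_channels=64, out_units=0):
        self.in_channels = in_channels
        self.base_channels = base_channels
        self.out_units = out_units
        self.weights = [0.0] * base_channels  # stand-in for the state dict

    def get_config(self):
        # everything needed to call __init__ again
        return {"in_channels": self.in_channels,
                "base_channels": self.base_channels,
                "out_units": self.out_units}

    def save(self, file_path):
        with open(file_path, "w") as f:
            json.dump({"cfg": self.get_config(), "state_dict": self.weights}, f)

    @classmethod
    def load(cls, file_path=None, cfg=None, state_dict=None):
        if file_path is not None:
            # a file overrides cfg/state_dict passed as arguments
            with open(file_path) as f:
                checkpoint = json.load(f)
            cfg = checkpoint["cfg"]
            state_dict = checkpoint["state_dict"]
        model = cls(**cfg)
        if state_dict is not None:
            model.weights = state_dict
        return model


net = TinyNet(in_channels=1, base_channels=4, out_units=10)
net.weights = [0.5, 1.0, 1.5, 2.0]
path = os.path.join(tempfile.mkdtemp(), "tiny.json")
net.save(path)
restored = TinyNet.load(file_path=path)
print(restored.get_config(), restored.weights)
```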
class hyperion.torch.narchs.resnet.ResNext50_32x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
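Class names such as ResNext50_32x4d and ResNext101_32x8d follow the torchvision naming convention. Under it, "32x4d" means 32 groups (cardinality) with 4 channels per group at base width. Likewise, the WideResNet variants widen the bottleneck. The sketch below assumes hyperion uses torchvision's bottleneck-width rule, which is not confirmed here. Under that assumption, it computes the width of the grouped 3×3 convolution in each of the four stages. The WideResNet50 setting (width_per_group=128) is likewise an assumption.

```python
def bottleneck_width(planes, groups=1, width_per_group=64):
    """Channels of the grouped 3x3 conv inside a bottleneck block (torchvision rule)."""
    return int(planes * (width_per_group / 64.0)) * groups


STAGE_PLANES = [64, 128, 256, 512]  # base planes of the four stages

variants = {
    "ResNet50": dict(groups=1, width_per_group=64),
    "ResNext50_32x4d": dict(groups=32, width_per_group=4),
    "ResNext101_32x8d": dict(groups=32, width_per_group=8),
    "WideResNet50": dict(groups=1, width_per_group=128),
}

for name, kwargs in variants.items():
    print(name, [bottleneck_width(p, **kwargs) for p in STAGE_PLANES])
```

The ResNeXt and wide variants reach the same inner widths by different means: the ResNeXt variants split the channels into many narrow groups, while the wide variants use a single, wider group.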
class hyperion.torch.narchs.resnet.ResNext101_32x8d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.WideResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.WideResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LResNext50_4x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
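The SE prefix denotes squeeze-and-excitation, which rescales each channel by a gate computed from globally pooled features. Below is a minimal pure-Python sketch of a generic SE gate on a (channels, positions) feature map. It uses hypothetical hand-set weights and reduces 2 channels to 1. hyperion's SE layers may pool over different axes, add biases, or use a different reduction ratio.

```python
import math


def se_block(feats, w_down, w_up):
    """Squeeze-and-excitation on feats: a list of channels, each a list of values.

    w_down: (C/r) x C weights of the reducing layer; w_up: C x (C/r) weights
    of the expanding layer (biases omitted for brevity).
    """
    # squeeze: global average per channel
    z = [sum(ch) / len(ch) for ch in feats]
    # excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # scale: multiply every value of a channel by that channel's gate
    return [[g * v for v in ch] for g, ch in zip(gates, feats)]


feats = [[1.0, 3.0], [2.0, 2.0]]  # 2 channels, 2 positions each
w_down = [[0.5, 0.5]]             # 2 -> 1
w_up = [[0.0], [0.0]]             # 1 -> 2; zero weights give gates of 0.5
print(se_block(feats, w_down, w_up))
```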
class hyperion.torch.narchs.resnet.SEResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNet152(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNext50_32x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEResNext101_32x8d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEWideResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEWideResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, plus a tensor with the output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params
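get_config, together with the cfg argument of load, suggests a config round trip: the config dictionary should be enough to rebuild the same architecture, with the weights handled separately. The sketch below is a hypothetical toy class, not hyperion's implementation. ToyNet, its constructor arguments and from_config are invented for illustration, and the config is serialized with JSON only to show that it is plain data.

```python
import json
from typing import Any, Dict


class ToyNet:
    """Hypothetical stand-in for a network whose config fully describes its architecture."""

    def __init__(self, in_channels: int, base_channels: int = 64, out_units: int = 0):
        self.in_channels = in_channels
        self.base_channels = base_channels
        self.out_units = out_units

    def get_config(self) -> Dict[str, Any]:
        # Everything needed to rebuild the same architecture (weights excluded)
        return {
            "class_name": type(self).__name__,
            "in_channels": self.in_channels,
            "base_channels": self.base_channels,
            "out_units": self.out_units,
        }

    @classmethod
    def from_config(cls, cfg: Dict[str, Any]) -> "ToyNet":
        cfg = dict(cfg)  # copy so the caller's dict is not modified
        cfg.pop("class_name", None)
        return cls(**cfg)


net = ToyNet(in_channels=1, out_units=10)
cfg_json = json.dumps(net.get_config())
clone = ToyNet.from_config(json.loads(cfg_json))
```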

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.
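For a stack of convolutions along the time axis, the (past, future) context returned by in_context can be derived with the usual receptive-field recursion. The sketch assumes odd kernel sizes and symmetric ("same") padding, so past equals future. The layer lists are illustrative and are not read from the ResNet classes documented here. in_context_sketch is a hypothetical name.

```python
from typing import Sequence, Tuple


def in_context_sketch(
    kernels: Sequence[int], strides: Sequence[int], dilations: Sequence[int]
) -> Tuple[int, int]:
    """(past, future) input frames needed for one output frame of a conv stack."""
    past = 0
    jump = 1  # spacing, in input frames, between neighboring positions of the current layer
    for k, s, d in zip(kernels, strides, dilations):
        past += (k - 1) * d // 2 * jump  # half of this layer's extent, mapped to input frames
        jump *= s
    return past, past  # symmetric padding: as many future frames as past frames


print(in_context_sketch([3, 3, 3], [1, 2, 1], [1, 1, 1]))  # (4, 4)
```

Strides enlarge the context of every later layer, because each later position covers more input frames.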

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network
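out_shape returns either (batch, out_units) when there is an output layer or (batch, Cout, Hout, Wout) otherwise, matching the _forward return description. The sketch below reproduces that branching under stated assumptions. None marks a size unknown until run time, such as batch size or number of frames. Each downsampling step is assumed to be a 3x3, stride-2, padding-1 convolution. out_channels and num_downsamples are hypothetical parameters, not hyperion arguments.

```python
from typing import Optional, Tuple

Dim = Optional[int]  # None marks a size that is unknown until run time


def downsample(size: Dim, num_downsamples: int) -> Dim:
    """Halve (rounding up) once per assumed stride-2, 3x3, padding-1 step."""
    if size is None:
        return None
    for _ in range(num_downsamples):
        size = (size - 1) // 2 + 1
    return size


def out_shape_sketch(
    in_shape: Tuple[Dim, Dim, Dim, Dim],
    out_channels: int,
    num_downsamples: int,
    out_units: int = 0,
) -> Tuple[Dim, ...]:
    batch, _, h, w = in_shape
    if out_units > 0:
        # Output layer present: one logit per output unit
        return (batch, out_units)
    return (batch, out_channels, downsample(h, num_downsamples), downsample(w, num_downsamples))


print(out_shape_sketch((None, 1, 80, None), out_channels=512, num_downsamples=3))
```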

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELResNext50_4x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNet152(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEResNext50_32x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
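Suffixes such as _32x4d and _32x8d follow the usual ResNeXt naming: cardinality (number of groups) x width per group. The sketch parses the suffix and computes the bottleneck inner width using torchvision's convention. Whether hyperion uses the same base widths is an assumption, especially for the "L" variants such as SELResNext50_4x4d.

```python
import re
from typing import Tuple


def parse_resnext_suffix(class_name: str) -> Tuple[int, int]:
    """Return (groups, width_per_group) from a name ending in '_<groups>x<width>d'."""
    match = re.search(r"_(\d+)x(\d+)d$", class_name)
    if match is None:
        raise ValueError(f"no ResNeXt suffix in {class_name!r}")
    return int(match.group(1)), int(match.group(2))


def bottleneck_width(planes: int, groups: int, width_per_group: int) -> int:
    """Inner width of a bottleneck block under torchvision's convention."""
    return int(planes * (width_per_group / 64.0)) * groups


groups, wpg = parse_resnext_suffix("TSEResNext50_32x4d")
print(groups, wpg, bottleneck_width(64, groups, wpg))  # 32 4 128
```

Under the same formula, a plain ResNet-50 bottleneck (groups=1, width_per_group=64) has inner width 64. torchvision's wide_resnet50_2 uses width_per_group=128, which gives 128, doubling the inner width.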
class hyperion.torch.narchs.resnet.TSEResNext101_32x8d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEWideResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSEWideResNet101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSELResNet18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSELResNet34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSELResNet50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSELResNext50_4x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Net18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
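Res2Net variants differ from plain ResNets mainly in the hierarchical multi-scale split inside each block, as described in the Res2Net paper (Gao et al.). The channels are split into `scale` groups, with y_1 = x_1, y_2 = K(x_2) and y_i = K(x_i + y_{i-1}) for i > 2, and the outputs are concatenated. The sketch has several simplifications. It works on a flat list of per-channel scalars and shares a single toy function K across all splits, whereas the paper uses a separate 3x3 convolution per split. It also omits the surrounding 1x1 convolutions. Hyperion's exact block settings are not documented on this page.

```python
from typing import Callable, List, Sequence

Conv = Callable[[List[float]], List[float]]


def res2net_multiscale(x: Sequence[float], scale: int, conv: Conv) -> List[float]:
    """Hierarchical split-and-add of a Res2Net block on a flat channel vector."""
    if scale < 1 or len(x) % scale != 0:
        raise ValueError("len(x) must be a multiple of a positive scale")
    width = len(x) // scale
    splits = [list(x[i * width:(i + 1) * width]) for i in range(scale)]
    outputs = [splits[0]]  # y_1 = x_1: first split passes through unchanged
    prev: List[float] = []
    for i, split in enumerate(splits[1:], start=2):
        # y_2 = K(x_2); later splits also receive the previous split's output
        inp = split if i == 2 else [a + b for a, b in zip(split, prev)]
        prev = conv(inp)
        outputs.append(prev)
    return [v for out in outputs for v in out]  # concatenate along channels


def double(v: List[float]) -> List[float]:
    """Toy stand-in for a 3x3 convolution."""
    return [2.0 * a for a in v]


print(res2net_multiscale([1.0, 2.0, 3.0, 4.0], scale=4, conv=double))  # [1.0, 4.0, 14.0, 36.0]
```

Later splits pass through more stacked K operations, so they see progressively larger receptive fields within a single block.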
class hyperion.torch.narchs.resnet.Res2Net34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Net101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Net152(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Next50_32x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.Res2Next101_32x8d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.WideRes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.WideRes2Net101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LRes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.LRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Net18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Net34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Net101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Net152(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEWideRes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SEWideRes2Net101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers from which to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors, and also the tensor with output representations if return_output is True

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELRes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.SELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSERes2Net18(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSERes2Net34(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSERes2Net50(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSERes2Net101(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.resnet.TSERes2Net152(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSEWideRes2Net50(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSEWideRes2Net101(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSELRes2Net50(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.TSELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]
class hyperion.torch.narchs.resnet.LResNet34_345(*args: Any, **kwargs: Any)[source]
_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

copy()
property device
forward(x, use_amp=False)
forward_hid_feats(x, layers=None, return_output=False)

forward function which also returns intermediate hidden representations

Parameters
  • x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

  • layers – list of hidden layers to return hidden representations

  • return_output – if True, it also returns the output representations in a separate tensor.

Returns

List of hidden representation tensors and, if return_output is True, a tensor with the output representations

freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
Returns

Tuple (past, future) context required to predict one frame.

in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]

SpineNets/Spine2Nets

class hyperion.torch.narchs.spinenet_factory.SpineNetFactory[source]
static create(spinenet_type, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

Copyright 2020 Magdalena Rybicka Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.narchs.spinenet.SpineNet(*args: Any, **kwargs: Any)[source]
__init__(in_channels, block_specs=None, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, feature_output_level=None, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, concat=False, do_endpoint_conv=True, concat_ax=3, upsampling_type='nearest', hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, in_feats=None, se_r=16, time_se=False, has_se=False, is_res2net=False, res2net_scale=4, res2net_width_factor=1)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of input channels

  • block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output

  • output_levels – levels of the blocks whose outputs are taken as the SpineNet output

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter of the resampling connections

  • concat – if True, the outputs are concatenated; otherwise, they are averaged

  • do_endpoint_conv – whether to project the output blocks to a common number of channels (endpoints_num_filters)

  • concat_ax – axis along which the outputs are concatenated (when concatenation is chosen)

  • feature_output_level – level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)

  • filter_size_scale – additional scaling of the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

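In the SpineNet paper, a feature map at level L has stride 2^L with respect to the input, which determines the sizes that feature_output_level chooses between. A hypothetical helper, assuming the input size is divisible by 2^max(output_levels); the real layers may round odd sizes differently.

```python
def level_sizes(in_size, output_levels=(3, 4, 5, 6, 7), feature_output_level=None):
    """Hypothetical per-level feature-map sizes and the common resampling target.

    Assumes level L has stride 2**L with respect to the input, as in the
    SpineNet paper, and that in_size is divisible by 2**max(output_levels).
    """
    sizes = {level: in_size // 2 ** level for level in output_levels}
    if feature_output_level is None:
        target = max(sizes.values())  # default: the biggest feature map
    else:
        target = in_size // 2 ** feature_output_level
    return sizes, target


print(level_sizes(256))                          # ({3: 32, 4: 16, 5: 8, 6: 4, 7: 2}, 32)
print(level_sizes(256, feature_output_level=5))  # same sizes, target 8
```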
_make_permuted_blocks(block_specs)[source]

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)[source]

Builds the cross-scale connections between the blocks.

_make_endpoints()[source]

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

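The endpoint fusion described above (resize every output to a common size, then average or concatenate) can be sketched with NumPy. This illustrates the idea under stated assumptions and is not the hyperion code. The maps are (C, H, W) without a batch axis. They are assumed to share a channel count already, as after the 1x1 endpoint convs. Only integer nearest-neighbour upsampling to the biggest map is handled. concat_ax indexes this batch-less layout.

```python
import numpy as np


def fuse_endpoints(feats, concat=False, concat_ax=2):
    """Hypothetical fusion of endpoint maps of shape (C, H, W).

    Every map is nearest-neighbour upsampled to the biggest map, then the maps
    are averaged or concatenated along concat_ax. Only integer upsampling
    factors are handled; downsampling to a smaller target is not covered.
    """
    target_h = max(f.shape[1] for f in feats)
    target_w = max(f.shape[2] for f in feats)
    resized = []
    for f in feats:
        f = np.repeat(f, target_h // f.shape[1], axis=1)  # nearest upsampling on H
        f = np.repeat(f, target_w // f.shape[2], axis=2)  # nearest upsampling on W
        resized.append(f)
    if concat:
        return np.concatenate(resized, axis=concat_ax)
    return np.mean(np.stack(resized), axis=0)


a = np.ones((4, 2, 2))
b = np.full((4, 1, 1), 3.0)
print(fuse_endpoints([a, b]).shape)               # (4, 2, 2), every value 2.0
print(fuse_endpoints([a, b], concat=True).shape)  # (4, 2, 4)
```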
_compute_max_context(in_context)[source]

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)[source]
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_compute_channel_size()[source]
Returns

The number of channels; if the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

in_shape()[source]
Returns

Tuple describing input shape for the network

out_shape(in_shape=None)[source]

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

_match_feat_shape(feat0, feat1)[source]

Matches the shapes of the feature maps coming from the input connections.

forward(x, use_amp=False)[source]
_forward(x)[source]

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

get_config()[source]

Gets the network config.

Returns

Dictionary with config params

copy()
property device
freeze()
get_loss()
get_reg_loss()
in_context()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of input channels

  • block_specs – specification of the building blocks: their type, their input connections, and whether each block is an output

  • output_levels – levels of the blocks whose outputs are taken as the SpineNet output

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter of the resampling connections

  • concat – if True, the outputs are concatenated; otherwise, they are averaged

  • do_endpoint_conv – whether to project the output blocks to a common number of channels (endpoints_num_filters)

  • concat_ax – axis along which the outputs are concatenated (when concatenation is chosen)

  • feature_output_level – level whose feature-map size all outputs are resampled to (by default, the size of the biggest feature map)

  • filter_size_scale – additional scaling of the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of channels; if the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet49S(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.
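A size mismatch typically appears when a map with an odd size is downsampled and then upsampled again. The sketch below resolves it by cropping both tensors to their common height and width. Cropping (rather than padding or interpolation) and the helper name `match_feat_shape_sketch` are assumptions, not necessarily what Hyperion does.

```python
import torch
import torch.nn.functional as F


def match_feat_shape_sketch(feat0: torch.Tensor, feat1: torch.Tensor):
    """Crop two (batch, C, H, W) tensors to their common H and W."""
    h = min(feat0.shape[-2], feat1.shape[-2])
    w = min(feat0.shape[-1], feat1.shape[-1])
    return feat0[..., :h, :w], feat1[..., :h, :w]


feat0 = torch.randn(2, 8, 26, 13)
# A 13-wide map is 7 wide after stride-2 downsampling; 2x upsampling then gives 14.
feat1 = F.interpolate(torch.randn(2, 8, 13, 7), scale_factor=2, mode="nearest")
a, b = match_feat_shape_sketch(feat0, feat1)
print(a.shape, b.shape)  # torch.Size([2, 8, 26, 13]) torch.Size([2, 8, 26, 13])
merged = a + b  # the shapes now agree, so the two connections can be summed
```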

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet96(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)
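The interplay of `concat` and `concat_ax` can be sketched as follows, assuming the endpoint feature maps have already been resized to a common shape. The function `combine_endpoints` and the default `concat_ax=1` (the channel axis) are assumptions for illustration.

```python
import torch


def combine_endpoints(feats, concat: bool, concat_ax: int = 1) -> torch.Tensor:
    """Concatenate or average a list of same-shaped endpoint feature maps."""
    if concat:
        return torch.cat(feats, dim=concat_ax)
    # averaging needs identical shapes, e.g. after projecting to endpoints_num_filters
    return torch.stack(feats, dim=0).mean(dim=0)


feats = [torch.randn(2, 16, 10, 20) for _ in range(3)]
print(combine_endpoints(feats, concat=True).shape)               # torch.Size([2, 48, 10, 20])
print(combine_endpoints(feats, concat=True, concat_ax=2).shape)  # torch.Size([2, 16, 30, 20])
print(combine_endpoints(feats, concat=False).shape)              # torch.Size([2, 16, 10, 20])
```

In this sketch, concatenating along the channel axis multiplies the channel count by the number of endpoints, while averaging keeps it unchanged.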

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet143(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)
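As a worked example of `filter_size_scale`, the sketch below multiplies hypothetical base channel counts by a scale factor and rounds to the nearest integer. The rounding rule, the `scale_channels` name, and the example values are assumptions; the actual per-model scales are set in the Hyperion source.

```python
def scale_channels(base_channels, filter_size_scale: float):
    """Rescale each block's channel count, keeping at least one channel."""
    return [max(1, int(round(c * filter_size_scale))) for c in base_channels]


print(scale_channels([64, 128, 256], 1.3))  # [83, 166, 333]
print(scale_channels([64, 128, 256], 0.5))  # [32, 64, 128]
```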

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.
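The channel-count rule can be sketched in plain Python. Only the no-projection case (sum of the output blocks' channels) comes from the description above; the behaviour with `do_endpoint_conv=True` and with averaging is an assumption, and `compute_channel_size` is a hypothetical helper.

```python
def compute_channel_size(block_channels, do_endpoint_conv: bool,
                         endpoints_num_filters: int, concat: bool) -> int:
    """Hypothetical channel count of the combined SpineNet output."""
    if do_endpoint_conv:
        # assumption: every endpoint is projected to endpoints_num_filters channels
        per_block = [endpoints_num_filters] * len(block_channels)
    else:
        per_block = list(block_channels)
    if concat:
        return sum(per_block)
    # assumption: averaging is only possible when all endpoints have equal channels
    if len(set(per_block)) != 1:
        raise ValueError("averaging requires equal channel counts")
    return per_block[0]


print(compute_channel_size([64, 128, 256], False, 256, True))  # 448
print(compute_channel_size([64, 128, 256], True, 256, True))   # 768
print(compute_channel_size([64, 128, 256], True, 256, False))  # 256
```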

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet190(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
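Acoustic features are often stored as (batch, time, freq). The sketch below rearranges such a tensor into the (batch, C, freq, time) layout described above, with C = 1, which assumes the network was built with `in_channels=1`; the feature dimensions are made up.

```python
import torch

batch, time, freq = 4, 200, 80
feats = torch.randn(batch, time, freq)  # (batch, time, freq)

# swap the time and freq axes, then add a singleton channel axis
x = feats.transpose(1, 2).unsqueeze(1)  # (batch, 1, freq, time)
print(x.shape)  # torch.Size([4, 1, 80, 200])
```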

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params
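`get_config` together with the `cfg` argument of `load` suggests a config round trip: the config dictionary holds the constructor arguments, so a network can be rebuilt from it. The toy class below sketches that pattern in plain Python; `ToyNet`, its fields, and `from_config` are hypothetical and do not reproduce Hyperion's `load` signature.

```python
class ToyNet:
    """Hypothetical network that stores its constructor arguments."""

    def __init__(self, in_channels: int, endpoints_num_filters: int = 256, concat: bool = False):
        self.in_channels = in_channels
        self.endpoints_num_filters = endpoints_num_filters
        self.concat = concat

    def get_config(self) -> dict:
        # everything needed to rebuild the object, plus its class name
        return {
            "class_name": self.__class__.__name__,
            "in_channels": self.in_channels,
            "endpoints_num_filters": self.endpoints_num_filters,
            "concat": self.concat,
        }

    @classmethod
    def from_config(cls, cfg: dict) -> "ToyNet":
        cfg = dict(cfg)  # copy so the caller's dict is not mutated
        cfg.pop("class_name", None)
        return cls(**cfg)


net = ToyNet(in_channels=1, concat=True)
clone = ToyNet.from_config(net.get_config())
print(clone.get_config() == net.get_config())  # True
```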

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LSpineNet49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.
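As we read the SpineNet paper, each cross-scale connection scales the parent block's channels by a factor alpha, resamples the map to the target level (nearest-neighbour upsampling, or strided downsampling), and then applies a 1x1 conv to match the target block's channels. Assuming `resample_alpha` plays the role of that alpha and that each level halves H and W, the plan for one connection can be sketched as below. `plan_resampling` is hypothetical, only lists the steps, and its 0.5 default is illustrative.

```python
def plan_resampling(parent_level: int, parent_channels: int,
                    target_level: int, target_channels: int,
                    alpha: float = 0.5):
    """List the steps of one cross-scale connection (descriptive sketch only)."""
    mid = int(parent_channels * alpha)
    steps = [f"1x1 conv {parent_channels}->{mid}"]
    if target_level > parent_level:
        # a higher level has a coarser resolution: H and W shrink by 2 per level
        steps.append(f"downsample x{2 ** (target_level - parent_level)}")
    elif target_level < parent_level:
        steps.append(f"nearest upsample x{2 ** (parent_level - target_level)}")
    steps.append(f"1x1 conv {mid}->{target_channels}")
    return steps


print(plan_resampling(3, 256, 5, 512))
# ['1x1 conv 256->128', 'downsample x4', '1x1 conv 128->512']
print(plan_resampling(6, 256, 4, 128))
# ['1x1 conv 256->128', 'nearest upsample x4', '1x1 conv 128->128']
```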

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LSpineNet49_subpixel(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block
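A common way to compute the context (one-sided receptive field) of a stack of centred convolutions is to add (kernel - 1) // 2 positions per layer, scaled by the product of the strides of all earlier layers. The sketch below applies that rule, assuming `in_context` is already expressed in input frames, that the input block has stride 1, and that the layers are given as hypothetical (kernel, stride) pairs. It is not a transcription of `_compute_max_context`.

```python
def max_context(in_context: int, layers) -> int:
    """One-sided context in input frames after a stack of (kernel, stride) convs."""
    context = in_context
    total_stride = 1  # distance, in input frames, between adjacent positions
    for kernel, stride in layers:
        context += (kernel - 1) // 2 * total_stride
        total_stride *= stride
    return context


print(max_context(2, [(3, 2), (3, 1), (3, 2), (3, 1)]))  # 11
```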

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LSpineNet49_bilinear(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.
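The endpoint step can be sketched as one 1x1 `nn.Conv2d` per endpoint followed by resizing to a target size. The class `EndpointProjection`, nearest-neighbour resizing, and the example shapes are assumptions for illustration; Hyperion's endpoint blocks may add normalization or activations, or use a different resampling mode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EndpointProjection(nn.Module):
    """Project each endpoint to a common channel count and spatial size (sketch)."""

    def __init__(self, in_channels_list, endpoints_num_filters: int):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, endpoints_num_filters, kernel_size=1) for c in in_channels_list]
        )

    def forward(self, feats, target_size):
        outs = []
        for conv, feat in zip(self.convs, feats):
            y = conv(feat)  # 1x1 conv: changes channels, keeps H and W
            if tuple(y.shape[-2:]) != tuple(target_size):
                y = F.interpolate(y, size=target_size, mode="nearest")
            outs.append(y)
        return outs


proj = EndpointProjection([64, 128, 256], endpoints_num_filters=32)
feats = [torch.randn(2, 64, 10, 20), torch.randn(2, 128, 5, 10), torch.randn(2, 256, 3, 5)]
outs = proj(feats, target_size=(10, 20))
print([tuple(o.shape) for o in outs])  # [(2, 32, 10, 20), (2, 32, 10, 20), (2, 32, 10, 20)]
```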

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LSpineNet49_5(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network
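Shape-inference methods of this kind typically pass unknown dimensions (None, e.g. a variable batch size or number of frames) through unchanged and apply the downsampling only to known sizes. The sketch below assumes shapes of the form (batch, C, H, W), a hypothetical number of stride-2 steps that each compute ceil(size / 2), and a given output channel count; `out_shape_sketch` and its arguments are illustrative, not Hyperion's implementation.

```python
import math
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def out_shape_sketch(in_shape: Shape, out_channels: int, num_downsample: int) -> Shape:
    """Map (batch, C, H, W) to (batch, out_channels, H', W'), keeping None dims as None."""
    batch, _, h, w = in_shape
    factor = 2 ** num_downsample

    def down(size: Optional[int]) -> Optional[int]:
        # repeated ceil(size / 2) equals ceil(size / 2**k)
        return None if size is None else math.ceil(size / factor)

    return (batch, out_channels, down(h), down(w))


print(out_shape_sketch((None, 1, 80, None), out_channels=256, num_downsample=3))
# (None, 256, 10, None)
```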

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LSpine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)

  • concat_ax – axis along which to concatenate (if concatenation is chosen)

  • feature_output_level – level whose feature map size all outputs are resampled to (by default, the size of the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for larger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

The number of output channels. If the 1x1 conv is not applied in the endpoint blocks, it equals the sum of the channel counts of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context of the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

Forward function.

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for images or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. Here, the block outputs are passed through 1x1 convs that project them to the common number of channels (endpoints_num_filters), and the feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the features coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

Dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape of the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SELSpine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)
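The concat and concat_ax options control how the (already resized) endpoint outputs are combined. The sketch below shows the two options, assuming all endpoints have the same shape (batch, C, H, W); the function name `combine_endpoints` is hypothetical.

```python
import torch


def combine_endpoints(feats, concat=True, concat_ax=1):
    """Hypothetical helper: merge same-shaped endpoint maps."""
    if concat:
        return torch.cat(feats, dim=concat_ax)
    # average: element-wise mean over endpoints
    return torch.stack(feats, dim=0).mean(dim=0)


feats = [torch.randn(2, 128, 20, 50) for _ in range(3)]
print(combine_endpoints(feats).shape)                # torch.Size([2, 384, 20, 50])
print(combine_endpoints(feats, concat_ax=3).shape)   # torch.Size([2, 128, 20, 150])
print(combine_endpoints(feats, concat=False).shape)  # torch.Size([2, 128, 20, 50])
```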

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.
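A sketch of this channel-count rule follows. Only the case without the 1x1 convolution is stated in the docstring; the other branch assumes each endpoint is projected to endpoints_num_filters channels and then either concatenated along the channel axis or averaged. The helper and its arguments are hypothetical.

```python
from typing import Sequence


def compute_channel_size(block_channels: Sequence[int], do_endpoint_conv: bool,
                         endpoints_num_filters: int = 256, concat: bool = True,
                         concat_ax: int = 1) -> int:
    """Hypothetical helper: channels of the combined SpineNet output."""
    if not do_endpoint_conv:
        # documented case: sum of the output blocks' channels
        return sum(block_channels)
    # assumed case: every endpoint has endpoints_num_filters channels
    if concat and concat_ax == 1:
        return endpoints_num_filters * len(block_channels)
    # averaging (or concatenating along a non-channel axis) keeps the count
    return endpoints_num_filters


print(compute_channel_size([256, 512, 1024], do_endpoint_conv=False))     # 1792
print(compute_channel_size([256, 512, 1024], True, 128))                  # 384
print(compute_channel_size([256, 512, 1024], True, 128, concat=False))    # 128
```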

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.TSELSpine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.Spine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SESpine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.TSESpine2Net49(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.Spine2Net49S(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SESpine2Net49S(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.TSESpine2Net49S(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.LR0_SP53(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 convolution is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block.

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units > 0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Matches the shapes of the feature maps coming from the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets the network config.

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape.

Parameters

in_shape – input shape

Returns

Tuple describing the output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.R0_SP53(*args: Any, **kwargs: Any)[source]
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure, based on the paper "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization" by Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song (https://arxiv.org/abs/1912.05027).

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections, and whether the block is an output

  • output_levels – output levels of the blocks that are taken as the output of the SpineNet

  • endpoints_num_filters – base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – axis along which the concatenation is performed (if concatenation is chosen)

  • feature_output_level – level whose feature map size the outputs are resampled to (by default, the target size is that of the biggest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)

_compute_channel_size()
Returns

If the 1x1 conv is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size
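The downsampling mentioned above follows the standard convolution output-size formula (the one documented for torch.nn.Conv2d). Below is a sketch that chains it over a list of layers. The (kernel, stride, padding) tuples in the example are illustrative only and do not describe the actual SpineNet stem.

```python
def conv_out_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one axis (PyTorch convention)."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def chain_out_size(in_size, layers):
    """Apply conv_out_size for each (kernel_size, stride, padding) tuple in order."""
    size = in_size
    for kernel_size, stride, padding in layers:
        size = conv_out_size(size, kernel_size, stride, padding)
    return size


# hypothetical stem: 7x7 conv, stride 2, padding 3, then a 3x3 stride-2 block
print(chain_out_size(200, [(7, 2, 3), (3, 2, 1)]))  # 50
```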

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Match shape between feats of the input connections.
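When two input connections come from different levels, their spatial sizes differ. One plausible way to match them is nearest-neighbour upsampling of the coarser map followed by cropping, shown here with NumPy. Whether hyperion crops, pads or interpolates is an assumption, and match_feat_shape below is a hypothetical stand-in that only handles the upsampling direction.

```python
import numpy as np


def match_feat_shape(feat0, feat1):
    """Resize feat1 (batch, C, H, W) so its H and W equal those of feat0.

    Nearest-neighbour upsampling by an integer factor, then cropping any excess.
    This sketch assumes feat1 is not larger than feat0.
    """
    h0, w0 = feat0.shape[-2:]
    h1, w1 = feat1.shape[-2:]
    assert h1 <= h0 and w1 <= w0, "sketch only handles upsampling"
    # ceil division so the upsampled map is at least as large as feat0
    fh = -(-h0 // h1)
    fw = -(-w0 // w1)
    up = np.repeat(np.repeat(feat1, fh, axis=-2), fw, axis=-1)
    return up[..., :h0, :w0]


a = np.zeros((2, 8, 10, 25))
b = np.ones((2, 8, 5, 13))
print(match_feat_shape(a, b).shape)  # (2, 8, 10, 25)
```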

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets network config

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
class hyperion.torch.narchs.spinenet.SpineNet49_concat_time(*args: Any, **kwargs: Any)[source]
_compute_channel_size()
Returns

If the 1x1 conv is not applied in the endpoint blocks, the number of channels equals the sum of the numbers of channels of the output blocks.

_compute_max_context(in_context)

Computes the maximum possible context in the structure. The method may need a deeper revision.

Parameters

in_context – context from the input residual block

_compute_out_size(in_size)
Computes output size given input size.

Output size is not the same as input size because of downsampling steps.

Parameters

in_size – input size of the H or W dimensions

Returns

output_size

_forward(x)

forward function

Parameters

x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio

Returns

Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)

_make_endpoints()

Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.

_make_permuted_blocks(block_specs)

Builds the blocks of the SpineNet structure.

_make_permuted_connections(block_specs)

Builds the cross-scale connections between the blocks.

_match_feat_shape(feat0, feat1)

Match shape between feats of the input connections.

copy()
property device
forward(x, use_amp=False)
freeze()
get_config()

Gets network config

Returns

dictionary with config params

get_loss()
get_reg_loss()
in_context()
in_dim()
in_shape()
Returns

Tuple describing input shape for the network

classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
out_shape(in_shape=None)

Computes the output shape given the input shape

Parameters

in_shape – input shape

Returns

Tuple describing output shape for the network

save(file_path)
unfreeze()
__init__(in_channels, **kwargs)[source]

Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization, Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song, https://arxiv.org/abs/1912.05027

Parameters
  • in_channels – number of channels of the input

  • block_specs – specification of the building blocks: their type, their input connections and whether each block is an output

  • output_levels – the output levels of the blocks that are taken as outputs of the SpineNet

  • endpoints_num_filters – the base number of channels of the SpineNet output

  • resample_alpha – parameter for the resampling connections

  • concat – bool that decides whether the outputs are concatenated or averaged

  • do_endpoint_conv – bool that decides whether the output blocks are projected to a common number of channels (equal to endpoints_num_filters)

  • concat_ax – the axis along which the outputs are concatenated (if concatenation is chosen)

  • feature_output_level – the level whose feature map size all outputs are resampled to (by default, the target size is the largest feature map)

  • filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures such as SpineNet96 and above, or for SpineNet49S)
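The concat and concat_ax options decide how the endpoint feature maps, once resampled to a common size, are merged. Below is a NumPy sketch of the two modes, assuming the maps already share the shape (batch, C, freq, time) after the 1x1 projection. The merge_endpoints helper is hypothetical. The class name SpineNet49_concat_time suggests concatenation along the time axis, but that is an inference from the name only.

```python
import numpy as np


def merge_endpoints(feats, concat=True, concat_ax=1):
    """Merge a list of equally shaped endpoint maps.

    concat=True joins them along concat_ax (1 for channels, 3 for time),
    concat=False averages them element-wise.
    """
    if concat:
        return np.concatenate(feats, axis=concat_ax)
    return np.mean(np.stack(feats, axis=0), axis=0)


# five endpoint maps filled with 0, 1, 2, 3, 4
feats = [np.full((2, 256, 10, 25), float(k)) for k in range(5)]
print(merge_endpoints(feats, concat=True, concat_ax=1).shape)  # (2, 1280, 10, 25)
print(merge_endpoints(feats, concat=True, concat_ax=3).shape)  # (2, 256, 10, 125)
print(merge_endpoints(feats, concat=False).shape)              # (2, 256, 10, 25)
```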

ResNet Encoder/Decoders

These are encoders and decoders based on flexible 1d and 2d ResNets.

ResNet Encoder 1d

class hyperion.torch.narchs.resnet1d_encoder.ResNet1dEncoder(*args: Any, **kwargs: Any)[source]
__init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, drop_connect_rate=0, se_r=16, res2net_width_factor=1, res2net_scale=4, multilayer=False, multilayer_concat=False, endpoint_channels=None, endpoint_layers=None, endpoint_scale_layer=-1, use_norm=True, norm_layer=None, norm_before=True, upsampling_mode='nearest')[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x)[source]
copy()
property device
forward_hid_feats(x, layers=None, return_output=False)[source]
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip={'in_feats'})[source]
static add_argparse_args(parser, prefix=None, skip={'in_feats'})
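The add_class_args / filter_args pair recurs across these classes: one registers constructor options on a parser, optionally under a prefix, and the other picks the constructor's keyword arguments out of a dictionary of parsed values. Below is a minimal stdlib argparse sketch of that pattern using a hypothetical ToyEncoder. hyperion's own implementation may use a different parser library and prefix syntax.

```python
import argparse


class ToyEncoder:
    """Hypothetical class illustrating the add_class_args / filter_args pattern."""

    def __init__(self, in_feats, resb_channels=128, dropout_rate=0.0):
        self.in_feats = in_feats
        self.resb_channels = resb_channels
        self.dropout_rate = dropout_rate

    @staticmethod
    def add_class_args(parser, prefix=None, skip=frozenset({"in_feats"})):
        # options become e.g. --enc.resb-channels when prefix="enc"
        p = "" if prefix is None else prefix + "."
        if "in_feats" not in skip:
            parser.add_argument(f"--{p}in-feats", type=int, required=True)
        parser.add_argument(f"--{p}resb-channels", type=int, default=128)
        parser.add_argument(f"--{p}dropout-rate", type=float, default=0.0)

    @staticmethod
    def filter_args(**kwargs):
        # keep only the keyword arguments accepted by __init__
        valid = ("in_feats", "resb_channels", "dropout_rate")
        return {k: kwargs[k] for k in valid if k in kwargs}


parser = argparse.ArgumentParser()
ToyEncoder.add_class_args(parser, prefix="enc")
args = parser.parse_args(["--enc.resb-channels", "256"])
# argparse stores the value under "enc.resb_channels"; strip the prefix before filtering
enc_kwargs = {k[len("enc."):]: v for k, v in vars(args).items() if k.startswith("enc.")}
enc = ToyEncoder(in_feats=80, **ToyEncoder.filter_args(**enc_kwargs))
print(enc.resb_channels)  # 256
```

Because in_feats is in skip, it is not exposed on the command line and is passed directly instead, as the skip={'in_feats'} default above suggests.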

ResNet Decoder 1d

class hyperion.torch.narchs.resnet1d_decoder.ResNet1dDecoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=128, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x, target_shape=None)[source]
get_config()[source]
copy()
property device
static filter_args(**kwargs)[source]
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

ResNet Encoder 2d

class hyperion.torch.narchs.resnet2d_encoder.ResNet2dEncoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=1, in_conv_channels=64, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[64, 128, 256, 512], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, time_se=False, in_feats=None, res2net_width_factor=1, res2net_scale=4, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
copy()
property device
forward(x)[source]
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
get_config()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip={})[source]
static add_argparse_args(parser, prefix=None, skip={})

ResNet Decoder 2d

class hyperion.torch.narchs.resnet2d_decoder.ResNet2dDecoder(*args: Any, **kwargs: Any)[source]
__init__(in_channels=512, in_conv_channels=512, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[512, 256, 128, 64], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
in_context()[source]
in_shape()[source]
out_shape(in_shape=None)[source]
forward(x, target_shape=None)[source]
get_config()[source]
copy()
property device
static filter_args(**kwargs)[source]
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

EfficientNet

Transformer

class hyperion.torch.narchs.transformer_encoder_v1.TransformerEncoderV1(*args: Any, **kwargs: Any)[source]

Transformer encoder module.

in_feats

input features dimension

d_model

encoder blocks feature dimension

num_heads

number of heads

num_blocks

number of self attn blocks

att_type

string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

att_context

maximum context range for local attention

ff_type

string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff

dimension of middle layer in feed_forward block

ff_kernel_size

kernel size for convolutional versions of ff block

ff_dropout_rate

dropout rate for ff block

pos_dropout_rate

dropout rate for positional encoder

att_dropout_rate

dropout rate for attention block

in_layer_type

input layer block type in [‘linear’, ‘conv2d-sub’, ‘embed’, None]

rel_pos_enc

if True, use relative positional encodings, absolute encodings otherwise.

causal_pos_enc

if True, use causal positional encodings (when rel_pos_enc=True); this assumes that query q_i only attends to key k_j when j<=i

hid_act

hidden activations in ff and input blocks

norm_before

if True, use layer norm before layers, otherwise after

concat_after
if True, it concatenates the attention input and output and applies a linear transform, i.e.,

y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

padding_idx

padding idx for embed layer

in_time_dim

time dimension in the input Tensor

out_time_dim

dimension that we want to be time in the output tensor
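The two residual variants selected by concat_after can be written out directly. In the NumPy sketch below, att is a random placeholder for the attention output (same shape as x), and W, b stand in for the learned linear layer that maps 2*d_model back to d_model. Both are placeholders, not hyperion parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time, d_model = 2, 5, 8
x = rng.standard_normal((batch, time, d_model))
att = rng.standard_normal((batch, time, d_model))  # placeholder for att(x)

# concat_after=False: plain residual connection
y_plain = x + att

# concat_after=True: concatenate along the feature axis, then project back to d_model
W = rng.standard_normal((2 * d_model, d_model))
b = np.zeros(d_model)
y_concat = x + np.concatenate([x, att], axis=-1) @ W + b

print(y_plain.shape, y_concat.shape)  # (2, 5, 8) (2, 5, 8)
```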

__init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, ff_type='linear', d_ff=2048, ff_kernel_size=1, ff_dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', rel_pos_enc=False, causal_pos_enc=False, hid_act='relu6', norm_before=True, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1)[source]
forward(x, mask=None, target_shape=None, use_amp=False)[source]
_forward(x, mask=None, target_shape=None)[source]

Forward pass function

Parameters
  • x – input tensor with size=(batch, time, num_feats)

  • mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features, and Tensor with mask
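The mask argument marks the valid time steps of each padded sequence in the batch. Below is a minimal NumPy sketch of building such a (batch, time) boolean mask from per-utterance lengths. Whether the encoder expects booleans, another dtype or an extra singleton dimension is not stated here and is left as an assumption. lengths_to_mask is a hypothetical helper.

```python
import numpy as np


def lengths_to_mask(lengths, max_len=None):
    """Boolean mask of shape (batch, time): True where the frame is valid."""
    lengths = np.asarray(lengths)
    if max_len is None:
        max_len = int(lengths.max())
    # compare each frame index against each sequence length
    return np.arange(max_len)[None, :] < lengths[:, None]


mask = lengths_to_mask([3, 5, 1])
print(mask.astype(int))
# [[1 1 1 0 0]
#  [1 1 1 1 1]
#  [1 0 0 0 0]]
```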

get_config()[source]

Gets network config

Returns

dictionary with config params

in_context()[source]
in_shape()[source]

Input shape for network

Returns

Tuple describing input shape

out_shape(in_shape=None)[source]

Infers the network output shape given the input shape

Parameters

in_shape – input shape tuple

Returns

Tuple with the output shape

static filter_args(**kwargs)[source]
Filters the arguments corresponding to TransformerEncoderV1 from an args dictionary

Parameters

kwargs – args dictionary

Returns

args dictionary

static add_class_args(parser, prefix=None, in_feats=False)[source]

Adds Transformer config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names

static add_argparse_args(parser, prefix=None, in_feats=False)

Adds Transformer config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names

copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()

Conformer

class hyperion.torch.narchs.conformer_encoder_v1.ConformerEncoderV1(*args: Any, **kwargs: Any)[source]
Conformer encoder introduced in

https://arxiv.org/pdf/2005.08100.pdf

This includes some optional extra features not included in the original paper:

  • Choose local-attention (attending only to close frames instead of all the frames in the sequence)

  • Choose number of conv blocks in each conformer layer

  • Squeeze-Excitation after depthwise-conv

  • Allows downsampling in time dimension

  • Allows choosing activation and layer normalization type

We call this Conformer+

This becomes a standard Transformer by setting conv_repeats=0, pos_enc_type=’abs’, ff_macaron=False.
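For reference, the Conformer paper linked above defines each block as two half-step (macaron) feed-forward modules sandwiching self-attention and convolution. This appears to be the structure that the defaults ff_macaron=True and conv_repeats=1 reproduce:

```latex
\begin{aligned}
\tilde{x}_i &= x_i + \tfrac{1}{2}\,\mathrm{FFN}(x_i) \\
x'_i &= \tilde{x}_i + \mathrm{MHSA}(\tilde{x}_i) \\
x''_i &= x'_i + \mathrm{Conv}(x'_i) \\
y_i &= \mathrm{Layernorm}\!\left(x''_i + \tfrac{1}{2}\,\mathrm{FFN}(x''_i)\right)
\end{aligned}
```

With ff_macaron=False, the two half-step FFNs give way to a single Transformer-style FFN. According to the red_lnorms description, the output Layernorm of the paper is only added when red_lnorms=True.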

in_feats

input features dimension

d_model

encoder blocks feature dimension

num_heads

number of heads

num_blocks

number of self attn blocks

att_type

string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

att_context

maximum context range for local attention

conv_repeats

number of conv blocks in each conformer block

conv_kernel_sizes

kernel size for conv blocks

conv_strides

stride for depth-wise conv in the first conv block of each conformer block

ff_type

string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

d_ff

dimension of middle layer in feed_forward block

ff_kernel_size

kernel size for convolutional versions of ff block

dropout_rate

dropout rate for ff and conv blocks

pos_dropout_rate

dropout rate for positional encoder

att_dropout_rate

dropout rate for attention block

in_layer_type

input layer block type in [‘linear’, ‘conv2d-sub’, ‘embed’, None]

pos_enc_type

type of positional encoder [‘no’, ‘abs’, ‘rel’]

causal_pos_enc

if True, use causal positional encodings (when rel_pos_enc=True); this assumes that query q_i only attends to key k_j when j<=i

no_pos_enc

if True, it doesn’t use positional encoder.

hid_act

hidden activations in ff and input blocks

conv_norm_layer

norm layer constructor or str for conv block, if None it uses BatchNorm1d

se_r

Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation

ff_macaron

if True, it uses macaron-net style ff layers, otherwise transformer style.

red_lnorms

if True, use redundant LNorm layers at the output of the conformer blocks, as in the paper

concat_after
if True, it concatenates the attention input and output and applies a linear transform, i.e.,

y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

padding_idx

padding idx for embed layer

in_time_dim

time dimension in the input Tensor

out_time_dim

dimension that we want to be time in the output tensor

rel_pos_enc

if True, use relative positional encodings, absolute encodings otherwise. (deprecated)

red_lnorm

(deprecated)
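With att_type='local-scaled-dot-prod-att-v1', each query only attends to keys within att_context frames, and the causal_pos_enc description assumes query i only attends to keys j <= i. Both constraints can be pictured as boolean (time, time) masks. This NumPy sketch assumes a symmetric window of att_context frames on each side. That is an assumption, not a documented detail of hyperion's local attention, and both helpers are hypothetical.

```python
import numpy as np


def local_att_mask(time, att_context):
    """True where |i - j| <= att_context (query i may attend to key j)."""
    idx = np.arange(time)
    return np.abs(idx[:, None] - idx[None, :]) <= att_context


def causal_att_mask(time):
    """True where key j is not in the future of query i (j <= i)."""
    return np.tril(np.ones((time, time), dtype=bool))


local = local_att_mask(6, 1)
causal = causal_att_mask(6)
combined = local & causal  # both constraints at once
print(combined.astype(int))
```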

__init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, conv_repeats=1, conv_kernel_sizes=31, conv_strides=1, ff_type='linear', d_ff=2048, ff_kernel_size=1, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', pos_enc_type='rel', causal_pos_enc=False, hid_act='swish', conv_norm_layer=None, se_r=None, ff_macaron=True, red_lnorms=False, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1, rel_pos_enc=True, red_lnorm=False)[source]
forward(x, mask=None, target_shape=None)[source]

Forward pass function

Parameters
  • x – input tensor with size=(batch, time, num_feats)

  • mask – mask to indicate valid time steps for x (batch, time)

Returns

Tensor with output features, and Tensor with mask

get_config()[source]

Gets network config

Returns

dictionary with config params

in_context()[source]
in_shape()[source]

Input shape for network

Returns

Tuple describing input shape

out_shape(in_shape=None)[source]

Infers the network output shape given the input shape

Parameters

in_shape – input shape tuple

Returns

Tuple with the output shape

static filter_args(**kwargs)[source]
Filters the arguments corresponding to ConformerEncoderV1 from an args dictionary

Parameters

kwargs – args dictionary

Returns

args dictionary

static add_class_args(parser, prefix=None, in_feats=False)[source]

Adds Conformer config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names

static add_argparse_args(parser, prefix=None, in_feats=False)

Adds Conformer config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names

copy()
property device
freeze()
get_loss()
get_reg_loss()
in_dim()
classmethod load(file_path=None, cfg=None, state_dict=None)
out_dim()
save(file_path)
unfreeze()

Models

These include complex models created by connecting several network architectures.

x-Vectors

There are several variants of x-vector embeddings. They all derive from the same base class.

class hyperion.torch.models.xvectors.xvector.XVector(*args: Any, **kwargs: Any)[source]

x-Vector base class

__init__(encoder_net, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0, embed_layer=0, in_feats=None, proj_feats=None)[source]
property pool_feats
property num_classes
property embed_dim
property num_embed_layers
property s
property margin
property margin_warmup_epochs
property num_subcenters
property loss_type
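With loss_type='arc-softmax', the s and margin properties parametrise an additive angular margin (ArcFace-style) softmax: the target-class logit s*cos(theta) becomes s*cos(theta + m). Below is a NumPy sketch of the logits only, assuming L2-normalised embeddings and class weights. It omits refinements that real implementations may include, such as special handling when theta + m exceeds pi, and the sub-centers controlled by num_subcenters.

```python
import numpy as np


def arc_softmax_logits(embed, weights, labels, s=64.0, margin=0.3):
    """Additive angular margin logits.

    embed: (batch, dim), weights: (num_classes, dim), labels: (batch,) ints.
    """
    e = embed / np.linalg.norm(embed, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)  # (batch, num_classes)
    theta = np.arccos(cos)
    logits = s * cos
    rows = np.arange(len(labels))
    # add the angular margin to the target class only
    logits[rows, labels] = s * np.cos(theta[rows, labels] + margin)
    return logits


rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 16))
W = rng.standard_normal((10, 16))
y = np.array([0, 3, 3, 9])
logits = arc_softmax_logits(emb, W, y)
print(logits.shape)  # (4, 10)
```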
_make_pool_net(pool_net, enc_feats=None)[source]

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object
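pool_net='mean+stddev' refers to statistics pooling: the encoder output of shape (batch, enc_feats, time) is reduced over time to its per-channel mean and standard deviation, which are then concatenated. Below is a NumPy sketch of that reduction. The variance floor eps is an illustrative choice, not necessarily hyperion's.

```python
import numpy as np


def mean_stddev_pool(x, eps=1e-5):
    """x: (batch, feats, time) -> (batch, 2 * feats)."""
    mu = x.mean(axis=-1)
    # floor the variance before the square root for numerical stability
    sigma = np.sqrt(np.maximum(x.var(axis=-1), eps))
    return np.concatenate([mu, sigma], axis=-1)


x = np.random.default_rng(0).standard_normal((3, 512, 200))
print(mean_stddev_pool(x).shape)  # (3, 1024)
```

This is why the embedding layers after pooling see twice the encoder's feature dimension.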

update_loss_margin(epoch)[source]
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start
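margin_warmup_epochs lets the margin grow from zero instead of being applied at full strength from the first epoch. The exact schedule used by update_loss_margin is not stated here. The sketch below assumes a linear ramp and uses a hypothetical warmup_margin function.

```python
def warmup_margin(epoch, margin=0.3, margin_warmup_epochs=10):
    """Linearly ramp the margin from 0 to `margin` over the warm-up epochs.

    Assumes epochs are counted from 0; with no warm-up the full margin is used.
    """
    if margin_warmup_epochs <= 0:
        return margin
    return margin * min(1.0, epoch / margin_warmup_epochs)


print([round(warmup_margin(e, 0.3, 4), 3) for e in range(6)])
# [0.0, 0.075, 0.15, 0.225, 0.3, 0.3]
```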

forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)[source]
forward_output(x, y=None)[source]

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)[source]

forwards hidden representations in the x-vector network

extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)[source]
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)[source]
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]
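The snip_edges flags share their name with Kaldi's frame-extraction option. Assuming they behave the same way here, the sketch below applies Kaldi's window-counting rule: with snip_edges=True only windows that fit entirely inside the signal are kept, otherwise the count is the length divided by the shift, rounded. It also computes window start/end frames for the snip_edges=True layout; the non-snipped case shifts windows by a left padding, which is presumably what compute_slidwin_left_padding accounts for. Lengths are in frames to avoid floating-point rounding, and num_slidwin and slidwin_bounds are hypothetical names.

```python
def num_slidwin(num_frames, win_length, win_shift, snip_edges=False):
    """Number of windows, following Kaldi's NumFrames rule."""
    if snip_edges:
        if num_frames < win_length:
            return 0
        return 1 + (num_frames - win_length) // win_shift
    return (num_frames + win_shift // 2) // win_shift


def slidwin_bounds(num_windows, win_length, win_shift):
    """(start, end) frame of each window, assuming windows start at frame 0."""
    return [(i * win_shift, i * win_shift + win_length) for i in range(num_windows)]


n = num_slidwin(1000, 150, 25, snip_edges=True)
print(n)                               # 35
print(slidwin_bounds(n, 150, 25)[-1])  # (850, 1000)
print(num_slidwin(1000, 150, 25))      # 40
```

With feat_frame_shift=10 (ms), frame indices convert to seconds by multiplying by 0.01.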
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)[source]
freeze_preembed_layers()[source]
train_mode(mode='ft-embed-affine')[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip={})[source]
static filter_finetune_args(**kwargs)[source]
static add_finetune_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None, skip={})
copy()
property device
freeze()
get_loss()
get_reg_loss()
save(file_path)
unfreeze()
static add_argparse_finetune_args(parser, prefix=None)

TDNN x-Vector

x-Vectors with TDNN, E-TDNN, Residual E-TDNN Encoders.

class hyperion.torch.models.xvectors.tdnn_xvector.TDNNXVector(*args: Any, **kwargs: Any)[source]
__init__(tdnn_type, num_enc_blocks, in_feats, num_classes, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]
property num_enc_blocks
property enc_hid_units
property enc_expand_units
property kernel_size
property dilation
property dilation_factor
property in_norm
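kernel_size, dilation and dilation_factor together set the temporal context of the TDNN encoder. Assume, hypothetically, that block l uses dilation * dilation_factor**l and that each block is a single 1-d convolution. Then the receptive field of the stack is 1 + sum over blocks of (kernel_size - 1) * dilation_l. The real encoder may contain extra layers per block, so treat the result of this sketch as indicative only.

```python
def tdnn_receptive_field(num_enc_blocks, kernel_size=3, dilation=1, dilation_factor=1):
    """Receptive field (in frames) of stacked dilated 1-d convolutions."""
    field = 1
    for block in range(num_enc_blocks):
        block_dilation = dilation * dilation_factor ** block
        # each conv widens the field by (k - 1) * d frames
        field += (kernel_size - 1) * block_dilation
    return field


print(tdnn_receptive_field(5))                                    # 11
print(tdnn_receptive_field(4, kernel_size=3, dilation_factor=2))  # 31
```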
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)
static add_finetune_args(parser, prefix=None)
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
copy()
property device
property embed_dim
extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
static filter_finetune_args(**kwargs)
forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)

forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()
freeze_preembed_layers()
get_loss()
get_reg_loss()
property loss_type
property margin
property margin_warmup_epochs
property num_classes
property num_embed_layers
property num_subcenters
property pool_feats
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
property s
save(file_path)
train_mode(mode='ft-embed-affine')
unfreeze()
update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start

ResNet x-Vector

x-Vectors with canonical ResNet and Res2Net encoders.

class hyperion.torch.models.xvectors.resnet_xvector.ResNetXVector(*args: Any, **kwargs: Any)[source]
__init__(resnet_type, in_feats, num_classes, in_channels, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]
property in_channels
property conv_channels
property base_channels
property in_kernel_size
property in_stride
property zero_init_residual
property groups
property replace_stride_with_dilation
property do_maxpool
property in_norm
property se_r
property res2net_scale
property res2net_width_factor
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)
static add_finetune_args(parser, prefix=None)
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
copy()
property device
property embed_dim
extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
static filter_finetune_args(**kwargs)
forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)

forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()
freeze_preembed_layers()
get_loss()
get_reg_loss()
property loss_type
property margin
property margin_warmup_epochs
property num_classes
property num_embed_layers
property num_subcenters
property pool_feats
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
property s
save(file_path)
train_mode(mode='ft-embed-affine')
unfreeze()
update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start

SpineNet x-Vector

x-Vectors with SpineNet, Spine2Net Encoders.

class hyperion.torch.models.xvectors.spinenet_xvector.SpineNetXVector(*args: Any, **kwargs: Any)[source]
__init__(spinenet_type, in_feats, num_classes, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]
property in_channels
property output_levels
property endpoints_num_filters
property resample_alpha
property block_repeats
property filter_size_scale
property conv_channels
property base_channels
property in_kernel_size
property in_stride
property zero_init_residual
property groups
property do_maxpool
property in_norm
property se_r
property res2net_scale
property res2net_width_factor
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)
static add_finetune_args(parser, prefix=None)
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
copy()
property device
property embed_dim
extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
static filter_finetune_args(**kwargs)
forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)

forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()
freeze_preembed_layers()
get_loss()
get_reg_loss()
property loss_type
property margin
property margin_warmup_epochs
property num_classes
property num_embed_layers
property num_subcenters
property pool_feats
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
property s
save(file_path)
train_mode(mode='ft-embed-affine')
unfreeze()
update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start

ResNet 1d x-Vector

x-Vectors with ResNet and Res2Net 1d encoders. They can be configured as ECAPA-TDNN.

class hyperion.torch.models.xvectors.resnet1d_xvector.ResNet1dXVector(*args: Any, **kwargs: Any)[source]
__init__(resnet_enc, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None)[source]
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
filter_args()[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_finetune_args(parser, prefix=None)
static add_finetune_args(parser, prefix=None)
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
copy()
property device
property embed_dim
extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
static filter_finetune_args(**kwargs)
forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)

forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()
freeze_preembed_layers()
get_loss()
get_reg_loss()
property loss_type
property margin
property margin_warmup_epochs
property num_classes
property num_embed_layers
property num_subcenters
property pool_feats
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
property s
save(file_path)
train_mode(mode='ft-embed-affine')
unfreeze()
update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start
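update_loss_margin() ramps the margin up over the first margin_warmup_epochs epochs. The exact schedule is not given here; the sketch below assumes a linear ramp from 0 to the configured margin, held constant afterwards, with no warm-up when margin_warmup_epochs is 0. The function name current_margin is hypothetical.

```python
def current_margin(epoch: int, margin: float, margin_warmup_epochs: int) -> float:
    """Margin applied at the start of `epoch` under an assumed linear warm-up."""
    if margin_warmup_epochs == 0:
        # No warm-up: the full margin is used from the first epoch
        return margin
    # Linear ramp from 0 to `margin`, then held constant
    return margin * min(1.0, epoch / margin_warmup_epochs)


for epoch in (0, 5, 10, 20):
    print(epoch, current_margin(epoch, margin=0.3, margin_warmup_epochs=10))
```

A gradual ramp of this kind keeps a large angular margin from dominating the loss in early epochs, when the embeddings are still poorly separated.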

Transformer x-Vector

x-Vectors based on Transformer Encoder

class hyperion.torch.models.xvectors.transformer_xvector_v1.TransformerXVectorV1(*args: Any, **kwargs: Any)[source]

x-Vector with Transformer encoder.

in_feats

input features dimension

num_classes

number of training classes

enc_d_model

encoder blocks feature dimension

num_enc_heads

number of heads

num_enc_blocks

number of self attn blocks

enc_att_type

string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]

enc_att_context

maximum context range for local attention

enc_ff_type

string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]

enc_d_ff

dimension of middle layer in feed_forward block

enc_ff_kernel_size

kernel size for convolutional versions of ff block

in_layer_type

input layer block type in [‘linear’, ‘conv2d-sub’, ‘embed’, None]

enc_concat_after

if True, it concatenates the attention input and output and applies a linear transform, i.e.,

y = x + linear(concat(x, att(x)))

if False, y = x + att(x)

pool_net

pooling block configuration string or dictionary of params

embed_dim

x-vector dimension

num_embed_layers

number of hidden layers in classification head

hid_act

hidden activation configuration string or dictionary

loss_type

softmax loss type string in [‘softmax’, ‘arc-softmax’, ‘cos-softmax’]

s

s parameter in arc/cos-softmax losses

margin

margin in arc/cos-softmax losses

margin_warmup_epochs

number of epochs until we reach the maximum value for margin

dropout_rate

dropout rate for ff block and classification head

pos_dropout_rate

dropout rate for positional encoder

att_dropout_rate

dropout rate for attention block

use_norm

if True use batch/layer norm

norm_before

if True, use layer norm before layers, otherwise after

in_norm

add batchnorm at the input

embed_layer

which layer to use to extract x-vectors

proj_feats

add linear projection layer after the encoder to project feature dimension to proj_feats
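The loss_type, s and margin attributes control how the class logits are formed. The sketch below uses the standard additive angular margin (ArcFace-style) formulation for 'arc-softmax' and the additive cosine margin formulation for 'cos-softmax', consistent with the AAM/AM-softmax naming used by update_loss_margin(); treating 'softmax' as scaled cosines without a margin is a simplification. It takes precomputed cosines between the normalized x-vector and each normalized class weight, the function names are hypothetical, and refinements that practical implementations often add (for example, special handling when theta + margin exceeds pi) are omitted.

```python
import math
from typing import List


def margin_logits(cos_theta: List[float], target: int, loss_type: str = "arc-softmax",
                  s: float = 64.0, margin: float = 0.3) -> List[float]:
    """Scaled logits with a margin applied to the target class only."""
    logits = []
    for j, c in enumerate(cos_theta):
        c = max(-1.0, min(1.0, c))  # keep acos() inside its domain
        if j != target:
            logits.append(s * c)
        elif loss_type == "arc-softmax":
            # additive angular margin: s * cos(theta + m)
            logits.append(s * math.cos(math.acos(c) + margin))
        elif loss_type == "cos-softmax":
            # additive cosine margin: s * (cos(theta) - m)
            logits.append(s * (c - margin))
        else:
            # no margin
            logits.append(s * c)
    return logits


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


cos_theta = [0.5, 0.2, -0.1]
for lt in ("softmax", "cos-softmax", "arc-softmax"):
    print(lt, cross_entropy(margin_logits(cos_theta, 0, loss_type=lt, s=10.0), 0))
```

The margin lowers the target logit, so the network must push same-speaker embeddings closer to their class weight than plain softmax would require.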

__init__(in_feats, num_classes, enc_d_model=512, num_enc_heads=4, num_enc_blocks=6, enc_att_type='scaled-dot-prod-v1', enc_att_context=25, enc_ff_type='linear', enc_d_ff=2048, enc_ff_kernel_size=1, in_layer_type='conv2d-sub', enc_concat_after=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]
property enc_d_model
property num_enc_heads
property num_enc_blocks
property enc_att_type
property enc_att_context
property enc_d_ff
property enc_ff_kernel_size
property pos_dropout_rate
property att_dropout_rate
property in_layer_type
property enc_concat_after
property enc_ff_type
get_config()[source]

Gets network config

Returns

dictionary with config params

classmethod load(file_path=None, cfg=None, state_dict=None)[source]

Loads model from file

static filter_args(**kwargs)[source]
Filters arguments corresponding to TransformerXVector from args dictionary

Parameters
  • prefix – prefix string

  • kwargs – args dictionary

Returns

args dictionary
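filter_args() selects, from a larger dictionary of parsed arguments, only the entries the TransformerXVector constructor understands. A minimal sketch of that idea, assuming a fixed tuple of accepted names; VALID_ARGS below is a hypothetical, truncated list drawn from the __init__ signature, not the library's actual list.

```python
from typing import Any, Dict

# Hypothetical, truncated list of accepted names taken from __init__ above
VALID_ARGS = ("enc_d_model", "num_enc_heads", "num_enc_blocks", "embed_dim")


def filter_args(**kwargs: Any) -> Dict[str, Any]:
    """Return only the keyword arguments listed in VALID_ARGS."""
    return {k: kwargs[k] for k in VALID_ARGS if k in kwargs}


args = filter_args(enc_d_model=256, embed_dim=192, lr=0.01)
print(args)  # {'enc_d_model': 256, 'embed_dim': 192}
```

Unrelated keys, such as an optimizer's lr, are dropped, so the result can be unpacked directly into the constructor.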

static add_class_args(parser, prefix=None)[source]

Adds TransformerXVector config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names
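The prefix argument lets several models register their options in one parser without name clashes. The sketch below shows the idea with the standard-library argparse; hyperion itself may build prefixed options differently (for example through jsonargparse), and the option spellings here are hypothetical. Defaults are copied from the __init__ signature.

```python
import argparse
from typing import Optional


def add_class_args(parser: argparse.ArgumentParser, prefix: Optional[str] = None) -> None:
    """Register a few encoder options, optionally under '--<prefix>.'."""
    p = "" if prefix is None else prefix + "."
    parser.add_argument(f"--{p}enc-d-model", type=int, default=512,
                        help="encoder blocks feature dimension")
    parser.add_argument(f"--{p}num-enc-heads", type=int, default=4,
                        help="number of heads")
    parser.add_argument(f"--{p}num-enc-blocks", type=int, default=6,
                        help="number of self attn blocks")


parser = argparse.ArgumentParser()
add_class_args(parser, prefix="model")
args = parser.parse_args(["--model.num-enc-heads", "8"])
# argparse turns '-' into '_' in dest names but keeps the '.'
print(getattr(args, "model.num_enc_heads"), getattr(args, "model.enc_d_model"))
```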

_make_pool_net(pool_net, enc_feats=None)

Makes the pooling block

Parameters
  • pool_net – str or dict to pass to the pooling factory create function

  • enc_feats – dimension of the features coming from the encoder

Returns

GlobalPool1d object

static add_argparse_args(parser, prefix=None)

Adds TransformerXVector config parameters to argparser

Parameters
  • parser – argparse object

  • prefix – prefix string to add to the argument names

static add_argparse_finetune_args(parser, prefix=None)
static add_finetune_args(parser, prefix=None)
compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
copy()
property device
property embed_dim
extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
static filter_finetune_args(**kwargs)
forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)

Forwards hidden representations in the x-vector network

forward_output(x, y=None)

Forward function

Parameters
  • x – input features tensor with shape=(batch, in_feats, time)

  • y – target classes torch.long tensor with shape=(batch,)

Returns

class posteriors tensor with shape=(batch, num_classes)

freeze()
freeze_preembed_layers()
get_loss()
get_reg_loss()
property loss_type
property margin
property margin_warmup_epochs
property num_classes
property num_embed_layers
property num_subcenters
property pool_feats
rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
property s
save(file_path)
train_mode(mode='ft-embed-affine')
unfreeze()
update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number

Parameters

epoch – epoch which is about to start

Auto-Encoder

class hyperion.torch.models.ae.ae.AE(*args: Any, **kwargs: Any)[source]

Basic Autoencoder class

encoder_net

NArch encoder network object

decoder_net

NArch decoder network object

z_dim

latent variable dimension (inferred from encoder_net output shape)

__init__(encoder_net, decoder_net)[source]
forward(x, x_target=None, use_amp=False)[source]
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
copy()
property device
freeze()
get_loss()
get_reg_loss()
save(file_path)
unfreeze()

Variational Auto-Encoders

class hyperion.torch.models.vae.vae.VAE(*args: Any, **kwargs: Any)[source]
Variational Autoencoder class

From: https://arxiv.org/abs/1312.6114

encoder_net

NArch encoder network object

decoder_net

NArch decoder network object

z_dim

latent variable dimension

kldiv_weight

weight of the KL divergence when computing the ELBO

qz_pdf

type of prob distribution of the approx. latent posterior

pz_pdf

type of prob distribution of the latent prior

px_pdf

type of prob distribution for the data likelihood

flatten_spatial

if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.

spatial_shape

shape of the data, only needed if flatten_spatial=True

scale_invariant

for future use

data_scale

for future use

__init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, qz_pdf='normal-glob-diag-cov', pz_pdf='std-normal', px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
property pz
forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, return_qz=False, serialize_pdfs=True, use_amp=False)[source]
compute_qz(x)[source]
compute_px_given_z(z, x_shape=None)[source]
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
copy()
property device
freeze()
get_loss()
get_reg_loss()
save(file_path)
unfreeze()
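With pz_pdf='std-normal' and a diagonal Gaussian approximate posterior, the KL term of the ELBO has a closed form, and kldiv_weight scales it. The sketch below computes a single-sample ELBO estimate in plain Python, assuming a diagonal Gaussian likelihood; it does not attempt to reproduce hyperion's 'normal-glob-diag-cov' parameterization, and the function names are hypothetical.

```python
import math
from typing import List


def kl_to_std_normal(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def diag_gauss_log_lik(x: List[float], x_mean: List[float], x_logvar: List[float]) -> float:
    """log N(x | x_mean, diag(exp(x_logvar)))."""
    return -0.5 * sum(math.log(2.0 * math.pi) + lv + (xi - mi) ** 2 / math.exp(lv)
                      for xi, mi, lv in zip(x, x_mean, x_logvar))


def elbo(x: List[float], x_mean: List[float], x_logvar: List[float],
         qz_mu: List[float], qz_logvar: List[float], kldiv_weight: float = 1.0) -> float:
    """Single-sample ELBO: log p(x|z) - kldiv_weight * KL(q(z|x) || p(z)).

    x_mean / x_logvar are the decoder outputs for one z drawn from q(z|x)
    with the reparameterization trick (not shown here).
    """
    return (diag_gauss_log_lik(x, x_mean, x_logvar)
            - kldiv_weight * kl_to_std_normal(qz_mu, qz_logvar))


print(elbo([0.2, -0.1], [0.0, 0.0], [0.0, 0.0], [0.5, 0.0], [-1.0, 0.0]))
```

Setting kldiv_weight below 1 weakens the pull of the posterior towards the prior, which trades a looser bound for better reconstructions.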
class hyperion.torch.models.vae.vq_vae.VQVAE(*args: Any, **kwargs: Any)[source]
Vector Quantized Variational Autoencoder class

From: https://arxiv.org/abs/1711.00937

encoder_net

NArch encoder network object

decoder_net

NArch decoder network object

z_dim

latent variable dimension

kldiv_weight

weight of the KL divergence when computing the ELBO

diversity_weight

weight for the log-perplexity of the codebook; it intends to maximize the number of codewords used.

vq_type

type of vector quantizer

vq_groups

number of vector quantization groups.

vq_clusters

number of codewords in each vq group

vq_commitment_cost

weight of the commitment loss

vq_ema_gamma

exponential moving average decay coeff.

vq_ema_eps

Laplace smoothing parameter

px_pdf

type of prob distribution for the data likelihood

flatten_spatial

if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.

spatial_shape

shape of the data, only needed if flatten_spatial=True

scale_invariant

for future use

data_scale

for future use

__init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, diversity_weight=0.1, vq_type='multi-ema-k-means-vq', vq_groups=1, vq_clusters=64, vq_commitment_cost=0.25, vq_ema_gamma=0.99, vq_ema_eps=1e-05, px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, serialize_pdfs=True, use_amp=False)[source]
compute_z(x)[source]
compute_px_given_z(z, x_shape=None)[source]
get_config()[source]
classmethod load(file_path=None, cfg=None, state_dict=None)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
copy()
property device
freeze()
get_loss()
get_reg_loss()
save(file_path)
unfreeze()
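The core of a VQ-VAE bottleneck is the nearest-codeword lookup, and diversity_weight rewards a high codebook perplexity (the exponential of the entropy of codeword usage). The sketch below shows both steps for a single group in plain Python; it leaves out the EMA codebook updates controlled by vq_ema_gamma and vq_ema_eps, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def quantize(zs: List[List[float]], codebook: List[List[float]]) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest codeword (squared Euclidean distance)."""
    indices = []
    for z in zs:
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices, [codebook[k] for k in indices]


def perplexity(indices: List[int], num_clusters: int) -> float:
    """exp(entropy) of codeword usage; equals num_clusters for uniform use."""
    counts = [0] * num_clusters
    for k in indices:
        counts[k] += 1
    n = len(indices)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return math.exp(entropy)


codebook = [[0.0, 0.0], [1.0, 1.0]]
idx, zq = quantize([[0.1, 0.0], [0.9, 1.2], [0.0, -0.1]], codebook)
print(idx, perplexity(idx, len(codebook)))
```

A perplexity close to 1 means the encoder collapses onto a few codewords; maximizing its logarithm pushes usage towards all vq_clusters entries.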

Losses

Custom loss classes

class hyperion.torch.losses.bce_with_llr.BCEWithLLR(p_tar=0.5)[source]
__init__(p_tar=0.5)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
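BCEWithLLR takes a target prior p_tar. A common way to train log-likelihood-ratio scores with binary cross-entropy is to add the prior log-odds log(p_tar / (1 - p_tar)) to each LLR, so that its sigmoid becomes the posterior probability of the target hypothesis. The plain-Python sketch below follows that recipe; it is an assumption about what forward(x, y) computes, not a transcription of it, and the function name is hypothetical.

```python
import math
from typing import List


def bce_with_llr(llr: List[float], y: List[int], p_tar: float = 0.5) -> float:
    """Mean binary cross-entropy of prior-shifted LLR scores.

    llr: log-likelihood-ratio scores; y: 1 for target trials, 0 for non-target.
    """
    logit_ptar = math.log(p_tar / (1.0 - p_tar))
    total = 0.0
    for score, label in zip(llr, y):
        z = score + logit_ptar
        # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))
    return total / len(llr)


print(bce_with_llr([2.0, -1.5, 0.0], [1, 0, 1], p_tar=0.05))
```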

T_destination

alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks()

Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • prefix (str) – the prefix for parameters and buffers used in this module

  • local_metadata (dict) – a dict containing the metadata for this module.

  • strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module

  • missing_keys (list of str) – if strict=True, add missing keys to this list

  • unexpected_keys (list of str) – if strict=True, add unexpected keys to this list

  • error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True)

Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook)

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters
  • destination (dict) – a dict where state will be stored

  • prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example:

>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
bfloat16() torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns

self

Return type

Module

buffers(recurse: bool = True) Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
children() Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu() torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

double() torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns

self

Return type

Module

dump_patches: bool = False

This allows better BC support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns

self

Return type

Module

extra_repr() str

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

float() torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns

self

Return type

Module

half() torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns

self

Return type

Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields

modules() Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())
named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters
  • name (string) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor) – buffer to be registered.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input in place, but this will have no effect on forward since the hook is called after forward() is called.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple).

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters
  • name (string) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T

Change if autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.

Returns

self

Return type

Module

share_memory() torch.nn.modules.module.T
state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns

a dictionary containing a whole state of the module

Return type

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
train(mode: bool = True) torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on XPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

training: bool

Adversarial Attacks

It contains classes to generate adversarial attacks for speaker recognition.

Attack Generation Classes

All adversarial attacks derive from the same base class:

class hyperion.torch.adv_attacks.adv_attack.AdvAttack(model, loss=None, targeted=True, range_min=None, range_max=None)[source]
__init__(model, loss=None, targeted=True, range_min=None, range_max=None)[source]
to(device)[source]
property attack_info
generate(input, target)[source]

FGSM

class hyperion.torch.adv_attacks.fgsm_attack.FGSMAttack(model, eps, loss=None, targeted=False, range_min=None, range_max=None)[source]
__init__(model, eps, loss=None, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
to(device)
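As a reference point for the eps, targeted, range_min and range_max arguments, here is a minimal NumPy sketch of the textbook FGSM update applied to an input gradient that has already been computed. It is not Hyperion's code: the function name fgsm_step is hypothetical, and the real class computes the gradient from model and loss itself.

```python
import numpy as np


def fgsm_step(x, grad, eps, targeted=False, range_min=None, range_max=None):
    """Hypothetical helper: one FGSM step given dLoss/dx (not Hyperion's API)."""
    # Untargeted attacks climb the loss of the true label; targeted attacks
    # descend the loss of the target label.
    direction = -1.0 if targeted else 1.0
    x_adv = x + direction * eps * np.sign(grad)
    # Keep the result in the valid input range when one is given.
    if range_min is not None or range_max is not None:
        x_adv = np.clip(x_adv, range_min, range_max)
    return x_adv
```

Every input sample moves by exactly eps (before range clipping), so eps is the l_inf size of the perturbation.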
class hyperion.torch.adv_attacks.snr_fgsm_attack.SNRFGSMAttack(model, snr, loss=None, targeted=False, range_min=None, range_max=None)[source]
__init__(model, snr, loss=None, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
to(device)
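SNRFGSMAttack takes an snr argument instead of eps. A minimal sketch of one way to turn a target SNR into an FGSM step size, assuming the SNR is measured in dB as the ratio between the clean signal power and the power of a sign perturbation of size eps; the helper name is hypothetical and Hyperion's exact formula is not shown here:

```python
import numpy as np


def eps_from_snr(x, snr_db):
    """Hypothetical helper: FGSM step size giving the requested SNR in dB."""
    # Mean power of the clean signal.
    signal_power = np.mean(x ** 2)
    # A perturbation eps * sign(grad) has power eps**2 (nonzero gradients),
    # so solve 10 * log10(signal_power / eps**2) = snr_db for eps.
    return np.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
```

Under this assumption a larger snr yields a smaller, less audible perturbation, and the step size adapts to the loudness of each utterance.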
class hyperion.torch.adv_attacks.rand_fgsm_attack.RandFGSMAttack(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]
__init__(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
to(device)
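RandFGSMAttack takes both eps and alpha. A sketch assuming it follows the R+FGSM scheme of Tramèr et al.: a random sign step of size alpha, then an FGSM step of size eps - alpha taken from that random point. The names rand_fgsm and grad_fn are hypothetical; grad_fn stands in for the gradient of the loss with respect to the input:

```python
import numpy as np


def rand_fgsm(x, grad_fn, eps, alpha, targeted=False, rng=None):
    """Hypothetical R+FGSM sketch; grad_fn(x) returns dLoss/dx, alpha < eps."""
    rng = np.random.default_rng() if rng is None else rng
    # Random sign step of size alpha to move off the data point.
    x_rand = x + alpha * np.sign(rng.standard_normal(x.shape))
    # FGSM step from the random point with the remaining budget eps - alpha,
    # so the total l_inf perturbation never exceeds eps.
    direction = -1.0 if targeted else 1.0
    return x_rand + direction * (eps - alpha) * np.sign(grad_fn(x_rand))
```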
class hyperion.torch.adv_attacks.iter_fgsm_attack.IterFGSMAttack(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]
__init__(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
to(device)
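Iterative FGSM repeats small sign steps of size alpha and, after each step, projects back onto the l_inf ball of radius eps around the clean input. A minimal sketch; the constructor above has no iteration-count argument, so num_iter here is a hypothetical explicit parameter, and how Hyperion chooses the number of iterations is not shown:

```python
import numpy as np


def iter_fgsm(x, grad_fn, eps, alpha, num_iter, targeted=False):
    """Hypothetical iterative FGSM sketch; grad_fn(x) returns dLoss/dx."""
    direction = -1.0 if targeted else 1.0
    x_adv = x.copy()
    for _ in range(num_iter):
        # Small sign step of size alpha.
        x_adv = x_adv + direction * alpha * np.sign(grad_fn(x_adv))
        # Project back onto the l_inf ball of radius eps around x.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```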

PGD

class hyperion.torch.adv_attacks.pgd_attack.PGDAttack(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]
__init__(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]
property attack_info
static _random_sphere(shape, eps, norm, dtype, device)[source]

We use Theorem 1 in https://arxiv.org/pdf/math/0503650.pdf to sample uniformly from l_p balls in R^n
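A NumPy sketch of the sampling scheme, as we read Theorem 1 of the cited paper: draw g_i i.i.d. with density proportional to exp(-|t|^p) and an independent z ~ Exp(1); then g / (||g||_p^p + z)^(1/p) is uniform in the unit l_p ball. The function name and its arguments are hypothetical and do not mirror the private _random_sphere signature:

```python
import numpy as np


def sample_lp_ball(n, p, eps=1.0, rng=None):
    """Hypothetical helper: one point drawn uniformly from the l_p ball of radius eps in R^n."""
    rng = np.random.default_rng() if rng is None else rng
    if np.isinf(p):
        # The l_inf ball is a cube: sample each coordinate independently.
        return eps * rng.uniform(-1.0, 1.0, size=n)
    # If g has density proportional to exp(-|t|^p), then |g|^p ~ Gamma(1/p, 1);
    # attach a random sign to get g itself.
    g = rng.choice([-1.0, 1.0], size=n) * rng.gamma(1.0 / p, 1.0, size=n) ** (1.0 / p)
    z = rng.exponential(1.0)
    # Theorem 1: g / (||g||_p^p + z)^(1/p) is uniform in the unit l_p ball.
    return eps * g / (np.sum(np.abs(g) ** p) + z) ** (1.0 / p)
```

A quick sanity check of uniformity in 2-D: a quarter of the samples should fall within half the radius, since the ball's area scales with the radius squared.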

generate(input, target)[source]
to(device)
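PGD alternates a normalized gradient step of size alpha with a projection onto the eps-ball of the chosen norm. A minimal sketch for norm=2 and norm=inf that works on the perturbation delta rather than the input; the helper names are hypothetical, and max_iter, random initialization and range clipping are left out:

```python
import numpy as np


def project(delta, eps, norm):
    """Project a perturbation onto the eps-ball of the given norm."""
    if np.isinf(norm):
        return np.clip(delta, -eps, eps)
    if norm == 2:
        length = np.linalg.norm(delta)
        return delta if length <= eps else delta * (eps / length)
    raise ValueError("this sketch only handles norm=2 and norm=inf")


def pgd_step(delta, grad, alpha, eps, norm, targeted=False):
    """Hypothetical PGD update of the perturbation delta given dLoss/dx."""
    direction = -1.0 if targeted else 1.0
    if np.isinf(norm):
        # Steepest-ascent direction under l_inf is the gradient sign.
        step = np.sign(grad)
    else:
        # Under l_2 it is the unit-norm gradient.
        step = grad / (np.linalg.norm(grad) + 1e-12)
    return project(delta + direction * alpha * step, eps, norm)
```

The adversarial input would then be x + delta, clipped to [range_min, range_max] when those are set.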

Carlini-Wagner

Carlini-Wagner attacks derive from the same base class:

class hyperion.torch.adv_attacks.carlini_wagner.CarliniWagner(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
property attack_info
static atanh(x, eps=1e-06)[source]
x_w(w)[source]
w_x(x)[source]
f(z, target)[source]
generate(input, target)[source]
to(device)
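The x_w, w_x and atanh methods point to the change of variables used by Carlini and Wagner: the optimizer works on an unconstrained variable w, and x = range_min + (range_max - range_min) * (tanh(w) + 1) / 2 always lies in the valid range. A NumPy sketch of that mapping and its inverse, assuming a [range_min, range_max] box and an atanh clamped by eps as the atanh(x, eps=1e-06) signature suggests; the function names are hypothetical:

```python
import numpy as np


def x_from_w(w, range_min=0.0, range_max=1.0):
    """Hypothetical counterpart of x_w: any real w maps inside the range."""
    # tanh(w) lies in [-1, 1]; rescale it affinely to [range_min, range_max].
    return range_min + (range_max - range_min) * (np.tanh(w) + 1.0) / 2.0


def w_from_x(x, range_min=0.0, range_max=1.0, eps=1e-6):
    """Hypothetical counterpart of w_x: inverse of x_from_w."""
    # Rescale x to [-1, 1] and clamp so arctanh stays finite at the edges.
    t = 2.0 * (x - range_min) / (range_max - range_min) - 1.0
    return np.arctanh(np.clip(t, -1.0 + eps, 1.0 - eps))
```

This lets a plain unconstrained optimizer (with learning rate lr) run on w without ever producing out-of-range inputs.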
class hyperion.torch.adv_attacks.carlini_wagner_l2.CarliniWagnerL2(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
__init__(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
static atanh(x, eps=1e-06)
f(z, target)
to(device)
w_x(x)
x_w(w)
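The f(z, target) method and the confidence argument suggest the margin loss from Carlini and Wagner, f = max(max_{i != t} z_i - z_t, -confidence) for targeted attacks, where z are the logits. A sketch of that loss and of the L2 objective ||x_adv - x||_2^2 + c * f it is combined with; the function names are hypothetical, and the binary search over c (binary_search_steps) is omitted:

```python
import numpy as np


def cw_f(z, target, confidence=0.0, targeted=True):
    """Margin loss on logits z (shape [num_classes])."""
    z_target = z[target]
    z_other = np.max(np.delete(z, target))
    if targeted:
        # Stops decreasing once the target logit beats all others by `confidence`.
        return max(z_other - z_target, -confidence)
    # Untargeted: push the true-class logit below the best other class.
    return max(z_target - z_other, -confidence)


def cw_l2_loss(x_adv, x, z, target, c, confidence=0.0, targeted=True):
    """Hypothetical CW-L2 objective: squared distortion plus c times the margin loss."""
    return np.sum((x_adv - x) ** 2) + c * cw_f(z, target, confidence, targeted)
```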
class hyperion.torch.adv_attacks.carlini_wagner_linf.CarliniWagnerLInf(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]
__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
static atanh(x, eps=1e-06)
f(z, target)
to(device)
w_x(x)
x_w(w)
class hyperion.torch.adv_attacks.carlini_wagner_l0.CarliniWagnerL0(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]
__init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]
property attack_info
generate(input, target)[source]
static atanh(x, eps=1e-06)
f(z, target)
to(device)
w_x(x)
x_w(w)

Attack Generator Factories

These are factory classes that create attack generator objects. They create attacks implemented in Hyperion or in the Adversarial Robustness Toolbox (https://github.com/Trusted-AI/adversarial-robustness-toolbox).

class hyperion.torch.adv_attacks.attack_factory.AttackFactory[source]
static create(model, attack_type, eps=0, snr=100, alpha=0, norm=inf, random_eps=False, num_random_init=0, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10, abort_early=True, c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
class hyperion.torch.adv_attacks.random_attack_factory.RandomAttackFactory(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
__init__(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
sample_attack(model=None)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
class hyperion.torch.adv_attacks.art_attack_factory.ARTAttackFactory[source]
static create(model, attack_type, eps=0, delta=0.01, step_adapt=0.667, num_trial=25, sample_size=20, init_size=100, norm=inf, eps_step=0.1, num_random_init=0, minimal=False, random_eps=False, min_eps=None, beta=0.001, theta=0.1, gamma=1.0, etha=0.01, confidence=0.0, lr=0.01, lr_decay=0.5, lr_num_decay=20, momentum=0.8, binary_search_steps=9, max_iter=10, overshoot=1.1, num_grads=10, c=0.001, max_halving=5, max_doubling=5, decision_rule='EN', init_eval=100, max_eval=10000, num_parallel=128, variable_h=0.0001, use_importance=False, abort_early=True, th=None, sigma=0.5, lambda_tv=0.3, labmda_c=1.0, lambda_s=0.5, reg=3000, kernel_size=5, eps_factor=1.1, eps_iter=10, conj_sinkhorn_iter=400, proj_sinkhorn_iter=400, targeted=False, num_samples=1, eps_scale=1, batch_size=1)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

Trainers

Generic Trainer

class hyperion.torch.trainers.torch_trainer.TorchTrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Base Trainer class to train basic neural network models

model

model object.

loss

nn.Module loss class

optim

pytorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp
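grad_acc_steps trades memory for batch size: gradients from several micro-batches are accumulated before a single optimizer step. A minimal NumPy sketch of the usual scheme, where each micro-batch loss is scaled by 1 / grad_acc_steps, using a linear least-squares model purely for illustration; the helper names are hypothetical and whether TorchTrainer scales the loss exactly this way is an assumption:

```python
import numpy as np


def mse_grad(w, X, y):
    """Gradient of mean((X @ w - y) ** 2) with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


def accumulated_grad(w, X, y, grad_acc_steps):
    """Hypothetical sketch: sum scaled micro-batch gradients before one update."""
    grad = np.zeros_like(w)
    for X_mb, y_mb in zip(np.array_split(X, grad_acc_steps),
                          np.array_split(y, grad_acc_steps)):
        # Dividing by grad_acc_steps turns the sum into an average, which
        # equals the full-batch gradient when micro-batches have equal size.
        grad += mse_grad(w, X_mb, y_mb) / grad_acc_steps
    return grad
```

With 4 micro-batches of 8 examples, the accumulated gradient matches the gradient of a single 32-example batch.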

__init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
fit(train_data, val_data=None)[source]

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

set_train_mode()[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

bn_update_epoch(data_loader)[source]
update_model()[source]
_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)[source]

Creates the default loggers

_get_lr()[source]

Returns the current learning rate to show in the loggers

checkpoint(logs=None)[source]

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

save_checkpoint(logs=None)[source]

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)[source]

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.
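swa_start, swa_lr and swa_anneal_epochs configure stochastic weight averaging (SWA), whose result save_swa_model writes out. At its core, SWA keeps an equal-weight running average of the weights seen from swa_start onward. A minimal sketch with hypothetical names; PyTorch's torch.optim.swa_utils.AveragedModel uses the same default update, but whether Hyperion relies on it is an assumption:

```python
import numpy as np


class RunningWeightAverage:
    """Hypothetical equal-weight running average of model weight snapshots."""

    def __init__(self):
        self.avg = None
        self.n_averaged = 0

    def update(self, weights):
        if self.avg is None:
            # The first snapshot initializes the average.
            self.avg = [w.copy() for w in weights]
        else:
            # avg_new = avg + (w - avg) / (n + 1), i.e. the mean of n + 1 snapshots.
            self.avg = [a + (w - a) / (self.n_averaged + 1)
                        for a, w in zip(self.avg, weights)]
        self.n_averaged += 1


def average_from(snapshots, swa_start):
    """Average the per-epoch weight snapshots from epoch swa_start onward."""
    swa = RunningWeightAverage()
    for epoch, weights in enumerate(snapshots):
        if epoch >= swa_start:
            swa.update(weights)
    return swa
```

After averaging, BatchNorm running statistics no longer match the averaged weights and must be recomputed with a pass over the training data, which appears to be the purpose of bn_update_epoch and swa_update_bn.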

load_checkpoint(file_path)[source]

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()[source]

Loads the last training checkpoint in the experiment dir.

static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip=[])[source]
static add_argparse_args(parser, prefix=None, skip=[])

x-Vector Trainers

class hyperion.torch.trainers.xvector_trainer.XVectorTrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model

x-Vector model object.

optim

pytorch optimizer object or options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object or options dict

loggers

LoggerList object, loggers write training progress to std. output and file. If None, it uses default loggers.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss

if None, it uses cross-entropy

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp
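grad_clip and grad_clip_norm read like the max_norm and norm_type arguments of torch.nn.utils.clip_grad_norm_, which rescales all gradients together whenever their joint norm exceeds max_norm. A NumPy sketch of that rule under this assumption (the helper name is hypothetical):

```python
import numpy as np


def clip_grads(grads, grad_clip, grad_clip_norm=2.0):
    """Hypothetical sketch of norm-based clipping over a list of gradients."""
    if grad_clip <= 0:
        # grad_clip == 0 means no clipping.
        return grads
    # Norm of all gradients taken together as one long vector.
    if np.isinf(grad_clip_norm):
        total = max(np.max(np.abs(g)) for g in grads)
    else:
        total = sum(np.sum(np.abs(g) ** grad_clip_norm)
                    for g in grads) ** (1.0 / grad_clip_norm)
    # Rescale only when the total norm exceeds grad_clip.
    coef = min(1.0, grad_clip / (total + 1e-6))
    return [g * coef for g in grads]
```

Scaling every gradient by the same factor preserves the update direction and only shortens it.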

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – pytorch data loader returning features and class labels.

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()
validation_epoch(data_loader, swa_update_bn=False)

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

class hyperion.torch.trainers.xvector_trainer_from_wav.XVectorTrainerFromWav(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model

x-Vector model object.

feat_extractor

feature extractor nn.Module

optim

pytorch optimizer object or options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object or options dict.

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss

if None, it uses cross-entropy

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp

__init__(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – pytorch data loader returning features and class labels.

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()
class hyperion.torch.trainers.xvector_trainer_deep_feat_reg.XVectorTrainerDeepFeatReg(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model

x-Vector model object that we want to fine-tune

prior_model

x-Vector model object that we use as regularizer

optim

pytorch optimizer object or options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

reg_layers_enc

list of encoder layer indexes that we use for regularization

reg_layers_classif

list of classification head layer indexes that we use for regularization

reg_weight_enc

weight of the regularization loss for encoder hidden activations

reg_weight_classif

weight of the regularization loss for classification head hidden activations

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object or options dict.

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss

if None, it uses cross-entropy

reg_loss

nn.Module loss used for regularization, if None it uses L1 loss.

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp
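A sketch of how the regularized objective could be assembled from these attributes, assuming reg_loss (L1 by default) is applied between matching hidden activations of model and of the fixed prior_model and summed over the selected layers. The function names, the per-layer sum and the mean reduction of the L1 loss are assumptions, not taken from the Hyperion source:

```python
import numpy as np


def l1_loss(a, b):
    """Mean absolute difference between two activation arrays."""
    return np.mean(np.abs(a - b))


def deep_feat_reg_loss(task_loss, enc_feats, prior_enc_feats,
                       classif_feats, prior_classif_feats,
                       reg_weight_enc=0.1, reg_weight_classif=0.1,
                       reg_loss=l1_loss):
    """Hypothetical total loss: task loss plus weighted feature-matching terms."""
    # Keep the fine-tuned encoder activations close to the prior model's.
    enc_reg = sum(reg_loss(h, h0) for h, h0 in zip(enc_feats, prior_enc_feats))
    # Same for the selected classification-head layers.
    classif_reg = sum(reg_loss(h, h0)
                      for h, h0 in zip(classif_feats, prior_classif_feats))
    return task_loss + reg_weight_enc * enc_reg + reg_weight_classif * classif_reg
```

When the fine-tuned activations equal the prior model's, the objective reduces to the task loss (cross-entropy by default).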

__init__(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None, skip=[])[source]
_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

fit(train_data, val_data=None)

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()
validation_epoch(data_loader, swa_update_bn=False)

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

class hyperion.torch.trainers.xvector_trainer_deep_feat_reg_from_wav.XVectorTrainerDeepFeatRegFromWav(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Trainer to train x-vector style models.

model

x-Vector model object that we want to fine-tune

feat_extractor

feature extractor nn.Module

prior_model

x-Vector model object that we use as regularizer

optim

pytorch optimizer object or options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

reg_layers_enc

list of encoder layer indexes that we use for regularization

reg_layers_classif

list of classification head layer indexes that we use for regularization

reg_weight_enc

weight of the regularization loss for encoder hidden activations

reg_weight_classif

weight of the regularization loss for classification head hidden activations

device

cpu/gpu device

lrsched

learning rate scheduler object or options dict.

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

loss

if None, it uses cross-entropy

reg_loss

nn.Module loss used for regularization, if None it uses L1 loss.

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp

__init__(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()

Auto-encoder Trainer

class hyperion.torch.trainers.ae_trainer.AETrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Auto-encoder trainer class

model

model object.

loss

nn.Module loss class

optim

pytorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch to start doing swa

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp

__init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – pytorch data loader returning features and class labels.

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves the SWA (stochastic weight averaging) model

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()

VAE Trainers

class hyperion.torch.trainers.vae_trainer.VAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Variational Auto-encoder trainer class

model

model object.

optim

pytorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

wandb

wandb dictionary of options

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

grad_clip

maximum gradient norm used for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch at which to start SWA

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp
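The grad_acc_steps, grad_clip and grad_clip_norm options can be illustrated with a generic PyTorch loop. This is a sketch under assumptions (a toy linear model, MSE loss and the hypothetical helper train_steps), not the trainer's actual implementation.

```python
import torch
import torch.nn as nn


def train_steps(model, optimizer, batches, grad_acc_steps=1, grad_clip=0.0, grad_clip_norm=2):
    """Sketch of gradient accumulation with optional gradient-norm clipping."""
    loss_fn = nn.MSELoss()
    model.train()
    optimizer.zero_grad()
    num_updates = 0
    for i, (x, y) in enumerate(batches):
        # Scale the loss so the accumulated gradient equals the average
        # over the larger effective batch of grad_acc_steps mini-batches.
        loss = loss_fn(model(x), y) / grad_acc_steps
        loss.backward()
        if (i + 1) % grad_acc_steps == 0:
            if grad_clip > 0:
                # Rescale gradients whose total norm (of type grad_clip_norm) exceeds grad_clip.
                nn.utils.clip_grad_norm_(model.parameters(), grad_clip, norm_type=grad_clip_norm)
            optimizer.step()
            optimizer.zero_grad()
            num_updates += 1
    # A trailing group shorter than grad_acc_steps is not applied in this sketch.
    return num_updates


torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(8)]
updates = train_steps(model, optimizer, batches, grad_acc_steps=4, grad_clip=1.0)
```

With 8 mini-batches of 8 examples and grad_acc_steps=4, the optimizer takes 2 steps, each on gradients equivalent to a batch of 32.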

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers.

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later.

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the stochastic weight averaging (SWA) model.

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()
class hyperion.torch.trainers.dvae_trainer.DVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Denoising VAE trainer class

model

model object.

optim

PyTorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm used for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch at which to start SWA

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp
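The swa_start, swa_lr and swa_anneal_epochs options map naturally onto PyTorch's torch.optim.swa_utils utilities. The sketch below assumes that mapping and uses a toy model and random data; it shows the general SWA recipe, not how the DVAE trainer wires it internally.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import SWALR, AveragedModel, update_bn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)  # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.001, anneal_epochs=10)

epochs, swa_start = 6, 3
loader = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(5)]
loss_fn = nn.MSELoss()

for epoch in range(epochs):
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        # From swa_start on, fold the current weights into the average
        # and anneal the learning rate towards swa_lr.
        swa_model.update_parameters(model)
        swa_scheduler.step()

# Averaged weights change the activation statistics, so the BatchNorm
# running mean/var are recomputed with one pass over the training data.
update_bn(loader, swa_model)
```

The final update_bn pass plays the role that the bn_update_epoch method and the swa_update_bn flag suggest in the trainer API.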

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning noisy and clean features
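The noisy/clean pairing is the core of denoising training: the model sees the noisy features but is scored against the clean ones. The sketch below shows only that idea with a toy auto-encoder and a plain MSE loss. The real DVAE objective also includes the VAE terms, and the model, noise level and loss here are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a denoising auto-encoder over 20-dim features.
model = nn.Sequential(nn.Linear(20, 8), nn.Tanh(), nn.Linear(8, 20))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

clean = torch.randn(64, 20)
noisy = clean + 0.3 * torch.randn_like(clean)

losses = []
for _ in range(50):
    optimizer.zero_grad()
    # Encode the noisy input but compare the reconstruction with the clean target.
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```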

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers.

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later.

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the stochastic weight averaging (SWA) model.

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()

VQ-VAE Trainers

class hyperion.torch.trainers.vq_vae_trainer.VQVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Vector Quantized Variational Auto-encoder trainer class

model

model object.

optim

PyTorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm used for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch at which to start SWA

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers.

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later.

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the stochastic weight averaging (SWA) model.

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()
class hyperion.torch.trainers.vq_dvae_trainer.VQDVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]

Denoising Vector Quantized Variational Auto-encoder trainer class

model

model object.

optim

PyTorch optimizer object or optimizer options dict

epochs

max. number of epochs

exp_path

experiment output path

cur_epoch

current epoch

grad_acc_steps

gradient accumulation steps to simulate larger batch size.

device

cpu/gpu device

metrics

extra metrics to compute besides cxe.

lrsched

learning rate scheduler object

loggers

LoggerList object, loggers write training progress to std. output and file.

ddp

if True use distributed data parallel training

ddp_type

type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)

train_mode

training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]

use_amp

uses mixed precision training.

log_interval

number of optim. steps between log outputs

use_tensorboard

use tensorboard logger

use_wandb

use wandb logger

wandb

wandb dictionary of options

grad_clip

maximum gradient norm used for clipping; if 0, gradients are not clipped

grad_clip_norm

norm type to clip gradients

swa_start

epoch at which to start SWA

swa_lr

SWA learning rate

swa_anneal_epochs

SWA learning rate anneal epochs

cpu_offload

CPU offload of gradients when using fully sharded ddp

__init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
train_epoch(data_loader)[source]

Training epoch loop

Parameters

data_loader – PyTorch data loader returning noisy and clean features

validation_epoch(data_loader, swa_update_bn=False)[source]

Validation epoch loop

Parameters

data_loader – PyTorch data loader returning input/output pairs

_default_loggers(log_interval, use_tensorboard, use_wandb, wandb)

Creates the default loggers.

_get_lr()

Returns the current learning rate to show in the loggers

static add_argparse_args(parser, prefix=None, skip=[])
static add_class_args(parser, prefix=None, skip=[])
bn_update_epoch(data_loader)
checkpoint(logs=None)

Creates a checkpoint of the training state so that it can be saved and recovered later.

Parameters

logs – logs containing the current value of the metrics.

static filter_args(**kwargs)
fit(train_data, val_data=None)

Training function; it runs the training and validation epochs.

Parameters
  • train_data – PyTorch data loader for the training loop

  • val_data – PyTorch data loader for the validation loop

load_checkpoint(file_path)

Loads a training checkpoint from file.

Parameters

file_path – checkpoint file path

load_last_checkpoint()

Loads the last training checkpoint in the experiment dir.

save_checkpoint(logs=None)

Saves a checkpoint of the training status

Parameters

logs – logs containing the current value of the metrics.

save_swa_model(logs=None)

Saves a checkpoint of the stochastic weight averaging (SWA) model.

Parameters

logs – logs containing the current value of the metrics.

set_train_mode()
update_model()

Datasets, Data Loaders and Samplers

Datasets

Audio Datasets

class hyperion.torch.data.audio_dataset.AudioDataset(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]
__init__(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]
property wav_scale
property num_seqs
property seq_lengths
property total_length
property min_chunk_length
property max_chunk_length
property min_seq_length
property max_seq_length
property var_chunk_length
get_random_chunk_length()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
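The min_chunk_length/max_chunk_length properties and get_random_chunk_length() point to variable-length random cropping of training sequences. The following is a minimal sketch of that idea, assuming lengths in seconds and uniform sampling between the two limits; the helper names and the sampling rule are assumptions, not AudioDataset's code.

```python
import random


def get_random_chunk_length(min_chunk_length, max_chunk_length, rng=random):
    # With no maximum (or max == min), every chunk has the fixed minimum length.
    if max_chunk_length is None or max_chunk_length == min_chunk_length:
        return min_chunk_length
    return rng.uniform(min_chunk_length, max_chunk_length)


def random_chunk(signal, fs, chunk_length, rng=random):
    """Return a random contiguous chunk of chunk_length seconds from signal."""
    num_samples = int(chunk_length * fs)
    if num_samples >= len(signal):
        return signal  # shorter than the chunk: return the full sequence
    start = rng.randrange(len(signal) - num_samples + 1)
    return signal[start : start + num_samples]


rng = random.Random(0)
fs = 8000
signal = list(range(3 * fs))  # 3 s of dummy samples
length = get_random_chunk_length(1.0, 2.0, rng)
chunk = random_chunk(signal, fs, length, rng)
```

Drawing a new length per batch (rather than per example) keeps all chunks in a batch the same size, which avoids padding.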

Feature Sequence Datasets

class hyperion.torch.data.feat_seq_dataset.FeatSeqDataset(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
__init__(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
property num_seqs
property seq_lengths
property total_length
property min_chunk_length
property max_chunk_length
property min_seq_length
property max_seq_length
property var_chunk_length
get_random_chunk_length()[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
class hyperion.torch.data.paired_feat_seq_dataset.PairedFeatSeqDataset(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
__init__(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
static add_argparse_args(parser, prefix=None)
static add_class_args(parser, prefix=None)
static filter_args(**kwargs)
get_random_chunk_length()
property max_chunk_length
property max_seq_length
property min_chunk_length
property min_seq_length
property num_seqs
property seq_lengths
property total_length
property var_chunk_length

Embedding Datasets

class hyperion.torch.data.embed_dataset.EmbedDataset(embeds=None, class_ids=None, class_weights=None, rspecifier=None, key_file=None, class_file=None, path_prefix=None, preload_embeds=False, return_class=True, is_val=False)[source]
__init__(embeds=None, class_ids=None, class_weights=None, rspecifier=None, key_file=None, class_file=None, path_prefix=None, preload_embeds=False, return_class=True, is_val=False)[source]

Samplers

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.data.weighted_seq_sampler.ClassWeightedSeqSampler(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]
__init__(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
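One way to read the ClassWeightedSeqSampler parameters is: pick classes so that rare and frequent classes are equally likely, take num_egs_per_class utterances per class, and repeat each utterance num_egs_per_utt times so that several chunks can be cut from it. The sketch below follows that reading; the generator class_weighted_batches and its exact sampling rule are hypothetical, not the sampler's implementation.

```python
import random
from collections import defaultdict


def class_weighted_batches(class_ids, batch_size, iters_per_epoch,
                           num_egs_per_class=1, num_egs_per_utt=1, seed=0):
    """Yield batches of utterance indices with classes sampled uniformly.

    class_ids[i] is the class label of utterance i.
    """
    rng = random.Random(seed)
    utts_per_class = defaultdict(list)
    for idx, c in enumerate(class_ids):
        utts_per_class[c].append(idx)
    classes = sorted(utts_per_class)

    egs_per_class = num_egs_per_class * num_egs_per_utt
    num_classes_per_batch = max(1, batch_size // egs_per_class)
    for _ in range(iters_per_epoch):
        batch = []
        # Sample classes uniformly, ignoring how many utterances each one has.
        for c in rng.sample(classes, min(num_classes_per_batch, len(classes))):
            for u in rng.choices(utts_per_class[c], k=num_egs_per_class):
                # Repeat each utterance so several chunks can be drawn from it.
                batch.extend([u] * num_egs_per_utt)
        yield batch


class_ids = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
batches = list(class_weighted_batches(class_ids, batch_size=6, iters_per_epoch=10,
                                      num_egs_per_class=2))
```

Even though class "a" holds 90% of the utterances, every batch here contains two examples of each class.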


class hyperion.torch.data.weighted_embed_sampler.ClassWeightedEmbedSampler(dataset, batch_size=1, iters_per_epoch=1, num_egs_per_class=1)[source]
__init__(dataset, batch_size=1, iters_per_epoch=1, num_egs_per_class=1)[source]

Data Transformations

class hyperion.torch.transforms.reshape.Reshape(shape)[source]
__init__(shape)[source]

Optimizers

These are custom optimizers and a factory class to create optimizers from config params.

Custom Optimizers

class hyperion.torch.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]

Implements the Rectified Adam optimizer (RAdam) from

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. “On the Variance of the Adaptive Learning Rate and Beyond.” arXiv preprint arXiv:1908.03265 (2019).

Code taken from:

https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py
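The distinguishing part of RAdam is its variance rectification term. The sketch below computes it from the paper's formulas, using the rho_t >= 5 threshold of the reference code linked above; below the threshold the adaptive step is skipped (and, with degenerated_to_sgd=True, an SGD-with-momentum step is used instead). The function name is ours. Recent PyTorch releases also ship torch.optim.RAdam.

```python
import math


def radam_rectification(step, beta2=0.999):
    """Return RAdam's rectification term r_t, or None when the approximated
    SMA length rho_t is too small for the adaptive step to be trusted."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t < 5:  # threshold used by the reference implementation
        return None
    return math.sqrt(
        (rho_t - 4) * (rho_t - 2) * rho_inf
        / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
    )


early = [radam_rectification(t) for t in (1, 5, 6)]
late = radam_rectification(10**6)
```

With beta2=0.999, rho_t is roughly t for small t, so the first five steps are non-adaptive and r_t then grows slowly towards 1; this acts as a built-in learning-rate warmup.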

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]
step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

This can be useful when fine tuning a pre-trained network as frozen layers can be made trainable and added to the Optimizer as training progresses.

Parameters

param_group (dict) – Specifies what Tensors should be optimized along with group-specific optimization options.

load_state_dict(state_dict)

Loads the optimizer state.

Parameters

state_dict (dict) – optimizer state. Should be an object returned from a call to state_dict().

state_dict()

Returns the state of the optimizer as a dict.

It contains two entries:

  • state – a dict holding the current optimization state. Its content differs between optimizer classes.

  • param_groups – a list containing all parameter groups, where each parameter group is a dict.

zero_grad(set_to_none: bool = False)

Sets the gradients of all optimized torch.Tensors to zero.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. This will in general have a lower memory footprint and can modestly improve performance. However, it changes certain behaviors. For example:

  1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.

  2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.

  3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
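The difference between the two settings is easy to observe on a toy model. The default of set_to_none differs across PyTorch versions (it became True in PyTorch 2.0), so the sketch passes it explicitly.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.ones(1, 2)).sum().backward()
optimizer.zero_grad(set_to_none=False)
grads_after_zero = [p.grad.clone() for p in model.parameters()]  # tensors of 0s

model(torch.ones(1, 2)).sum().backward()
optimizer.zero_grad(set_to_none=True)
grads_after_none = [p.grad for p in model.parameters()]  # all None
```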

Optimizer Factory

class hyperion.torch.optim.factory.OptimizerFactory[source]
static create(params, opt_type, lr, momentum=0, beta1=0.9, beta2=0.99, rho=0.9, eps=1e-08, weight_decay=0, amsgrad=False, nesterov=False, lambd=0.0001, asgd_alpha=0.75, t0=1000000.0, rmsprop_alpha=0.99, centered=False, lr_decay=0, init_acc_val=0, max_iter=20, oss=False)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)
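A factory like this typically turns a type name plus flat config values (for example beta1 and beta2) into the keyword arguments a torch.optim class expects. The sketch below illustrates that translation for a few optimizers. The type strings "sgd", "adam" and "adamw" and the function create_optimizer are assumptions, not OptimizerFactory's accepted values or code.

```python
import torch
import torch.nn as nn


def create_optimizer(params, opt_type, lr, momentum=0.0, beta1=0.9, beta2=0.99,
                     eps=1e-8, weight_decay=0.0):
    """Hypothetical config-driven optimizer factory."""
    opt_type = opt_type.lower()
    if opt_type == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=momentum, weight_decay=weight_decay)
    if opt_type in ("adam", "adamw"):
        opt_class = torch.optim.Adam if opt_type == "adam" else torch.optim.AdamW
        # Flat config values are regrouped into the betas tuple PyTorch expects.
        return opt_class(params, lr=lr, betas=(beta1, beta2), eps=eps,
                         weight_decay=weight_decay)
    raise ValueError(f"unknown optimizer type {opt_type!r}")


model = nn.Linear(4, 2)
opt = create_optimizer(model.parameters(), "adam", lr=1e-3, beta2=0.999)
```

Keeping the arguments flat makes them easy to expose as command-line options, which is the role add_class_args/filter_args appear to play here.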

Learning Rate Schedulers

These are custom learning rate schedulers and a factory class to create schedulers from config params.

Custom LR Schedulers

class hyperion.torch.lr_schedulers.red_lr_on_plateau.ReduceLROnPlateau(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.

optimizer

optimizer.

Type

Optimizer

mode

One of min, max. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’.

Type

str

factor

Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.

Type

float

patience

Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10.

Type

int

threshold

Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.

Type

float

threshold_mode

One of rel, abs. In rel mode, dynamic_threshold = best * ( 1 + threshold ) in ‘max’ mode or best * ( 1 - threshold ) in min mode. In abs mode, dynamic_threshold = best + threshold in max mode or best - threshold in min mode. Default: ‘rel’.

Type

str

cooldown

Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.

Type

int

min_lr

A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0.

Type

float or list

eps

Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8.

Type

float
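The interplay of mode, threshold, threshold_mode, patience, cooldown, min_lr and eps described above can be traced with a small pure-Python model. This PlateauTracker is a hypothetical sketch that mirrors the documented rules (it ignores monitor and warmup_steps); it is not the scheduler's source.

```python
class PlateauTracker:
    """Pure-Python sketch of the reduce-on-plateau rule described above."""

    def __init__(self, lr, mode="min", factor=0.1, patience=10, threshold=1e-4,
                 threshold_mode="rel", cooldown=0, min_lr=0.0, eps=1e-8):
        self.lr = lr
        self.mode, self.factor, self.patience = mode, factor, patience
        self.threshold, self.threshold_mode = threshold, threshold_mode
        self.cooldown, self.min_lr, self.eps = cooldown, min_lr, eps
        self.best = float("inf") if mode == "min" else float("-inf")
        self.num_bad_epochs = 0
        self.cooldown_counter = 0

    def is_better(self, value):
        rel = self.threshold_mode == "rel"
        if self.mode == "min":
            ref = self.best * (1 - self.threshold) if rel else self.best - self.threshold
            return value < ref
        ref = self.best * (1 + self.threshold) if rel else self.best + self.threshold
        return value > ref

    def on_epoch_end(self, value):
        if self.is_better(value):
            self.best = value
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.cooldown_counter > 0:
            # Bad epochs during cooldown are not counted.
            self.cooldown_counter -= 1
            self.num_bad_epochs = 0
        if self.num_bad_epochs > self.patience:
            new_lr = max(self.lr * self.factor, self.min_lr)
            if self.lr - new_lr > self.eps:  # ignore negligible reductions
                self.lr = new_lr
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
        return self.lr


tracker = PlateauTracker(lr=0.1, patience=2)
lrs = [tracker.on_epoch_end(v) for v in [1.0, 0.9, 0.9, 0.9, 0.9]]
```

As in the patience description, the two stagnant epochs after 0.9 is first reached are tolerated and the learning rate drops only after the third.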

__init__(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]
_reset()[source]

Resets num_bad_epochs counter and cooldown counter.

on_opt_step()[source]
on_epoch_begin(epoch=None)[source]
on_epoch_end(metrics=None)[source]
property in_cooldown
load_state_dict(state_dict)[source]

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_lr()
get_warmup_lr()
property in_warmup
state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.exp_lr.ExponentialLR(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Exponential learning rate scheduler.
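The exact update rule is not documented here. A common reading of decay_rate, decay_steps and hold_steps is: keep the initial learning rate for hold_steps, then multiply it by decay_rate every decay_steps, never dropping below min_lr. The sketch below implements that assumed rule (warmup is ignored, and whether step counts epochs or optimizer steps would depend on update_lr_on_opt_step); treat it as an illustration, not ExponentialLR.get_lr.

```python
def exp_lr(step, base_lr, decay_rate, decay_steps, hold_steps=0, min_lr=0.0):
    """Assumed form: hold base_lr for hold_steps, then decay it by a factor of
    decay_rate per decay_steps (continuously), bounded below by min_lr."""
    decay_exponent = max(0, step - hold_steps) / decay_steps
    return max(min_lr, base_lr * decay_rate ** decay_exponent)


lr_hold = exp_lr(5, 0.1, 0.01, 100, hold_steps=10)
lr_one_period = exp_lr(110, 0.1, 0.01, 100, hold_steps=10)
lr_floor = exp_lr(10**6, 0.1, 0.01, 100, hold_steps=10, min_lr=1e-5)
```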

__init__(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
get_lr(step)[source]
load_state_dict(state_dict)[source]

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_warmup_lr()
property in_warmup
on_epoch_begin(epoch=None, **kwargs)
on_epoch_end(metrics=None)
on_opt_step()
state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.invpow_lr.InvPowLR(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Inverse power learning rate scheduler.

__init__(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
get_lr(step)[source]
load_state_dict(state_dict)[source]

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

get_warmup_lr()
property in_warmup
on_epoch_begin(epoch=None, **kwargs)
on_epoch_end(metrics=None)
on_opt_step()
state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

class hyperion.torch.lr_schedulers.cos_lr.CosineLR(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]

Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR:

\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\frac{T_{cur}}{T_{max}}\pi))\]

When epoch=-1, sets initial lr as lr.

It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • T (int) – Number of iterations in the first cosine cycle (\(T_{max}\) in the equation above).

  • min_lr (float) – Minimum learning rate (\(\eta_{min}\) in the equation above). Default: 0.

  • epoch (int) – The index of the last epoch. Default: 0.
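The schedule in the equation above can be evaluated directly. The sketch below is a plain-Python reading of that formula with SGDR-style warm restarts, assuming that each new cycle is T_mul times longer than the previous one; it ignores warmup_steps and gamma and is not CosineLR's source.

```python
import math


def cosine_lr(t, eta_max, T, T_mul=1, eta_min=0.0, warm_restarts=True):
    """Evaluate the SGDR cosine schedule at step/epoch t."""
    assert T > 0 and T_mul >= 1
    T_i, T_cur = T, t
    if warm_restarts:
        # Skip past completed cycles; each cycle is T_mul times longer.
        while T_cur >= T_i:
            T_cur -= T_i
            T_i *= T_mul
    else:
        T_cur = min(t, T)  # single decay, then stay at eta_min
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))


start = cosine_lr(0, 0.1, 10)
middle = cosine_lr(5, 0.1, 10)
restart = cosine_lr(10, 0.1, 10)
no_restart = cosine_lr(10, 0.1, 10, warm_restarts=False)
second_cycle_mid = cosine_lr(20, 0.1, 10, T_mul=2)
```

At \(T_{cur} = T_{max}/2\) the cosine term vanishes, so the learning rate sits exactly halfway between \(\eta_{max}\) and \(\eta_{min}\).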

__init__(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
on_epoch_begin(epoch=None, epoch_updates=1, **kwargs)[source]
get_lr(step)[source]
get_warmup_lr()
property in_warmup
load_state_dict(state_dict)

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

on_epoch_end(metrics=None)
on_opt_step()
state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

LR Scheduler Factory

class hyperion.torch.lr_schedulers.factory.LRSchedulerFactory[source]
create(lrsch_type, decay_rate=0.01, decay_steps=100, power=0.5, hold_steps=10, t=10, t_mul=1, warm_restarts=False, gamma=1, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, eps=1e-08, min_lr=0, warmup_steps=0, update_lr_on_opt_step=False)[source]
static filter_args(**kwargs)[source]
static add_class_args(parser, prefix=None)[source]
static add_argparse_args(parser, prefix=None)

Metrics

These are metric classes and functions that cannot be used as loss functions.

Metric Classes

class hyperion.torch.metrics.metrics.TorchMetric(weight=None, reduction='mean')[source]

Base class for metrics that cannot be objective functions
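Accuracy is a typical example of such a metric: it is computed through argmax, which has no useful gradient, so it is reported but never optimized. The sketch below is a hypothetical stand-alone nn.Module in that spirit. It does not subclass TorchMetric (whose constructor semantics beyond the signature are not documented here) and ignores the weight argument.

```python
import torch
import torch.nn as nn


class CategoricalAccuracy(nn.Module):
    """Hypothetical accuracy metric; it is reported, never back-propagated."""

    def __init__(self, reduction="mean"):
        super().__init__()
        self.reduction = reduction

    @torch.no_grad()
    def forward(self, logits, target):
        correct = (logits.argmax(dim=-1) == target).float()
        if self.reduction == "mean":
            return correct.mean()
        if self.reduction == "sum":
            return correct.sum()
        return correct  # reduction="none": per-example values


logits = torch.tensor([[2.0, 0.1], [0.3, 0.9], [1.0, 0.0]])
target = torch.tensor([0, 1, 1])
acc = CategoricalAccuracy()(logits, target)
num_correct = CategoricalAccuracy(reduction="sum")(logits, target)
```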

__init__(weight=None, reduction='mean')[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

T_destination

alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])

_get_backward_hooks()

Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get(“version”, None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • prefix (str) – the prefix for parameters and buffers used in this module

  • local_metadata (dict) – a dict containing the metadata for this module. See

  • strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module

  • missing_keys (list of str) – if strict=True, add missing keys to this list

  • unexpected_keys (list of str) – if strict=True, add unexpected keys to this list

  • error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True)

Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook)

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters
  • destination (dict) – a dict where state will be stored

  • prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example:

>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
bfloat16() torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns

self

Return type

Module

buffers(recurse: bool = True) Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
children() Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu() torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module
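The ordering advice above (move the module first, then build the optimizer) can be written in a device-agnostic way. This is a sketch assuming a single-device setup; it falls back to the CPU when CUDA is unavailable and uses .to(device), which for a CUDA device has the same effect as cuda().

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)
# Move the module before building the optimizer, as recommended above,
# so the optimizer is constructed over the parameters on the target device.
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 10, device=device)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(next(model.parameters()).device)
```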

double() torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns

self

Return type

Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number is saved in the attribute _metadata of the returned state dict, and is thus pickled. _metadata is a dictionary with keys that follow the naming convention of the state dict. See _load_from_state_dict on how to use this information when loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules, such as Dropout and BatchNorm. See the documentation of each module for details of its behavior in training and evaluation mode, if it is affected.

This is equivalent to self.train(False).

Returns

self

Return type

Module
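A small sketch of the mode switch, using Dropout as the affected module: in evaluation mode Dropout acts as the identity, so repeated calls give identical outputs. Note that eval() does not disable gradient tracking; torch.no_grad() is used for that here.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.randn(2, 8)

model.eval()  # same as model.train(False); propagates to all submodules
print(model.training, model[1].training)  # False False

with torch.no_grad():
    # Dropout is the identity in evaluation mode, so both calls agree.
    y1 = model(x)
    y2 = model(x)
print(torch.equal(y1, y2))  # True

model.train()  # switch back before resuming training
print(model.training)  # True
```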

extra_repr() str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
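A minimal sketch of overriding extra_repr(); the Scale layer is hypothetical. The returned string is placed between the parentheses when the module is printed.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical layer that multiplies its input by a fixed factor."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self) -> str:
        # Shown inside the parentheses of repr(module).
        return f"factor={self.factor}"


print(Scale(2.0))  # Scale(factor=2.0)
print(nn.Sequential(Scale(0.5)))
```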

float() torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns

self

Return type

Module

forward(*input: Any) None

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
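The difference between calling the module instance and calling forward() directly can be seen with a forward hook. This sketch registers a hook that only records that it fired.

```python
import torch
import torch.nn as nn

calls = []
layer = nn.Linear(3, 3)
# list.append returns None, so the hook leaves the output unchanged.
layer.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.randn(1, 3)
layer(x)          # __call__ runs forward() and the registered hooks
layer.forward(x)  # calling forward() directly bypasses the hooks
print(calls)      # ['hook']
```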

half() torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns

self

Return type

Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
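With strict=False, load_state_dict() reports rather than rejects mismatched keys, which is a common way to reuse part of a trained model. In this sketch the two Sequential models are hypothetical: they share the first layer but have heads of different sizes, so only the first layer's entries are loaded.

```python
import torch
import torch.nn as nn

source = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
# Hypothetical target: same first layer, different output head.
target = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 3))

# Keep only the first layer's entries; the head shapes do not match.
partial = {k: v for k, v in source.state_dict().items() if k.startswith("0.")}
result = target.load_state_dict(partial, strict=False)
print(result.missing_keys)     # ['2.weight', '2.bias']
print(result.unexpected_keys)  # []
```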

modules() Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())
named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
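Because named_parameters() exposes the names, it can be used to split parameters into optimizer groups. This is a sketch of one hypothetical policy (no weight decay on biases); the weight-decay values are illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Hypothetical policy: apply weight decay to weights but not to biases.
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
)
print([name for name, _ in model.named_parameters()])
# ['0.weight', '0.bias', '2.weight', '2.bias']
```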
parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters
  • name (string) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor) – buffer to be registered.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
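A slightly fuller sketch showing persistent and non-persistent buffers side by side; the RunningMean module and its update rule are hypothetical. Both buffers are listed by named_buffers(), but only the persistent one appears in state_dict(), and neither counts as a parameter.

```python
import torch
import torch.nn as nn


class RunningMean(nn.Module):
    """Hypothetical module that tracks a running mean of its inputs."""

    def __init__(self, num_features: int, momentum: float = 0.1):
        super().__init__()
        self.momentum = momentum
        # Persistent buffer: saved in state_dict() and moved by .to()/.cuda().
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent buffer: moved with the module but not saved.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * x.mean(dim=0))
        return x - self.running_mean


m = RunningMean(3)
print(list(m.state_dict().keys()))              # ['running_mean']
print([name for name, _ in m.named_buffers()])  # ['running_mean', 'scratch']
print(len(list(m.parameters())))                # 0
```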
register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the output. It can also modify the input in place, but this has no effect on forward() because the hook is called after forward() has run.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
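A sketch of a forward hook that replaces the module output, together with removing it through the returned handle. Clamping the output is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)


def clamp_output(module, inputs, output):
    # Returning a value replaces the module's output.
    return output.clamp(min=0.0)


handle = layer.register_forward_hook(clamp_output)
x = torch.randn(5, 2)
y = layer(x)
print(bool((y >= 0).all()))  # True

handle.remove()  # the hook no longer runs
print(torch.equal(layer(x), layer.forward(x)))  # True
```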

register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the input: it can return either a tuple or a single modified value, and a single value is wrapped into a tuple (unless it is already a tuple).

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
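A sketch of a pre-hook that rewrites the positional input. It returns a single tensor, which is wrapped into a one-element tuple before forward() is called.

```python
import torch
import torch.nn as nn

layer = nn.Identity()


def double_input(module, inputs):
    # `inputs` is the tuple of positional arguments. Returning a single
    # tensor is allowed; it is wrapped back into a tuple for forward().
    return inputs[0] * 2


layer.register_forward_pre_hook(double_input)
print(layer(torch.ones(2)))  # tensor([2., 2.])
```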

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
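A sketch of a full backward hook that only inspects gradients. It copies grad_input and grad_output instead of modifying them, and returns None so the original grad_input is used. The input x requires grad so that grad_input[0] is a tensor rather than None.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
captured = {}


def save_grads(module, grad_input, grad_output):
    # Both arguments are tuples; copy them rather than modifying in place.
    captured["grad_output"] = grad_output[0].detach().clone()
    captured["grad_input"] = grad_input[0].detach().clone()
    return None  # keep the original grad_input


handle = layer.register_full_backward_hook(save_grads)
x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()
handle.remove()

print(captured["grad_output"].shape)  # torch.Size([4, 1])
print(captured["grad_input"].shape)   # torch.Size([4, 3])
```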

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters
  • name (string) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.

Returns

self

Return type

Module
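A sketch of the fine-tuning use case mentioned above: a hypothetical "backbone" is frozen with requires_grad_(False), and only the still-trainable parameters are handed to the optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical two-part model: a pretrained backbone and a new head.
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)

backbone.requires_grad_(False)  # freeze in place; returns the module itself

# Only pass the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

model(torch.randn(4, 16)).sum().backward()
print(backbone[0].weight.grad)       # None: no gradient was recorded
print(head.weight.grad is not None)  # True
```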

share_memory() torch.nn.modules.module.T
state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns

a dictionary containing a whole state of the module

Return type

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
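A sketch of the usual save/restore round trip through state_dict(). BatchNorm1d is used because it has both parameters and persistent buffers. An in-memory io.BytesIO stands in for a file path.

```python
import io

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
print(list(model.state_dict().keys()))
# ['0.weight', '0.bias', '1.weight', '1.bias', '1.running_mean',
#  '1.running_var', '1.num_batches_tracked']

# Serialize to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
restored.load_state_dict(torch.load(buffer))
```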
to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method only casts the floating point or complex parameters and buffers to dtype (if given). Integral parameters and buffers are moved to device, if that is given, but their dtypes are unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
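The rule that only floating point (or complex) tensors are cast can be checked with BatchNorm1d, whose num_batches_tracked buffer is an integer tensor.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
bn.to(torch.float64)

print(bn.weight.dtype)               # torch.float64
print(bn.running_mean.dtype)         # torch.float64
# num_batches_tracked is an integer buffer, so its dtype is left alone.
print(bn.num_batches_tracked.dtype)  # torch.int64
```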
train(mode: bool = True) torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules, such as Dropout and BatchNorm. See the documentation of each module for details of its behavior in training and evaluation mode, if it is affected.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
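A sketch contrasting the two behaviors. set_to_none is passed explicitly because its default has changed between PyTorch releases, so relying on the default shown in the signature above may not match the installed version.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
model(torch.randn(2, 3)).sum().backward()

model.zero_grad(set_to_none=False)  # keep the tensors, fill them with zeros
print(model.weight.grad)            # tensor([[0., 0., 0.]])

model.zero_grad(set_to_none=True)   # drop the gradient tensors entirely
print(model.weight.grad)            # None
```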

training: bool

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.metrics.accuracy.CategoricalAccuracy(weight=None, reduction='mean')[source]
__init__(weight=None, reduction='mean')[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
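The documentation above does not describe what CategoricalAccuracy computes or how weight is applied. As an illustration only, here is a minimal sketch of a typical categorical accuracy metric, assuming input holds class scores of shape (batch, num_classes) and target holds integer labels of shape (batch,). The class SketchCategoricalAccuracy is hypothetical, ignores weight, and is not hyperion's implementation.

```python
import torch
import torch.nn as nn


class SketchCategoricalAccuracy(nn.Module):
    """Hypothetical stand-in for a categorical accuracy metric.

    Assumes `input` holds class scores of shape (batch, num_classes) and
    `target` holds integer class labels of shape (batch,).
    """

    def __init__(self, reduction: str = "mean"):
        super().__init__()
        if reduction not in ("mean", "sum"):
            raise ValueError(f"unsupported reduction: {reduction}")
        self.reduction = reduction

    def forward(self, input, target):
        pred = input.argmax(dim=1)         # predicted class per example
        correct = pred.eq(target).float()  # 1.0 where the prediction is right
        return correct.mean() if self.reduction == "mean" else correct.sum()


metric = SketchCategoricalAccuracy()
scores = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.4]])
labels = torch.tensor([0, 1, 1, 1])
print(metric(scores, labels))  # tensor(0.7500)
```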

T_destination

alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])

_get_backward_hooks()

Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • prefix (str) – the prefix for parameters and buffers used in this module

  • local_metadata (dict) – a dict containing the metadata for this module.

  • strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module

  • missing_keys (list of str) – if strict=True, add missing keys to this list

  • unexpected_keys (list of str) – if strict=True, add unexpected keys to this list

  • error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
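A sketch of version-based backward-compatible loading as described above. The Gain module and the rename of its parameter from "g" to "gain" are hypothetical; the override renames the old key when the checkpoint predates version 2 (or carries no metadata) and then defers to the default implementation. Note that _load_from_state_dict is a private hook, so its exact behavior may vary across PyTorch versions.

```python
import torch
import torch.nn as nn


class Gain(nn.Module):
    """Hypothetical module whose parameter was renamed from 'g' to 'gain'."""

    _version = 2  # bumped when the rename happened

    def __init__(self, n: int):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n))

    def forward(self, x):
        return x * self.gain

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        version = local_metadata.get("version", None)
        old_key = prefix + "g"
        # Checkpoints saved before version 2 (or without metadata) use the old name.
        if (version is None or version < 2) and old_key in state_dict:
            state_dict[prefix + "gain"] = state_dict.pop(old_key)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)


old_checkpoint = {"g": torch.full((3,), 2.0)}  # hypothetical pre-rename checkpoint
module = Gain(3)
module.load_state_dict(old_checkpoint)
print(module.gain)
```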

_named_members(get_members_fn, prefix='', recurse=True)

Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook)

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters
  • destination (dict) – a dict where state will be stored

  • prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example:

>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
bfloat16() torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns

self

Return type

Module

buffers(recurse: bool = True) Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
children() Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu() torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

double() torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns

self

Return type

Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number is saved in the attribute _metadata of the returned state dict, and is thus pickled. _metadata is a dictionary with keys that follow the naming convention of the state dict. See _load_from_state_dict on how to use this information when loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules, such as Dropout and BatchNorm. See the documentation of each module for details of its behavior in training and evaluation mode, if it is affected.

This is equivalent to self.train(False).

Returns

self

Return type

Module

extra_repr() str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

float() torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns

self

Return type

Module

half() torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns

self

Return type

Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields

modules() Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())
named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters
  • name (string) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor) – buffer to be registered.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the output. It can also modify the input in place, but this has no effect on forward() because the hook is called after forward() has run.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the input: it can return either a tuple or a single modified value, and a single value is wrapped into a tuple (unless it is already a tuple).

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments, and all keyword arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs in place is not allowed when using backward hooks and will raise an error.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
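The sketch below uses a full backward hook to log gradient magnitudes, assuming a single nn.Linear layer; log_grad_output and grad_norms are illustrative names. The input is created with requires_grad=True so that gradients with respect to the module input are computed. Since the loss is a plain sum over a (4, 1) output, grad_output[0] is a tensor of ones whose norm is 2.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
grad_norms = []

def log_grad_output(module, grad_input, grad_output):
    # grad_output is a tuple; record the norm of the gradient w.r.t. the output.
    grad_norms.append(grad_output[0].norm().item())
    # Returning None keeps grad_input unchanged.

handle = layer.register_full_backward_hook(log_grad_output)
x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()
handle.remove()
```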

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters
  • name (string) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.

Returns

self

Return type

Module
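A minimal fine-tuning sketch, assuming a small nn.Sequential model stands in for a network with a pretrained first layer: requires_grad_(False) freezes that layer, and only the remaining trainable parameters are handed to the optimizer. After a backward pass, the frozen weights have no gradient.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze the first layer (e.g. a pretrained encoder) and train the rest.
model[0].requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.randn(2, 8)).pow(2).mean()
loss.backward()
optimizer.step()
```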

share_memory() torch.nn.modules.module.T
state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names.

Returns

a dictionary containing the whole state of the module

Return type

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
train(mode: bool = True) torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module
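Dropout is the clearest example of a module whose behavior depends on the mode. In this minimal sketch, training mode zeroes each element with probability p and scales the survivors by 1/(1 - p), so an all-ones input becomes a mix of 0.0 and 2.0 for p=0.5; evaluation mode returns the input unchanged.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()   # training mode: random zeroing, survivors scaled by 1/(1-p)
y_train = drop(x)

drop.eval()    # evaluation mode: Dropout acts as the identity
y_eval = drop(x)
```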

type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on XPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
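The two settings of set_to_none differ in what is left in .grad, as this minimal sketch on a single nn.Linear layer shows. The argument is passed explicitly because more recent PyTorch releases changed the default to True, which differs from the False default documented here.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)
layer(torch.randn(3, 2)).sum().backward()

layer.zero_grad(set_to_none=False)  # gradients become zero-filled tensors
zeroed = layer.weight.grad

layer.zero_grad(set_to_none=True)   # gradients are released (set to None)
```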

training: bool
class hyperion.torch.metrics.accuracy.BinaryAccuracy(weight=None, reduction='mean', thr=0.5)[source]
__init__(weight=None, reduction='mean', thr=0.5)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
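The inherited docstrings above do not say what BinaryAccuracy computes. As a rough illustration only, the sketch below assumes that input holds probabilities, target holds 0/1 labels, and a sample is predicted positive when input exceeds thr. binary_accuracy_sketch is a hypothetical helper, not hyperion's implementation; it ignores the weight argument, and whether hyperion uses > or >= at the threshold is an assumption.

```python
import torch

def binary_accuracy_sketch(input, target, thr=0.5, reduction="mean"):
    # Hypothetical helper, NOT hyperion's code: predict 1 where input > thr.
    pred = (input > thr).float()
    correct = (pred == target.float()).float()
    if reduction == "mean":
        return correct.mean()
    if reduction == "sum":
        return correct.sum()
    return correct  # any other value: per-element correctness

probs = torch.tensor([0.9, 0.2, 0.6, 0.4])
labels = torch.tensor([1, 0, 0, 0])
acc = binary_accuracy_sketch(probs, labels)  # 3 of 4 correct -> 0.75
```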

T_destination

alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])

_get_backward_hooks()

Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • prefix (str) – the prefix for parameters and buffers used in this module

  • local_metadata (dict) – a dict containing the metadata for this module. See

  • strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module

  • missing_keys (list of str) – if strict=True, add missing keys to this list

  • unexpected_keys (list of str) – if strict=True, add unexpected keys to this list

  • error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True)

Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook)

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict in place or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters
  • destination (dict) – a dict where state will be stored

  • prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.
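add_module is mainly useful when child names are only known at run time. In this sketch, Stack is a hypothetical module that registers a configurable number of layers under generated names; each child is then reachable as an attribute and shows up in children() and parameters().

```python
import torch
import torch.nn as nn

class Stack(nn.Module):
    # Hypothetical module used only to illustrate add_module().
    def __init__(self, num_layers, dim):
        super().__init__()
        for i in range(num_layers):
            # Names are generated at run time, so register each child by name.
            self.add_module(f"layer{i}", nn.Linear(dim, dim))

    def forward(self, x):
        for child in self.children():  # children are kept in insertion order
            x = child(x)
        return x

net = Stack(num_layers=3, dim=4)
out = net(torch.randn(2, 4))
```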

apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example:

>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if isinstance(m, nn.Linear):
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
bfloat16() torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns

self

Return type

Module

buffers(recurse: bool = True) Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
children() Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu() torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

double() torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns

self

Return type

Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns

self

Return type

Module

extra_repr() str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
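A minimal sketch of overriding extra_repr: Scale is a hypothetical layer, and the string it returns is placed between the parentheses when the module is printed. With no child modules and a single-line string, the whole repr fits on one line.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    # Hypothetical layer used only to illustrate extra_repr().
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self):
        # Text shown inside the parentheses of repr(module).
        return f"factor={self.factor}"

print(Scale(2.0))  # Scale(factor=2.0)
```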

float() torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns

self

Return type

Module

half() torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns

self

Return type

Module

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
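When a checkpoint only partly matches a model, strict=False loads the matching entries and reports the rest through the returned missing_keys and unexpected_keys. In this sketch, both models are toy nn.Sequential networks; the target has an extra final layer, so its keys are reported as missing. Note that matching keys must still have matching shapes, or loading fails even with strict=False.

```python
import torch
import torch.nn as nn

src = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
# Target shares the first layers of src and adds one more Linear at index 4.
dst = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2),
                    nn.ReLU(), nn.Linear(2, 1))

result = dst.load_state_dict(src.state_dict(), strict=False)
print(result.missing_keys)     # keys of the new layer, e.g. '4.weight', '4.bias'
print(result.unexpected_keys)  # []

# With the default strict=True, the same mismatch raises an error.
try:
    dst.load_state_dict(src.state_dict())
    strict_failed = False
except RuntimeError:
    strict_failed = True
```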

modules() Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters
  • name (string) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor) – buffer to be registered.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
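The persistent flag only controls whether a buffer appears in state_dict; both kinds of buffer are still returned by buffers(). RunningStat is a hypothetical module used for this sketch.

```python
import torch
import torch.nn as nn

class RunningStat(nn.Module):
    # Hypothetical module illustrating persistent vs. non-persistent buffers.
    def __init__(self, dim):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(dim))               # saved
        self.register_buffer("scratch", torch.zeros(dim), persistent=False)  # not saved

m = RunningStat(3)
keys = list(m.state_dict().keys())  # ['running_mean']
```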
register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the output. It can also modify the input in place, but this has no effect on forward() because the hook is called after forward() has run.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments are passed only to forward(), not to the hooks. The hook can modify the input: it can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments, and all keyword arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs in place is not allowed when using backward hooks and will raise an error.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters
  • name (string) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.

Returns

self

Return type

Module

share_memory() torch.nn.modules.module.T
state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names.

Returns

a dictionary containing the whole state of the module

Return type

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=False).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
train(mode: bool = True) torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on XPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

training: bool
class hyperion.torch.metrics.accuracy.BinaryAccuracyWithLogits(weight=None, reduction='mean', thr=0.0)[source]
__init__(weight=None, reduction='mean', thr=0.0)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
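The source does not document how BinaryAccuracyWithLogits differs from BinaryAccuracy beyond its default thr=0.0. Assuming it thresholds raw logits rather than probabilities, this sketch shows why 0.0 is the natural default: sigmoid is monotonic and sigmoid(0) = 0.5, so thresholding logits at 0 yields the same decisions as thresholding probabilities at 0.5. It does not call hyperion's class, and the use of > at the threshold is an assumption.

```python
import torch

logits = torch.tensor([2.0, -1.0, 0.3, -0.2])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])

# Thresholding logits at 0 ...
pred_from_logits = (logits > 0.0).float()
# ... gives the same decisions as thresholding probabilities at 0.5.
pred_from_probs = (torch.sigmoid(logits) > 0.5).float()

acc = (pred_from_logits == labels).float().mean()  # 3 of 4 correct -> 0.75
```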

T_destination

alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])

_get_backward_hooks()

Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.

_load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).

Note

state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • prefix (str) – the prefix for parameters and buffers used in this module

  • local_metadata (dict) – a dict containing the metadata for this module. See

  • strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module

  • missing_keys (list of str) – if strict=True, add missing keys to this list

  • unexpected_keys (list of str) – if strict=True, add unexpected keys to this list

  • error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()

_named_members(get_members_fn, prefix='', recurse=True)

Helper method for yielding various names + members of modules.

_register_load_state_dict_pre_hook(hook)

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict in place or return a new one.

_save_to_state_dict(destination, prefix, keep_vars)

Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().

In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.

Parameters
  • destination (dict) – a dict where state will be stored

  • prefix (str) – the prefix for parameters and buffers used in this module

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.

apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example:

>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if isinstance(m, nn.Linear):
...         m.weight.fill_(1.0)
...         print(m.weight)
...
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
bfloat16() torch.nn.modules.module.T

Casts all floating point parameters and buffers to bfloat16 datatype.

Returns

self

Return type

Module

buffers(recurse: bool = True) Iterator[torch.Tensor]

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer

Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
children() Iterator[torch.nn.modules.module.Module]

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu() torch.nn.modules.module.T

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

double() torch.nn.modules.module.T

Casts all floating point parameters and buffers to double datatype.

Returns

self

Return type

Module

dump_patches: bool = False

This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added to or removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.

eval() torch.nn.modules.module.T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

Returns

self

Return type

Module

extra_repr() str

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
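A short sketch of overriding extra_repr() in a custom module; Scale is a hypothetical module used only for illustration. The string returned by extra_repr() is placed inside the parentheses of the module's repr().

```python
from torch import nn


class Scale(nn.Module):
    """Illustrative module that multiplies its input by a fixed factor."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self) -> str:
        # Shown between the parentheses of repr(module).
        return f"factor={self.factor}"


print(Scale(2.0))  # Scale(factor=2.0)
print(nn.Sequential(Scale(0.5)))  # the child line reads "(0): Scale(factor=0.5)"
```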

float() torch.nn.modules.module.T

Casts all floating point parameters and buffers to float datatype.

Returns

self

Return type

Module

half() torch.nn.modules.module.T

Casts all floating point parameters and buffers to half datatype.

Returns

self

Return type

Module
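The casts above (bfloat16(), double(), float(), half()) touch only floating point tensors; integral buffers keep their dtype. The sketch below shows this with plain PyTorch; the num_updates buffer is a hypothetical integer counter added only for illustration.

```python
import torch
from torch import nn

lin = nn.Linear(2, 2)
# Illustrative integer buffer, e.g. a step counter.
lin.register_buffer("num_updates", torch.zeros((), dtype=torch.long))

lin.half()
print(lin.weight.dtype)       # torch.float16
print(lin.num_updates.dtype)  # torch.int64 (integral buffers are not cast)

lin.double()
print(lin.weight.dtype)       # torch.float64
print(lin.bias.dtype)         # torch.float64
```

If integral tensors must be converted as well, type(dst_type) casts all parameters and buffers regardless of their dtype.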

load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
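A short sketch of loading a partial checkpoint with plain PyTorch. With strict=False the call succeeds and reports what did not match; with the default strict=True the same call raises a RuntimeError. The partial dictionary below is illustrative.

```python
import torch
from torch import nn

src = nn.Linear(3, 2)
dst = nn.Linear(3, 2)

# Keep only the weight, as if the checkpoint came from a slightly different model.
partial = {"weight": src.weight.detach().clone()}

result = dst.load_state_dict(partial, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # []
print(torch.equal(dst.weight, src.weight))  # True

try:
    dst.load_state_dict(partial)  # strict=True by default
except RuntimeError as err:
    print("strict load failed:", type(err).__name__)
```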

modules() Iterator[torch.nn.modules.module.Module]

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())
named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using the given names.

Parameters
  • name (string) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor) – buffer to be registered.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
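A short sketch of the persistent flag with plain PyTorch; RunningStats and the buffer name scratch are illustrative. Both buffers follow the module through .to()/.cuda(), but only the persistent one is saved in state_dict().

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Illustrative module with one persistent and one non-persistent buffer."""

    def __init__(self, num_features: int):
        super().__init__()
        # Saved in state_dict().
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Kept on the module, but NOT saved in state_dict().
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(4)
print(list(m.state_dict().keys()))               # ['running_mean']
print([name for name, _ in m.named_buffers()])   # ['running_mean', 'scratch']
```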
register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the output. It can also modify the input in place, but this will have no effect on forward() since the hook is called after forward() has run.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
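A common use of forward hooks is capturing intermediate activations without editing forward(). A minimal sketch with plain PyTorch; the dictionary key "hidden" is arbitrary.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
captured = {}


def save_output(module, inputs, output):
    # `inputs` is a tuple of positional arguments; `output` is what forward()
    # returned. Returning None leaves the output unchanged.
    captured["hidden"] = output.detach()


handle = net[1].register_forward_hook(save_output)
y = net(torch.randn(5, 4))
handle.remove()  # the hook no longer fires on later calls

print(captured["hidden"].shape)  # torch.Size([5, 3])
```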

register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward(). The hook can modify the input: it can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
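A minimal sketch of a pre-hook that rewrites the input before forward() runs, using plain PyTorch. The weight is set to the identity so the effect of the hook is visible directly in the output.

```python
import torch
from torch import nn

lin = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    lin.weight.copy_(torch.eye(2))  # identity: output equals input


def double_input(module, inputs):
    # `inputs` is a tuple; returning a single tensor is allowed and is
    # wrapped into a tuple before forward() is called.
    return inputs[0] * 2


handle = lin.register_forward_pre_hook(double_input)
out = lin(torch.tensor([[1.0, 3.0]]))
print(out.tolist())  # [[2.0, 6.0]]
handle.remove()
```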

register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
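A minimal sketch of inspecting gradients with a full backward hook (available in PyTorch 1.8 and later). The input is created with requires_grad=True so that grad_input holds a real gradient rather than None.

```python
import torch
from torch import nn

lin = nn.Linear(3, 2)
grads = {}


def log_grads(module, grad_input, grad_output):
    # Both arguments are tuples whose entries may be None. Returning None
    # keeps grad_input unchanged. Copies avoid touching the originals.
    grads["out"] = grad_output[0].detach().clone()
    grads["in"] = grad_input[0].detach().clone()


handle = lin.register_full_backward_hook(log_grads)
x = torch.randn(4, 3, requires_grad=True)
lin(x).sum().backward()
handle.remove()

# d(sum)/d(output) is all ones, so d(sum)/d(x) is each row of ones @ weight.
print(grads["out"].shape, grads["in"].shape)  # torch.Size([4, 2]) torch.Size([4, 3])
```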

register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None

Adds a parameter to the module.

The parameter can be accessed as an attribute using the given name.

Parameters
  • name (string) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.

Returns

self

Return type

Module
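A minimal sketch of freezing part of a model for fine-tuning with plain PyTorch; treating the first layer as a pretrained feature extractor is only an illustration.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer.
model[0].requires_grad_(False)

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

model(torch.randn(3, 4)).sum().backward()
print(model[0].weight.grad)              # None (frozen)
print(model[2].weight.grad is not None)  # True
```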

share_memory() torch.nn.modules.module.T
state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns

a dictionary containing a whole state of the module

Return type

dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns

self

Return type

Module

Examples:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
train(mode: bool = True) torch.nn.modules.module.T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module
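A short sketch of the train/eval switch using Dropout, one of the modules whose behavior depends on the mode. In training mode the surviving elements are scaled by 1 / (1 - p), which is 2 for p = 0.5; in evaluation mode dropout is a no-op.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()  # default mode after construction
y_train = drop(x)
drop.eval()   # equivalent to drop.train(False)
y_eval = drop(x)

print(drop.training)           # False
print(torch.equal(y_eval, x))  # True
print(set(y_train.unique().tolist()) <= {0.0, 2.0})  # True
```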

type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T

Moves all model parameters and buffers to the XPU.

This also makes associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the XPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device

Returns

self

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
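A short sketch contrasting the two zero_grad() modes with plain PyTorch. The argument is passed explicitly because the default of set_to_none differs between PyTorch versions.

```python
import torch
from torch import nn

lin = nn.Linear(2, 1)
lin(torch.ones(1, 2)).sum().backward()

lin.zero_grad(set_to_none=False)
zeroed = lin.weight.grad.clone()
print(zeroed)           # tensor([[0., 0.]])

lin.zero_grad(set_to_none=True)
print(lin.weight.grad)  # None
```

Setting gradients to None avoids a memset and lets the next backward pass allocate fresh gradient tensors; code that reads .grad must then handle None.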

training: bool

Metric Functions

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.metrics.accuracy_functional.categorical_accuracy(input, target, weight=None, reduction='mean')[source]
hyperion.torch.metrics.accuracy_functional.binary_accuracy(input, target, weight=None, reduction='mean', thr=0.5)[source]
hyperion.torch.metrics.accuracy_functional.binary_accuracy_with_logits(input, target, weight=None, reduction='mean', thr=0)[source]
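These accuracy functions are listed without docstrings. Judging only from their names and signatures, a plausible reading is the following: categorical_accuracy compares the arg-max of class scores with integer targets, binary_accuracy thresholds probabilities at thr=0.5, and binary_accuracy_with_logits thresholds raw logits at thr=0, which is equivalent because sigmoid(0) = 0.5. The sketch below implements that reading. It is an assumption, not hyperion's code; the *_sketch names are hypothetical, and the handling of weight and reduction is also assumed.

```python
import torch


def _reduce(correct, weight, reduction):
    # Assumed semantics: optional per-sample weights, then mean/sum/none.
    if weight is not None:
        correct = correct * weight
    if reduction == "mean":
        denom = weight.sum() if weight is not None else correct.numel()
        return correct.sum() / denom
    if reduction == "sum":
        return correct.sum()
    return correct  # per-sample values


def categorical_accuracy_sketch(input, target, weight=None, reduction="mean"):
    # input: (batch, num_classes) scores; target: (batch,) class indices.
    correct = (input.argmax(dim=-1) == target).float()
    return _reduce(correct, weight, reduction)


def binary_accuracy_sketch(input, target, weight=None, reduction="mean", thr=0.5):
    # input: probabilities in [0, 1]; target: 0/1 labels.
    correct = ((input > thr).float() == target.float()).float()
    return _reduce(correct, weight, reduction)


def binary_accuracy_with_logits_sketch(input, target, weight=None, reduction="mean", thr=0):
    # A logit of 0 corresponds to a probability of 0.5.
    return binary_accuracy_sketch(input, target, weight, reduction, thr)


scores = torch.tensor([[2.0, 1.0], [0.1, 0.9], [0.3, 0.7]])
labels = torch.tensor([0, 1, 0])
print(categorical_accuracy_sketch(scores, labels))  # 2 of 3 correct
```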

Loggers

The logger classes are used to write information to standard output, log files, TensorBoard or Weights & Biases (WandB). The LoggerList class contains a set of loggers. When we log something to the LoggerList, the same information is written to all the loggers contained in it. The loggers support multi-GPU training with DistributedDataParallel.

Individual Loggers

class hyperion.torch.loggers.logger.Logger[source]

Base class for logger objects

params

training params dictionary

__init__()[source]
on_epoch_begin(epoch, logs, **kwargs)[source]

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs

on_epoch_end(logs, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)[source]

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

on_batch_end(logs, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_train_begin(logs, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_train_end(logs, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs
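The callback methods above are meant to be invoked by a training loop at fixed points. The sketch below shows one plausible calling order using a toy logger that records each call. EventRecorder and run_fake_training are hypothetical stand-ins written without importing hyperion, and the exact order used by hyperion's trainers is assumed, not taken from the source.

```python
class EventRecorder:
    """Hypothetical logger implementing the same hook names as Logger."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None, **kwargs):
        self.events.append("train_begin")

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        self.events.append(f"epoch_begin:{epoch}")

    def on_batch_begin(self, batch, logs=None, **kwargs):
        self.events.append(f"batch_begin:{batch}")

    def on_batch_end(self, logs=None, **kwargs):
        self.events.append(f"batch_end:loss={logs['loss']}")

    def on_epoch_end(self, logs=None, **kwargs):
        self.events.append("epoch_end")

    def on_train_end(self, logs=None, **kwargs):
        self.events.append("train_end")


def run_fake_training(logger, num_epochs=2, num_batches=2):
    # Hypothetical loop; the loss values are placeholders, not real training.
    logger.on_train_begin({})
    for epoch in range(num_epochs):
        logger.on_epoch_begin(epoch, {})
        for batch in range(num_batches):
            logger.on_batch_begin(batch, {})
            logger.on_batch_end({"loss": 1.0 / (batch + 1)})
        logger.on_epoch_end({})
    logger.on_train_end({})


rec = EventRecorder()
run_fake_training(rec, num_epochs=1, num_batches=2)
print(rec.events)
```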

class hyperion.torch.loggers.prog_logger.ProgLogger(metrics=None, interval=10)[source]

Logger that prints training progress to stdout

metrics

list of metrics

interval

number of batches between prints

__init__(metrics=None, interval=10)[source]
on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs

on_batch_begin(batch, logs=None, **kwargs)[source]

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

estimate_epoch_time()[source]
static sec2str(t)[source]
on_train_end(logs, **kwargs)

At the end of training

Parameters

logs – dictionary of logs

class hyperion.torch.loggers.csv_logger.CSVLogger(file_path, sep=',', append=False)[source]

Logger that writes metrics to a CSV file at the end of each epoch.

file_path

filename of the CSV file.

sep

column separator for the CSV file

append

if False, overwrites an existing file; if True, appends to it.

__init__(file_path, sep=',', append=False)[source]
on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

on_batch_end(logs, **kwargs)

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_begin(epoch, logs, **kwargs)

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs
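To illustrate the sep and append attributes, here is a standalone sketch that writes one CSV row per epoch with Python's csv module. The class name MiniCSVLogger, the rule of writing a header only into an empty file, and the column order taken from the first logs dictionary are all assumptions; hyperion's CSVLogger may behave differently.

```python
import csv
import os
import tempfile


class MiniCSVLogger:
    """Hypothetical CSV logger: one row per epoch, one column per logs key."""

    def __init__(self, file_path, sep=",", append=False):
        self.file_path = file_path
        self.sep = sep
        self.append = append
        self.keys = None

    def on_train_begin(self, logs=None, **kwargs):
        # append=False overwrites an existing file; append=True extends it.
        self._f = open(self.file_path, "a" if self.append else "w", newline="")
        self._writer = csv.writer(self._f, delimiter=self.sep)
        # Only write a header into an empty file.
        self._need_header = os.path.getsize(self.file_path) == 0

    def on_epoch_end(self, logs=None, **kwargs):
        logs = logs or {}
        if self.keys is None:
            self.keys = list(logs)
            if self._need_header:
                self._writer.writerow(self.keys)
        self._writer.writerow([logs[k] for k in self.keys])
        self._f.flush()  # keep the file readable even if training crashes

    def on_train_end(self, logs=None, **kwargs):
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "train.csv")
logger = MiniCSVLogger(path, sep=";")
logger.on_train_begin()
logger.on_epoch_end({"epoch": 1, "loss": 0.9})
logger.on_epoch_end({"epoch": 2, "loss": 0.5})
logger.on_train_end()
with open(path, newline="") as f:
    print(f.read())
```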

class hyperion.torch.loggers.tensorboard_logger.TensorBoardLogger(tb_path, interval=10)[source]

Logger that sends training progress to tensorboard

tb_path

tensorboard output directory

__init__(tb_path, interval=10)[source]
on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

class hyperion.torch.loggers.wandb_logger.WAndBLogger(project=None, group=None, name=None, path=None, mode='online', interval=10)[source]

Logger that sends training progress to weights and biases (wandb)

tb_path

tensorboard output directory

__init__(project=None, group=None, name=None, path=None, mode='online', interval=10)[source]
on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs, **kwargs)

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

Logger List

class hyperion.torch.loggers.logger_list.LoggerList(loggers=None)[source]

Container for a list of logger callbacks

loggers

list of Logger objects

__init__(loggers=None)[source]
append(logger)[source]
property tensorboard_logger
property tensorboard_writer
on_epoch_begin(epoch, logs=None, **kwargs)[source]

At the start of an epoch

Parameters
  • epoch – index of the epoch

  • logs – dictionary of logs

on_epoch_end(logs=None, **kwargs)[source]

At the end of an epoch

Parameters

logs – dictionary of logs

on_batch_begin(batch, logs=None, **kwargs)[source]

At the start of a batch

Parameters
  • batch – batch index within the epoch

  • logs – dictionary of logs

on_batch_end(logs=None, **kwargs)[source]

At the end of a batch

Parameters

logs – dictionary of logs

on_train_begin(logs=None, **kwargs)[source]

At the start of training

Parameters

logs – dictionary of logs

on_train_end(logs=None, **kwargs)[source]

At the end of training

Parameters

logs – dictionary of logs
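The fan-out behavior described for LoggerList (each call is forwarded to every contained logger) can be sketched as below. LoggerListSketch and RecordingLogger are hypothetical classes, not hyperion's implementation; the tensorboard_logger and tensorboard_writer properties of the real class are omitted.

```python
class LoggerListSketch:
    """Hypothetical container that forwards every hook call to each logger."""

    def __init__(self, loggers=None):
        self.loggers = list(loggers) if loggers is not None else []

    def append(self, logger):
        self.loggers.append(logger)

    def _broadcast(self, hook_name, *args, **kwargs):
        for logger in self.loggers:
            getattr(logger, hook_name)(*args, **kwargs)

    def on_train_begin(self, logs=None, **kwargs):
        self._broadcast("on_train_begin", logs, **kwargs)

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        self._broadcast("on_epoch_begin", epoch, logs, **kwargs)

    def on_batch_begin(self, batch, logs=None, **kwargs):
        self._broadcast("on_batch_begin", batch, logs, **kwargs)

    def on_batch_end(self, logs=None, **kwargs):
        self._broadcast("on_batch_end", logs, **kwargs)

    def on_epoch_end(self, logs=None, **kwargs):
        self._broadcast("on_epoch_end", logs, **kwargs)

    def on_train_end(self, logs=None, **kwargs):
        self._broadcast("on_train_end", logs, **kwargs)


class RecordingLogger:
    """Toy logger that remembers the epoch-level calls it receives."""

    def __init__(self):
        self.seen = []

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        self.seen.append(("epoch_begin", epoch))

    def on_epoch_end(self, logs=None, **kwargs):
        self.seen.append(("epoch_end", logs["loss"]))


a, b = RecordingLogger(), RecordingLogger()
loggers = LoggerListSketch([a])
loggers.append(b)
loggers.on_epoch_begin(0, {})
loggers.on_epoch_end({"loss": 0.25})
print(a.seen == b.seen)  # True
```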

Utils

Device Handling Utils

Utilities to handle GPU devices, like finding a free GPU in a shared server.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.utils.devices.open_device(num_gpus=1, gpu_ids=None, find_free_gpu=False)[source]
hyperion.torch.utils.devices.find_free_gpus(num_gpus)[source]
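These two functions are listed without docstrings. One common way to decide which GPUs are free on a shared server is to query `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` and pick devices with (almost) no memory in use. The sketch below parses that output. The helper names and the 100 MiB threshold are assumptions, and hyperion's find_free_gpus may use a different criterion.

```python
import subprocess


def parse_free_gpus(smi_output, num_gpus, max_used_mib=100):
    """Return up to num_gpus GPU indices whose used memory is below max_used_mib."""
    free = []
    for line in smi_output.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) < max_used_mib:
            free.append(int(index))
    return free[:num_gpus]


def query_free_gpus(num_gpus):
    # Requires the NVIDIA driver's nvidia-smi tool on PATH.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_free_gpus(out, num_gpus)


sample = "0, 10450\n1, 3\n2, 0\n3, 8120\n"
print(parse_free_gpus(sample, num_gpus=2))  # [1, 2]
```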

Distributed Data Parallel Utils

These contain utilities to perform multi-GPU training with DistributedDataParallel.

Copyright 2021 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.utils.ddp.add_ddp_args(parser)[source]
hyperion.torch.utils.ddp.filter_ddp_args(**kwargs)[source]
hyperion.torch.utils.ddp.ddp_init(gpu_id, num_gpus, node_id=0, num_nodes=1, master_addr='localhost', master_port=None)[source]
hyperion.torch.utils.ddp.ddp_cleanup()[source]
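ddp_init takes a local gpu_id, the number of GPUs per node, a node_id and num_nodes. A common convention, assumed here rather than read from hyperion's source, is global rank = node_id * num_gpus + gpu_id and world size = num_gpus * num_nodes, which are then passed to torch.distributed.init_process_group. In the sketch, ddp_ranks and the *_sketch functions are hypothetical, and the default port 29500 is only for illustration (hyperion's signature defaults master_port to None).

```python
import os

import torch
import torch.distributed as dist


def ddp_ranks(gpu_id, num_gpus, node_id=0, num_nodes=1):
    """Hypothetical helper: global rank and world size of one process."""
    world_size = num_gpus * num_nodes
    rank = node_id * num_gpus + gpu_id
    return rank, world_size


def ddp_init_sketch(gpu_id, num_gpus, node_id=0, num_nodes=1,
                    master_addr="localhost", master_port=29500):
    rank, world_size = ddp_ranks(gpu_id, num_gpus, node_id, num_nodes)
    # The default env:// init method reads the rendezvous address from these.
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)
    torch.cuda.set_device(gpu_id)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    return rank, world_size


def ddp_cleanup_sketch():
    dist.destroy_process_group()


# Second node of two, local GPU 3, 4 GPUs per node:
print(ddp_ranks(3, 4, node_id=1, num_nodes=2))  # (7, 8)
```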
class hyperion.torch.utils.ddp.TorchDDP(*args: Any, **kwargs: Any)[source]
__init__(*args: Any, **kwargs: Any) None
class hyperion.torch.utils.ddp.FairShardedDDP(*args: Any, **kwargs: Any)[source]
__init__(module: torch.nn.Module, sharded_optimizer: Union[fairscale.optim.oss.OSS, List[fairscale.optim.oss.OSS]], process_group: Optional[Any] = None, broadcast_buffers: bool = True, sync_models_at_startup: bool = True, reduce_buffer_size: int = 8388608, auto_refresh_trainable: bool = True, reduce_fp16: bool = False)
_clear_counters() None

Reset all the grad reduce and call counters

_consume_work_handles() None

Consume all the futures which are tied to this optimizer’s buckets. We start from the first (oldest) ones, since they are the most likely to be ready and non-blocking.

_get_reduce_fn(index: int, param: torch.Tensor, dst_rank: int) Callable

Two possible backward hooks for a given parameter: either directly reduce to the appropriate rank, or contribute to a bucket and reduce when the bucket is full.

Either way a delayed action is necessary and is passed as a callback.

_passing_sync_batchnorm_handle(module: torch.nn.Module) None

Passes handle required for torch.nn.modules.SyncBatchNorm. Adapted from torch.nn.parallel.DistributedDataParallel.

_setup_backward_hooks() None

Attach a reduce function to each grad-requiring parameter. This makes the gradient reduction automatic whenever there’s a backward pass

_setup_bucket_strategy() None

Devise a bucketing strategy on a per-rank ownership level. These buckets will not be sharded, since the gradients would be re-allocated during the backward in that case. This method can be slow for big models, but it is not typically called often (for instance, not on every forward pass).

_sync_params_and_buffers() None

Sync the complete model states in between the ranks

_try_consume_work_handle() None

Try to consume the oldest future. This is non-blocking: if the future is not ready, we skip it.

forward(*inputs: Any, **kwargs: Any) Any

Module forward pass, handles any DDP-specific work in the background. Primes the backward pass for gradient reduction to the proper ranks.

no_sync() Generator

A context manager to disable gradient synchronization.

reduce() None

This does not need to be called, since the gradient reduction is done automatically during the backward pass. Use this method to reduce the gradients manually.

refresh_trainable() None

If the module trainability has changed, update all the assumptions

sync_buffers(blocking: bool = False) None

Sync all the param buffers in between ranks (including for instance batch norm statistics).

Parameters

blocking (bool) – wait for the operation to conclude.

to(device: Optional[torch.device], dtype: Optional[torch.dtype] = None, non_blocking: bool = False) fairscale.nn.data_parallel.sharded_ddp.ShardedDataParallel

Moves and/or casts the parameters and buffers.

Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

Note

This method modifies the module in-place.

Parameters
  • device (torch.device) – the desired device of the parameters and buffers in this module.

  • dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers.

  • non_blocking (bool) – make it an asynchronous call.

Returns

self.

Return type

Module

zero_grad(set_to_none: bool = False) None

Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.

Parameters

set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.

class hyperion.torch.utils.ddp.FairFullyShardedDDP(*args: Any, **kwargs: Any)[source]
__getstate__() Dict[str, str]

Serialize the state of the current FullyShardedDataParallel instance.

Some properties are not serializable (e.g., process groups, streams), so we remove them and try to reconstruct them in __setstate__().

__init__(module: torch.nn.Module, process_group: Optional[torch.distributed.ProcessGroup] = None, reshard_after_forward: bool = True, mixed_precision: bool = False, fp32_reduce_scatter: bool = False, flatten_parameters: bool = True, move_params_to_cpu: bool = False, compute_dtype: Optional[torch.dtype] = None, buffer_dtype: Optional[torch.dtype] = None, move_grads_to_cpu: Optional[bool] = None, bucket_cap_mb: int = 25, compute_device: Optional[torch.device] = None, no_broadcast_optim_state: Optional[bool] = False, state_dict_device: Optional[torch.device] = None, clear_autocast_cache: bool = False, force_input_to_fp32: bool = False, verbose: bool = False, cpu_offload: bool = False)
__setstate__(state: Dict[str, Any]) None

Intercept state setting and perform needed changes on params.

_broadcast_pad_info_to_r0() List[List[List[int]]]

Collect [x.numel_padded_per_param for x in self._fsdp_instances] from each rank.

_cast_buffers(device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = None, memo: Optional[Set] = None) None

Move all buffers to the given device and dtype.

If device or dtype are not given, then they will default to self.compute_device and self.buffer_dtype, respectively. In the case of nested FSDP instances, we will respect the child instance’s compute_device and buffer_dtype configuration.

Parameters
  • device (torch.device, Optional) – device to cast buffers to (defaults to compute_device)

  • dtype (torch.dtype, Optional) – dtype to cast buffers to (defaults to buffer_dtype)

  • memo (Set, Optional) – set of modules that have already been processed

_cast_fp32_param_shards_to_fp16(params: Optional[List[torch.nn.Parameter]] = None) None

Cast FP32 param shard to FP16 for a list of params.

_free_fp16_param_shard(params: Optional[List[torch.nn.Parameter]] = None) None

Free storage for FP16 shards for a list of params.

_free_full_params(params: Optional[List[torch.nn.Parameter]] = None) None

Free up storage for full parameters.

_gather_optim_state(sd_state: Dict[int, Dict[str, Any]]) Tuple[Dict[int, Dict[str, List]], Dict[int, Dict[str, List]]]

For each value in state[i], if the value is a tensor, collect it from the world. Else use rank 0’s entry.

_get_shard(tensor: torch.Tensor) Tuple[torch.Tensor, int]

Return the local shard of a full tensor.

_init_param_attributes(p: torch.nn.Parameter) None

We manage several attributes on each Parameter instance. The first two are set by _shard_parameters_():

  • _is_sharded: True if the Parameter is sharded, or False if the Parameter is intentionally not sharded (in which case we will all-reduce grads for this param).

  • _orig_size: the size of the original Parameter (before sharding).

The remaining attributes are set here:

  • _fp32_shard: a single shard of the parameters in full precision (typically FP32, but this depends on the dtype of the model as passed in by the user). This can be on CPU or GPU depending on the value of cpu_offload.

  • _fp16_shard: if mixed_precision is True, a single shard of the parameters in FP16, used for all-gather.

  • _full_param_padded: the full weight (padded to be evenly divisible by world_size), used for computation in the forward and backward pass. This will be resized in place and only materialized (via all-gather) as needed.

_lazy_init() None

Initialization steps that should happen lazily, typically right before the first forward pass.

_load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple

Load a whole (unsharded) state_dict.

Warning

This needs to be called on all ranks, since synchronization primitives will be used.

_post_backward_hook(param: torch.nn.Parameter, *unused: Any) None

At the start of _post_backward_hook(), param.grad contains the full gradient for the local batch. The reduce-scatter op will replace param.grad with a single shard of the summed gradient across all GPUs. This shard will align with the current GPU rank. For example:

before reduce_scatter:
    param.grad (GPU #0): [1, 2, 3, 4]
    param.grad (GPU #1): [5, 6, 7, 8]

after reduce_scatter:
    param.grad (GPU #0): [6, 8]    # 1+5, 2+6
    param.grad (GPU #1): [10, 12]  # 3+7, 4+8

The local GPU’s optim.step is responsible for updating a single shard of params, also corresponding to the current GPU’s rank. This alignment is created by _shard_parameters_(), which ensures that the local optimizer only sees the relevant parameter shard.
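The reduce-scatter step described above can be simulated without any GPUs. The following is a minimal pure-Python sketch, assuming each rank's full gradient is a flat list whose length is divisible by the world size (FSDP pads parameters to guarantee this). The function name reduce_scatter_sim is hypothetical and not part of fairscale.

```python
from typing import List


def reduce_scatter_sim(grads_per_rank: List[List[float]]) -> List[List[float]]:
    """Simulate a sum reduce-scatter over flat gradients (hypothetical helper).

    grads_per_rank[r] is the full local gradient held by rank r. Entry r of the
    result is the shard of the summed gradient that rank r keeps.
    """
    world_size = len(grads_per_rank)
    numel = len(grads_per_rank[0])
    if any(len(g) != numel for g in grads_per_rank):
        raise ValueError("all ranks must hold gradients of the same size")
    if numel % world_size != 0:
        raise ValueError("gradient size must be divisible by world_size")

    # "Reduce": element-wise sum across ranks.
    summed = [sum(vals) for vals in zip(*grads_per_rank)]

    # "Scatter": each rank keeps only its own contiguous slice.
    shard_size = numel // world_size
    return [summed[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


# Reproduces the two-GPU example above.
print(reduce_scatter_sim([[1, 2, 3, 4], [5, 6, 7, 8]]))
# [[6, 8], [10, 12]]
```

Because rank r keeps slice r of the summed gradient, its optimizer only ever needs the matching slice of the parameters.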

_post_reduction_hook(param: torch.nn.Parameter, reduced_grad: torch.Tensor) None

Hook to call on each param after the reduce-scatter.

_prep_grads_for_backward() None

Make sure p.grad has the correct size/device, otherwise set it to None.

_print_r0(msg: str, restart: bool = False) None

Debugging utility to print memory usage stats nicely on rank 0

_queue_wait_for_post_backward() None

Try to queue a wait_for_post_backward callback.

Only called on the root instance, and queues at most one callback. It can also be called by child FSDP instances via a closure, in case the root instance doesn’t own any params.

_rebuild_full_params(force_full_precision: bool = False) Optional[List[Tuple[torch.Tensor, bool]]]

Gather all shards of params.

Parameters

force_full_precision (bool, Optional) – by default params will be gathered in compute_dtype (e.g., FP16), unless force_full_precision is True, in which case they will be gathered in full precision (e.g., FP32), possibly in fresh storage. The parameter that’s being rebuilt will end up in full precision as well.

Returns

A list of tuples, where the first element is the full-sized param and the second element is a bool indicating if it’s safe for the caller to free the full-sized param. This will be None if force_full_precision=False and the full params are already gathered.

_register_post_backward_hooks() None

Register backward hooks to reshard params and reduce-scatter grads.

This is called during the forward pass. The goal is to attach a hook to each parameter’s gradient-generating function (grad_acc below) so that the hook is called after all gradients for that param are computed.

Goals:

  1. We want the hook to fire once and only once after all gradients are accumulated for a param.

  2. If it fires more than once, we end up incorrectly sharding the grad multiple times (which could lead to a dimension that is too small).

  3. If it fires once but too early, or doesn't fire at all, we leave gradients unsharded (which could lead to a dimension that is too large).

Due to multiple-pass forward, this function can be called on the same parameter multiple times in a single forward pass. If we register the hook multiple times, we end up getting called multiple times. We could try to get a new hook every time and delete the previously registered one. However, for an unknown reason (I have debugged it for a long time!), in mixed precision mode we get two different grad_acc objects below during different calls of this function (in the same forward pass). If we keep the last one, the hook ends up firing too early. In full precision mode, we luckily get the same grad_acc object, so deleting and re-registering still ensures the hook fires once after all gradients are generated.

Empirically, keeping the first hook registered per forward pass seems to work best. We do need to remove the hook at the end of the backward pass; otherwise, the next forward pass will not register a new hook, which is needed for a new forward pass.

_register_pre_backward_hooks(outputs: Any) Any

Register pre-backward hook to run before the wrapped module’s backward. Hooks should be attached to all outputs from the forward.

Returns

New outputs with hooks registered if they require gradients.

Return type

outputs

_reset_lazy_init() None

Reset instance so _lazy_init() will run on the next forward.

_set_is_root() None

If True, implies that no other FullyShardedDataParallel instance wraps this one. Called once by _lazy_init(). Also sets self.children_share_process_group = True if all child instances share the same process group. If some child instances use a different process group, self.clip_grad_norm_ will raise an error.

_setup_streams() None

Create streams to overlap data transfer and computation.

_shard_parameters_() None

At initialization we wrap a module with full parameters and shard the parameters in-place. Sharding is implemented by viewing each parameter as a 1D Tensor and retaining only a single slice, where the slice size is determined by the number of data parallel workers.

Wrapping modules with many small parameters (or with a very large data parallel world size) will result in many small parameter shards and slow performance. In this case it’s better to set flatten_parameters to True, so that all of the small parameters in the module are combined into a single contiguous Tensor and sharded once.

After this initial sharding is complete, the user can initialize a torch.optim.Optimizer in the usual way, i.e.:

    optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)

The optimizer will see only a single slice of parameters and will thus allocate less memory for optimizer state, avoiding redundancy across data parallel workers.
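The sharding scheme described above (view each parameter as a 1D tensor and keep one slice per worker) can be illustrated with plain Python lists. This is a minimal sketch, not fairscale's implementation: the helper name shard_flat is hypothetical, and it assumes zero padding is appended so the length divides evenly by world_size, in line with the padded layout described for _full_param_padded.

```python
import math
from typing import List, Tuple


def shard_flat(values: List[float], rank: int, world_size: int) -> Tuple[List[float], int]:
    """Return (local_shard, num_padding) for a flattened parameter (hypothetical helper)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Pad with zeros so the flat parameter splits into world_size equal slices.
    shard_size = math.ceil(len(values) / world_size)
    num_padding = shard_size * world_size - len(values)
    padded = values + [0.0] * num_padding
    # Each rank keeps only its own contiguous slice.
    return padded[rank * shard_size:(rank + 1) * shard_size], num_padding


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # a 2x3 weight, flattened
for r in range(4):
    print(r, shard_flat(flat, r, 4))
# 0 ([1.0, 2.0], 2)
# 1 ([3.0, 4.0], 2)
# 2 ([5.0, 6.0], 2)
# 3 ([0.0, 0.0], 2)
```

With many tiny parameters, each such shard is only a few elements long, which is why combining them first via flatten_parameters tends to help.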

_use_fp32_param_shard(params: Optional[List[torch.nn.Parameter]] = None) None

Use FP32 shard for a list of params.

_use_full_params() None

Switch p.data pointers to use the full params.

Note: this assumes full params are already gathered.

_wait_for_post_backward() None

Wait for post-backward to finish. Only called on root instance.

_wait_for_previous_optim_step() None

The outer-most FullyShardedDataParallel instance (i.e., the root instance) needs to synchronize with the default stream to ensure the previous optimizer step is done.

apply(fn: Callable[[torch.nn.Module], None]) fairscale.nn.data_parallel.fully_sharded_data_parallel.FullyShardedDataParallel

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model.

Compared to torch.nn.Module.apply, this version additionally gathers the full parameters before applying fn. It should not be called from within another summon_full_params context.

Parameters

fn (Callable[[nn.Module], None]) – function to be applied to each submodule

Returns

self

Return type

Module

assert_state(state: Union[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState, List[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState]]) None

Assert we are in the given state.

clip_grad_norm_(max_norm: Union[float, int], norm_type: Union[float, int] = 2.0) torch.Tensor

Clip all gradients at this point in time. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

Parameters
  • max_norm (float or int) – max norm of the gradients

  • norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.

Returns

Total norm of the parameters (viewed as a single vector).

Note

This is analogous to torch.nn.utils.clip_grad_norm_ but handles the partitioning and multiple devices per rank under the hood. The default torch utility is not applicable here, because each rank only has a partial view of all the grads in the model, so calling it in this sharded context would apply a different scaling to each subset of model parameters.

Warning

This needs to be called on all ranks, since synchronization primitives will be used.
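To see why a per-rank call to torch.nn.utils.clip_grad_norm_ is not enough, note that the global p-norm must be assembled from every rank's partial contribution before a single clip coefficient is applied. The sketch below simulates this with plain Python lists standing in for each rank's gradient shard. The helper names are hypothetical; in a real run the partial results would be combined with an all-reduce (a sum of |g|^p for finite p, a max for the infinity norm).

```python
import math
from typing import List


def global_grad_norm(shards: List[List[float]], norm_type: float = 2.0) -> float:
    """Combine per-rank gradient shards into one global norm (hypothetical helper).

    Pass math.inf for the infinity norm.
    """
    if norm_type == math.inf:
        # Each rank contributes its local max |g|; the global norm is the max of those.
        return max((max((abs(g) for g in s), default=0.0) for s in shards), default=0.0)
    # Each rank contributes sum(|g|^p); the partial sums are added, then the p-th root taken.
    partial = [sum(abs(g) ** norm_type for g in s) for s in shards]
    return sum(partial) ** (1.0 / norm_type)


def clip_shards_(shards: List[List[float]], max_norm: float, norm_type: float = 2.0) -> float:
    """Scale all shards in place by one common factor; return the pre-clip total norm."""
    total_norm = global_grad_norm(shards, norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for s in shards:
            for i in range(len(s)):
                s[i] *= clip_coef
    return total_norm


shards = [[3.0, 0.0], [0.0, 4.0]]  # rank 0 and rank 1
print(clip_shards_(shards, max_norm=1.0))  # 5.0
```

Clipping each rank independently would instead compute local norms of 3.0 and 4.0 and scale the two shards by different factors, which is the inconsistency the note above warns about.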

static consolidate_shard_weights(shard_weights: List[Dict[str, torch.Tensor]], shard_metadata: List[Dict[str, Any]], with_module_buffers: bool = True) Dict[str, torch.Tensor]

Given a list of weights and metadata associated with N shards, reconstruct the weights of an equivalent consolidated (non-sharded) model.

Module parameters are consolidated using the shard metadata.

Module buffers are taken from shard 0: this assumes that module buffers are either synchronized or that the shard 0 value is valid for all shards. If this behavior is not correct for your module (for instance, if buffers need to be reduced instead), you can disable it with with_module_buffers=False.

This method is used to re-assemble checkpoints of shards without having to instantiate FSDP wrappers with the world size originally used to save the shards.
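Conceptually, consolidating a flattened, padded parameter means concatenating the per-rank shards in rank order, dropping the trailing padding, and reshaping to the original size. The sketch below illustrates that idea for a 2-D weight using plain lists. It is not the fairscale implementation: consolidate_flat is a hypothetical helper, and its inputs (a list of shards, a padding count and an original shape) are simplified stand-ins for the real shard_weights and shard_metadata structures.

```python
from math import prod
from typing import List, Sequence


def consolidate_flat(shards: List[List[float]], num_padding: int,
                     orig_shape: Sequence[int]) -> List[List[float]]:
    """Rebuild a 2-D parameter from rank-ordered shards (hypothetical helper)."""
    flat = [v for shard in shards for v in shard]  # concatenate in rank order
    if num_padding:
        flat = flat[:-num_padding]                 # strip the zero padding
    if len(flat) != prod(orig_shape):
        raise ValueError("shards do not match the original size")
    rows, cols = orig_shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Four shards of a 2x3 weight; the last one holds only padding.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
print(consolidate_flat(shards, num_padding=2, orig_shape=(2, 3)))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Since only the shards and the metadata are needed, this kind of reconstruction can run offline, without recreating the original world size.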

property cpu_offload: bool
extra_repr() str
forward(*args: Any, **kwargs: Any) torch.Tensor
gather_full_optim_state_dict(optim: torch.optim.Optimizer, **ignored: Dict) Optional[Dict[str, Any]]

Return the last known global optimizer state. The returned state is compatible with PyTorch, in that the sharded properties are not exposed. Multiple parameter groups are not yet supported.

This should be called only on the root FSDP instance. Nested FSDP instances are supported as long as they have the same world_size as the parent or world_size=1.

Parameters

optim (Optimizer) – an optimizer instance for this FSDP rank. Its state_dict is used in the consolidation. However, its state is not modified.

Returns

  • A dict with four entries on rank zero (other workers return None):
    • state - a dict holding gathered optimization state, 1 entry per unflat parameter

    • param_groups - a dict containing the 1 parameter group

    • param_id_map - global (unflat) to local (flat) id mapping

    • uncollected_local_ids - keys in the state dict that were not broadcast

get_shard_from_optim_state_dict(full_optim_state_dict: Dict[str, Any]) Dict[str, Any]

Get the portion of the optimizer state dict associated with the shard

This can be used to get the right sharded optimizer state to be loaded into the sharded optimizer for this FSDP rank.

Parameters

full_optim_state_dict (dict) – consolidated optimizer state returned by gather_full_optim_state_dict, or loaded from a checkpoint.

Returns

a shard of the optimizer state.

Return type

(dict)

load_local_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple

Load a local (sharded) state_dict.

load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple
local_metadata_dict() Dict[str, Any]

Get the information needed to reconstruct the model from shards offline.

local_state_dict(*args: Any, **kwargs: Any) Any

Returns the local (sharded) state of the module. Parameters are sharded, so the resulting state_dict can only be loaded after the Module has been wrapped with FullyShardedDataParallel.

property module: torch.nn.Module
no_sync() Generator

A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass after exiting the context.

Note

This may result in higher memory usage because we will accumulate the full model gradients (instead of gradient shards) until the eventual sync.

property params_with_grad: List[torch.nn.Parameter]

[p for p in self.parameters() if p.grad is not None]

set_gradient_divide_factors(pre: float, post: float, recursive: bool) None

Allows the user to override the pre and post divide factors.

Parameters
  • pre (float) – divide factor before the reduction.

  • post (float) – divide factor after the reduction.

  • recursive (bool) – recursively set it for all child FSDP instances or not.
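The two factors split the gradient averaging around the reduction: each rank divides its local gradient by pre before the sum, and the reduced result is divided by post afterwards. Whenever pre * post equals the world size, the result is the ordinary gradient average, while intermediate values stay smaller, which can help with FP16 reductions. The sketch below is a plain-Python illustration of that arithmetic, not the fairscale code path; average_with_factors is a hypothetical name.

```python
from typing import List


def average_with_factors(local_grads: List[List[float]], pre: float, post: float) -> List[float]:
    """Sum-reduce per-rank gradients with pre/post division (hypothetical helper)."""
    # Pre-divide on every rank before the reduction.
    scaled = [[g / pre for g in grads] for grads in local_grads]
    # Sum across ranks, as an all-reduce or reduce-scatter would.
    reduced = [sum(vals) for vals in zip(*scaled)]
    # Post-divide the reduced result.
    return [g / post for g in reduced]


grads = [[2.0, 4.0], [6.0, 8.0], [10.0, 12.0], [14.0, 16.0]]  # world_size = 4
print(average_with_factors(grads, pre=2.0, post=2.0))  # [8.0, 10.0]
```

Any split with pre * post = 4 gives the same average here; the choice only changes the magnitude of the values being summed.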

state_dict(*args: Any, **kwargs: Any) Any

Returns the whole (unsharded) state of the module. Parameters are not sharded, so the resulting state_dict can be loaded directly by the wrapped Module without any sharding-specific logic. Returned tensors will be full precision (e.g., FP32).

Warning

This needs to be called on all ranks, since synchronization primitives will be used.

summon_full_params(recurse: bool = True, volatile: bool = False) Generator

A context manager to expose full params for the current FSDP instance. Can be useful after forward/backward for a model to get the params for additional processing or checking. Parameters will be gathered in full precision (e.g., FP32).

Note

This can be used on inner FSDPs.

Note

This cannot be used within a forward or backward pass. Nor can forward and backward be started from within this context.

Note

The full parameters will be freed after the context manager exits; it is up to the caller to clone them if needed.

Note

The full parameters can be modified, but only the portion corresponding to the local param shard will persist after the context manager exits (unless volatile=True, in which case there are no guarantees about persistence).

Parameters
  • recurse (bool, Optional) – recursively summon all params for nested FSDP instances (default: True)

  • volatile (bool, Optional) – if True, modifications to params are not guaranteed to persist after the context manager exits; enabling this can be slightly more efficient (default: False)

Metric Accumulators

Tools to combine the metrics computed in multiple GPUs into a single metric

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

class hyperion.torch.utils.metric_acc.MetricAcc(device=None)[source]

Class to accumulate metrics during an epoch.

__init__(device=None)[source]
reset()[source]

Resets the accumulators.

update(metrics, num_samples=1)[source]

Updates the values of the metrics.

It uses a recursive formula, which may be more numerically stable:

$$m^{(i)} = m^{(i-1)} + \frac{n^{(i)}}{\sum_{j=1}^{i} n^{(j)}} \left(x^{(i)} - m^{(i-1)}\right)$$

where i is the batch number, m^{(i)} is the accumulated average of the metric up to batch i, x^{(i)} is the average of the metric at batch i, and n^{(i)} is the batch size at batch i.

Parameters
  • metrics – dictionary with metrics for current batch

  • num_samples – number of samples in current batch (batch_size)
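The recursive update above can be checked against a plain sample-weighted average. The class below is a minimal pure-Python sketch of the formula, not hyperion's MetricAcc (which also takes a device argument and may handle more than this). The class name RunningMean is hypothetical, and it assumes every batch reports the same metric names, since a metric first seen in a later batch would start from a prior mean of 0.

```python
from typing import Dict


class RunningMean:
    """Sample-weighted running average of per-batch metrics (hypothetical sketch)."""

    def __init__(self) -> None:
        self.count = 0
        self.metrics: Dict[str, float] = {}

    def update(self, metrics: Dict[str, float], num_samples: int = 1) -> None:
        # Total number of samples seen so far, including this batch.
        self.count += num_samples
        w = num_samples / self.count
        for name, x in metrics.items():
            m = self.metrics.get(name, 0.0)
            # m^(i) = m^(i-1) + n^(i) / sum_j n^(j) * (x^(i) - m^(i-1))
            self.metrics[name] = m + w * (x - m)


acc = RunningMean()
acc.update({"loss": 2.0}, num_samples=10)
acc.update({"loss": 1.0}, num_samples=30)
print(acc.metrics["loss"])  # 1.25, i.e. (10 * 2.0 + 30 * 1.0) / 40
```

The update never forms the running sum explicitly, which is the numerical-stability argument made above.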

property metrics

Returns metrics dictionary

Evaluation Utils

Functions that can be useful when evaluating neural networks, for example, when a signal is too long to fit in memory and needs to be split into chunks.

Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.utils.eval_utils.eval_nnet_by_chunks(x, nnet, chunk_length=0, detach_chunks=True, time_dim=-1)[source]
hyperion.torch.utils.eval_utils.eval_nnet_overlap_add(x, nnet, chunk_length=0, chunk_overlap=None, detach_chunks=True, time_dim=-1)[source]
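To illustrate the general idea of chunked evaluation, the following is a minimal pure-Python sketch of an overlap-add scheme over a 1-D sequence. It is not hyperion's implementation and makes several assumptions of its own: eval_overlap_add_sketch is a hypothetical name, fn is assumed to return one output frame per input frame, overlapping outputs are simply averaged, and the default overlap of half a chunk is an arbitrary choice for the sketch.

```python
from typing import Callable, List, Optional


def eval_overlap_add_sketch(x: List[float], fn: Callable[[List[float]], List[float]],
                            chunk_length: int, chunk_overlap: Optional[int] = None) -> List[float]:
    """Apply fn to overlapping chunks of x and average the overlaps (hypothetical helper)."""
    if chunk_length <= 0 or chunk_length >= len(x):
        return fn(x)  # short enough: evaluate in a single pass
    overlap = chunk_length // 2 if chunk_overlap is None else chunk_overlap
    if not 0 <= overlap < chunk_length:
        raise ValueError("chunk_overlap must be in [0, chunk_length)")
    hop = chunk_length - overlap

    out = [0.0] * len(x)
    weight = [0] * len(x)
    start = 0
    while True:
        end = min(start + chunk_length, len(x))
        y = fn(x[start:end])
        for t, v in enumerate(y):
            out[start + t] += v  # overlap-add the chunk output
            weight[start + t] += 1
        if end == len(x):
            break
        start += hop
    # Normalize frames covered by more than one chunk.
    return [o / w for o, w in zip(out, weight)]


def double(seg: List[float]) -> List[float]:
    return [2.0 * v for v in seg]


x = [float(i) for i in range(10)]
print(eval_overlap_add_sketch(x, double, chunk_length=4, chunk_overlap=2) == double(x))  # True
```

For a frame-wise function the chunked result matches a single full-length pass; for a network with temporal context, the overlap gives each chunk some context at its borders.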

Math Functions

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.utils.math.invert_trimat(A, lower=False, right_inv=False, return_logdet=False, return_inv=False)[source]
Inversion of triangular matrices.

Returns lambda function f that multiplies the inverse of A times a vector.

Parameters
  • A – Triangular matrix.

  • lower – if True A is lower triangular, else A is upper triangular.

  • right_inv – If False, f(v)=A^{-1}v; if True f(v)=v’ A^{-1}

  • return_logdet – If True, it also returns the log determinant of A.

  • return_inv – If True, it also returns A^{-1}

Returns

Lambda function that multiplies A^{-1} times a vector; additionally, the log determinant of A (if return_logdet is True) and A^{-1} (if return_inv is True).
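As a pure-Python sketch of what the returned function computes, the helper below solves A x = v by forward substitution (lower triangular) or back substitution (upper triangular) instead of forming A^{-1} explicitly. It is an illustration rather than hyperion's code, which operates on torch tensors. The name trimat_solver is hypothetical, only the right_inv=False case is covered, and the log determinant assumes a positive diagonal, as for a Cholesky factor.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def trimat_solver(A: Matrix, lower: bool = False) -> Tuple[Callable[[Vector], Vector], float]:
    """Return (f, logdet) with f(v) = A^{-1} v for triangular A (hypothetical helper)."""
    n = len(A)
    diag = [A[i][i] for i in range(n)]
    if any(d <= 0 for d in diag):
        raise ValueError("sketch assumes a positive diagonal (e.g. a Cholesky factor)")
    # The determinant of a triangular matrix is the product of its diagonal.
    logdet = sum(math.log(d) for d in diag)

    def f(v: Vector) -> Vector:
        x = [0.0] * n
        # Forward substitution for lower, back substitution for upper.
        order = range(n) if lower else range(n - 1, -1, -1)
        for i in order:
            if lower:
                s = sum(A[i][j] * x[j] for j in range(i))
            else:
                s = sum(A[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (v[i] - s) / A[i][i]
        return x

    return f, logdet


L = [[2.0, 0.0], [1.0, 4.0]]
f, logdet = trimat_solver(L, lower=True)
print(f([4.0, 10.0]))  # [2.0, 2.0], since 2*2 = 4 and 1*2 + 4*2 = 10
print(math.isclose(logdet, math.log(8.0)))  # True
```

The right_inv=True case, v^T A^{-1}, is equivalent to solving with the transpose of A, which swaps the roles of lower and upper.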

Miscellaneous Functions

Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

hyperion.torch.utils.misc.l2_norm(x, axis=-1)[source]
hyperion.torch.utils.misc.compute_snr(x, n, axis=-1)[source]
hyperion.torch.utils.misc.compute_stats_adv_attack(x, x_adv)[source]
hyperion.torch.utils.misc.get_selfsim_tarnon(y, return_mask=False)[source]