PyTorch Models and Tools
The module hyperion.torch provides utilities, dataloaders, neural architectures and models based on PyTorch.
Layers
These include several custom neural network layers.
Activation Function Layers
These include a factory class that creates activation layers from config parameters, and custom activation layers.
- class hyperion.torch.layers.activation_factory.ActivationFactory[source]
- static create(activation, **kwargs)[source]
Creates a non-linear activation object
- Parameters
activation – one of: a str with the activation type; a dictionary with a name field indicating the activation type, plus extra activation arguments; None, in which case it returns None; or an activation constructor
**kwargs – extra arguments for activation constructor
- Returns
Non-linear activation object
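The four accepted input types can be illustrated with a small dispatch sketch. It uses plain Python callables in place of PyTorch modules so that it runs without dependencies; the registry `_ACTIVATIONS` and the function `create_activation` are hypothetical names for this example and are not part of hyperion.

```python
import math
from functools import partial


def relu(x):
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    return x if x >= 0 else negative_slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical registry mapping activation names to constructors.
_ACTIVATIONS = {
    "relu": lambda: relu,
    "leaky_relu": lambda **kw: partial(leaky_relu, **kw),
    "sigmoid": lambda: sigmoid,
}


def create_activation(activation, **kwargs):
    """Dispatches on the type of `activation` (None, str, dict or constructor)."""
    if activation is None:
        return None
    if isinstance(activation, str):
        return _ACTIVATIONS[activation](**kwargs)
    if isinstance(activation, dict):
        args = dict(activation)  # copy so the caller's dict is not modified
        name = args.pop("name")
        args.update(kwargs)
        return _ACTIVATIONS[name](**args)
    # Otherwise assume it is a constructor and call it with the extra arguments.
    return activation(**kwargs)


act = create_activation({"name": "leaky_relu", "negative_slope": 0.1})
print(act(-2.0))
```

Keeping the name inside the dictionary lets a whole activation spec travel as a single config entry.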
Normalization Layers
These include a factory class that creates normalization layers from config parameters.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.norm_layer_factory.NormLayer2dFactory[source]
- static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]
Creates a callable constructor for normalization layers
- Parameters
norm_name – str with normalization layer name, in [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]
num_groups – num_groups for group-norm
momentum – default momentum
eps – default epsilon for numerical stability
- Returns
Callable constructor to create normalization layers
- class hyperion.torch.layers.norm_layer_factory.NormLayer1dFactory[source]
- static create(norm_name, num_groups=None, momentum=0.1, eps=1e-05)[source]
Creates a callable constructor for normalization layers
- Parameters
norm_name – str with normalization layer name, in [batch-norm, group-norm, instance-norm, instance-norm-affine, layer-norm]
num_groups – num_groups for group-norm
momentum – default momentum
eps – default epsilon for numerical stability
- Returns
Callable constructor to create normalization layers
Dropout Layers
These include custom dropout and drop-connect layers.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.dropout.Dropout1d(*args: Any, **kwargs: Any)[source]
- __init__(*args: Any, **kwargs: Any) None
Attention Layers
Attention layers like the ones used in Transformers and Conformers.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.attention.ScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]
Scaled dot product multihead attention layer
- in_feats
input feature dimension
- out_feats
output feature dimension
- num_heads
number of heads
- d_k
key/query projection dimension
- d_v
value projection dimension
- dropout_rate
dropout rate
- time_dim
time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)
- property in_feats
- property out_feats
- forward(query, key, value, mask=None)[source]
Computes ‘Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the value with size=(batch, time1, out_feats)
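As a minimal sketch of the computation behind forward, the function below implements single-head scaled dot product attention for one utterance in plain Python. It omits the multi-head query/key/value projections (d_k, d_v, num_heads) and dropout that the layer applies, and the function name `scaled_dot_prod_attention` is an illustrative assumption, not hyperion's API.

```python
import math


def scaled_dot_prod_attention(query, key, value, mask=None):
    """Single-head attention for one utterance.

    query: time1 x d_k, key: time2 x d_k, value: time2 x d_v (lists of lists).
    mask:  optional time1 x time2 matrix; positions with 0 receive no attention.
           Each row is assumed to keep at least one position.
    Returns a time1 x d_v list of lists.
    """
    d_k = len(query[0])
    d_v = len(value[0])
    out = []
    for i, q in enumerate(query):
        # Scaled dot product between this query frame and every key frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in key]
        if mask is not None:
            scores = [s if m else -math.inf for s, m in zip(scores, mask[i])]
        # Numerically stable softmax over the time2 axis.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        # Attention-weighted average of the value frames.
        out.append([sum(w * v[j] for w, v in zip(weights, value)) for j in range(d_v)])
    return out


query = [[1.0, 0.0], [0.0, 1.0]]           # time1 = 2
key = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # time2 = 3
value = [[1.0], [2.0], [3.0]]
mask = [[1, 0, 0], [1, 1, 1]]
out = scaled_dot_prod_attention(query, key, value, mask)
```

The first query frame can only attend to the first key frame, so its output equals the first value frame; the second is a weighted average of all three.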
- class hyperion.torch.layers.attention.LocalScaledDotProdAttV1(*args: Any, **kwargs: Any)[source]
Local Scaled dot product multihead attention layer
It calculates self-attention between time steps within a window of ‘context’ frames.
- in_feats
input feature dimension
- out_feats
output feature dimension
- num_heads
number of heads
- d_k
key/query projection dimension
- d_v
value projection dimension
- context
maximum attention temporal context.
- dropout_rate
dropout rate
- time_dim
time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)
- __init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, dropout_rate=0, time_dim=1)[source]
Constructs a MultiHeadedAttention object.
- static _softmax(scores1, scores2, shift1, shift2, t1, t2)[source]
Computes softmax for block diagonal attention maps
- Parameters
scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)
scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)
shift1 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 1
shift2 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 2, with self-attention shift1=shift2
t1 – length of time dimension 1 (output time dimension)
t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.
- Returns
- probs1: posterior attention scores for block-diagonal att. matrix
with size=(batch, heads, blocks, t1, t2)
- probs2: posterior attention scores for a shifted block-diagonal att. matrix
with size=(batch, heads, blocks-1, t1, t2)
- forward1(query, key, value, mask)[source]
Computes ‘Local Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the values with size=(batch, time1, out_feats)
- forward2(query, key, value, mask)[source]
Computes ‘Local Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the values with size=(batch, time1, out_feats)
- forward(query, key, value, mask)[source]
Computes ‘Local Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the values with size=(batch, time1, out_feats)
- property in_feats
- property out_feats
- class hyperion.torch.layers.attention.ScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]
Scaled dot product multihead attention layer with relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf
- in_feats
input feature dimension
- out_feats
output feature dimension
- num_heads
number of heads
- d_k
key/query projection dimension
- d_v
value projection dimension
- causal_pos_enc
positional encoder is 0 for attending future frames.
- dropout_rate
dropout rate
- time_dim
time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)
- __init__(in_feats, out_feats, num_heads, d_k, d_v, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]
- _apply_tril(x)[source]
Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep causal attention points, i.e., i-j >= 0.
E.g., if t1=3, t2=4 this will apply the mask
[1 1 0 0;
 1 1 1 0;
 1 1 1 1]
- _apply_triu(x)[source]
Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep non-causal attention points, i.e., i-j < 0.
E.g., if t1=3, t2=4 this will apply the mask
[0 0 1 1;
 0 0 0 1;
 0 0 0 0]
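The two example masks for t1=3, t2=4 can be reproduced with the sketch below. It assumes the masks are aligned so that the last query frame sees every key frame, which is what the examples imply; the helper names `tril_mask` and `triu_mask` are hypothetical, and the actual methods operate on tensors rather than nested lists.

```python
def tril_mask(t1, t2):
    """Causal part: 1 where key index j is not ahead of query index i."""
    offset = t2 - t1  # aligns the last query row with the last key column
    return [[1 if j <= i + offset else 0 for j in range(t2)] for i in range(t1)]


def triu_mask(t1, t2):
    """Non-causal part: the complement of tril_mask."""
    offset = t2 - t1
    return [[1 if j > i + offset else 0 for j in range(t2)] for i in range(t1)]


causal = tril_mask(3, 4)
non_causal = triu_mask(3, 4)
for row in causal:
    print(row)
```

Since the two masks are complementary, every attention position belongs to exactly one of the causal and non-causal terms.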
- _left_shift(x)[source]
Applies left shifts to the rows of x to get scores with relative pos encodings R_{i-j}, i-j >= 0 (causal attention).
E.g.,
[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]
becomes:
[q0 R1, q0 R0, 0    , 0    ;
 q1 R2, q1 R1, q1 R0, 0    ;
 q2 R3, q2 R2, q2 R1, q2 R0]
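The left-shift example can be reproduced on nested lists as follows. This is only a sketch inferred from the example: row i is shifted left by t1-1-i positions and padded with zeros. The function name `left_shift` is reused here for readability, and the tensor implementation in hyperion may reach the same result differently.

```python
def left_shift(x):
    """Shifts row i of x left by (t1 - 1 - i) positions, zero-padding on the right."""
    t1 = len(x)
    out = []
    for i, row in enumerate(x):
        shift = t1 - 1 - i
        out.append(row[shift:] + [0] * shift)
    return out


# Scores q_i R_k for t1=3 queries and relative positions R3..R0.
x = [[f"q{i} R{k}" for k in (3, 2, 1, 0)] for i in range(3)]
shifted = left_shift(x)
for row in shifted:
    print(row)
```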
- _right_shift(x)[source]
Applies right shifts to the rows of x to get scores with relative pos encodings R_{i-j}, i-j < 0 (non-causal attention).
E.g.,
[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]
becomes:
[0, q0 R_{-1}, q0 R_{-2};
 0, 0        , q1 R_{-1};
 0, 0        , 0        ]
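The right-shift example can be sketched in the same way. Output position (i, j) takes the score for relative position i-j when j > i, and is zero otherwise. As with the left shift, this is inferred from the example only and is not necessarily how the tensor version is written.

```python
def right_shift(x):
    """Output (i, j) takes x[i][j - i] when j > i (future frames), else 0."""
    t2 = len(x[0])
    return [
        [row[j - i] if j > i else 0 for j in range(t2)]
        for i, row in enumerate(x)
    ]


# Scores q_i R_k for relative positions R_0, R_{-1}, R_{-2}.
x = [[f"q{i} R_{{{-k}}}" if k else f"q{i} R_0" for k in range(3)] for i in range(3)]
shifted = right_shift(x)
for row in shifted:
    print(row)
```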
- forward(query, key, value, pos_emb=None, mask=None)[source]
Computes ‘Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the value with size=(batch, time1, out_feats)
- property in_feats
- property out_feats
- class hyperion.torch.layers.attention.LocalScaledDotProdAttRelPosEncV1(*args: Any, **kwargs: Any)[source]
Local Scaled dot product multihead attention layer
It calculates self-attention between time steps within a window of ‘context’ frames.
It uses relative positional encoders as defined in https://arxiv.org/pdf/1901.02860.pdf
- in_feats
input feature dimension
- out_feats
output feature dimension
- num_heads
number of heads
- d_k
key/query projection dimension
- d_v
value projection dimension
- context
maximum attention temporal context.
- causal_pos_enc
positional encoder is 0 for attending future frames.
- dropout_rate
dropout rate
- time_dim
time dimension in the input, default=1 meaning input dimensions are (batch, time, in_feats)
- __init__(in_feats, out_feats, num_heads, d_k, d_v, context=25, causal_pos_enc=False, dropout_rate=0, time_dim=1)[source]
Constructs a MultiHeadedAttention object.
- _apply_tril(x)[source]
Applies a lower triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep causal attention points, i.e., i-j >= 0.
E.g., if t1=3, t2=4 this will apply the mask
[1 1 0 0;
 1 1 1 0;
 1 1 1 1]
- _apply_triu(x)[source]
Applies an upper triangular mask to the (Q + v^T) W R_{i-j} attention matrix to keep non-causal attention points, i.e., i-j < 0.
E.g., if t1=3, t2=4 this will apply the mask
[0 0 1 1;
 0 0 0 1;
 0 0 0 0]
- _left_shift(x, context, left_shift)[source]
Applies left shifts to the rows of x to get scores with relative pos encodings R_{i-j}, i-j >= 0 (causal attention).
E.g.,
[q0 R3, q0 R2, q0 R1, q0 R0;
 q1 R3, q1 R2, q1 R1, q1 R0;
 q2 R3, q2 R2, q2 R1, q2 R0]
becomes:
[q0 R1, q0 R0, 0    , 0    ;
 q1 R2, q1 R1, q1 R0, 0    ;
 q2 R3, q2 R2, q2 R1, q2 R0]
- _right_shift(x, context, left_shift)[source]
Applies right shifts to the rows of x to get scores with relative pos encodings R_{i-j}, i-j < 0 (non-causal attention).
E.g.,
[q0 R_0, q0 R_{-1}, q0 R_{-2};
 q1 R_0, q1 R_{-1}, q1 R_{-2};
 q2 R_0, q2 R_{-1}, q2 R_{-2}]
becomes:
[0, q0 R_{-1}, q0 R_{-2};
 0, 0        , q1 R_{-1};
 0, 0        , 0        ]
- forward(query, key, value, pos_emb=None, mask=None)[source]
Computes ‘Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the value with size=(batch, time1, out_feats)
- static _softmax(scores1, scores2, shift1, shift2, t1, t2)
Computes softmax for block diagonal attention maps
- Parameters
scores1 – attention scores from block-diagonal score matrix with size=(batch, heads, blocks, t1, t2)
scores2 – attention scores from a shifted block-diagonal score matrix with size=(batch, heads, blocks-1, t1, t2)
shift1 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 1
shift2 – shift of diagonal blocks of scores2 wrt scores1 in time steps in the time dimension 2, with self-attention shift1=shift2
t1 – length of time dimension 1 (output time dimension)
t2 – length of time dimension 2 (input time dimension), with self-att t1=t2.
- Returns
- probs1: posterior attention scores for block-diagonal att. matrix
with size=(batch, heads, blocks, t1, t2)
- probs2: posterior attention scores for a shifted block-diagonal att. matrix
with size=(batch, heads, blocks-1, t1, t2)
- forward1(query, key, value, mask)
Computes ‘Local Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the values with size=(batch, time1, out_feats)
- forward2(query, key, value, mask)
Computes ‘Local Scaled Dot Product Attention’.
- Parameters
query – query with size=(batch, time1, in_feats), where time1 is the output time dimension
key – key with size=(batch, time2, in_feats), where time2 is the input time dimension
value – value with size=(batch, time2, in_feats)
mask – optional mask with size=(batch, time1, time2), to zero attention between some time steps, or size=(batch, time) if time1=time2
- Returns
Attention-weighted average of the values with size=(batch, time1, out_feats)
- property in_feats
- property out_feats
Pooling Layers
These include custom pooling layers and a factory class to create pooling layers from config parameters.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.pool_factory.GlobalPool1dFactory[source]
- static create(pool_type, in_feats=None, inner_feats=128, num_comp=64, dist_pow=2, use_bias=False, num_heads=8, d_k=256, d_v=256, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False, **kwargs)[source]
- static add_argparse_args(parser, prefix=None, skip=[])
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layers.global_pool._conv1(in_channels, out_channels, bias=False)[source]
point-wise convolution
- class hyperion.torch.layers.global_pool.GlobalAvgPool1d(*args: Any, **kwargs: Any)[source]
Global average pooling in 1d
- dim
pooling dimension
- keepdim
if True, keeps the same number of dimensions after pooling
- get_config()
- class hyperion.torch.layers.global_pool.GlobalMeanStdPool1d(*args: Any, **kwargs: Any)[source]
Global mean + standard deviation pooling in 1d
- dim
pooling dimension
- keepdim
if True, keeps the same number of dimensions after pooling
- get_config()
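A minimal sketch of mean + standard deviation pooling over the time axis is shown below, in plain Python. It assumes the biased (population) standard deviation and that the output concatenates the mean vector followed by the std vector; neither detail is stated in this reference, and `global_mean_std_pool` is a hypothetical name.

```python
import math


def global_mean_std_pool(x):
    """Pools a feats x time matrix (list of lists) over the time axis.

    Returns a list of length 2 * feats: the means followed by the standard deviations.
    """
    means = [sum(ch) / len(ch) for ch in x]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch))
        for ch, m in zip(x, means)
    ]
    return means + stds


x = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]  # 2 features, 3 frames
pooled = global_mean_std_pool(x)
print(pooled)
```

The output length is independent of the number of frames, which is what lets variable-length utterances feed a fixed-size embedding layer.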
- class hyperion.torch.layers.global_pool.GlobalMeanLogVarPool1d(*args: Any, **kwargs: Any)[source]
Global mean + log-variance pooling in 1d
- dim
pooling dimension
- keepdim
if True, keeps the same number of dimensions after pooling
- forward_slidwin(x, win_length, win_shift)
- get_config()
- class hyperion.torch.layers.global_pool.LDEPool1d(*args: Any, **kwargs: Any)[source]
Learnable dictionary encoder pooling in 1d
- in_feats
input feature dimension
- num_comp
number of cluster components
- dist_pow
power for distance metric
- use_bias
use bias parameter when computing posterior responsibility
- dim
pooling dimension
- keepdim
if True, keeps the same number of dimensions after pooling
- property num_comp
- property in_feats
- forward_slidwin(x, win_length, win_shift)
- class hyperion.torch.layers.global_pool.ScaledDotProdAttV1Pool1d(*args: Any, **kwargs: Any)[source]
- property in_feats
- forward_slidwin(x, win_length, win_shift)
- class hyperion.torch.layers.global_pool.GlobalChWiseAttMeanStdPool1d(*args: Any, **kwargs: Any)[source]
Attentive mean + stddev pooling for each channel
- __init__(in_feats, inner_feats=128, bin_attn=False, use_global_context=True, norm_layer=None, dim=-1, keepdim=False)[source]
- forward_slidwin(x, win_length, win_shift)
Acoustic Feature Extraction Layers
These define several feature extraction layers that take a waveform as input and produce spectrograms, filter-banks, MFCCs, etc. They also include a factory class to create feature extraction layers from config params.
- class hyperion.torch.layers.audio_feats_factory.AudioFeatsFactory[source]
- static create(audio_feat, sample_frequency=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemphasis_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]
- static filter_args(**kwargs)[source]
Filters MFCC args from arguments dictionary.
- Parameters
kwargs – Arguments dictionary.
- Returns
Dictionary with MFCC options.
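The filtering step amounts to keeping only the recognized keys of a larger argument dictionary. The sketch below shows the idea; the tuple `MFCC_ARGS` is a hypothetical subset of option names borrowed from the create() signature, and the standalone `filter_args` function is illustrative, not hyperion's method.

```python
def filter_args(valid_args, **kwargs):
    """Keeps only the entries of kwargs whose key is listed in valid_args."""
    return {k: v for k, v in kwargs.items() if k in valid_args}


# Hypothetical subset of feature-extraction option names.
MFCC_ARGS = ("sample_frequency", "frame_length", "frame_shift", "num_filters", "num_ceps")

all_args = {"sample_frequency": 16000, "num_ceps": 13, "batch_size": 32}
mfcc_args = filter_args(MFCC_ARGS, **all_args)
print(mfcc_args)
```

This lets one parsed command line be split among several components without each of them rejecting unknown options.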
- static add_class_args(parser, prefix=None)[source]
Adds MFCC options to parser.
- Parameters
parser – Arguments parser
prefix – Options prefix.
- static add_argparse_args(parser, prefix=None)
Adds MFCC options to parser.
- Parameters
parser – Arguments parser
prefix – Options prefix.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layers.audio_feats._get_feature_window_function(window_type, window_size, blackman_coeff=0.42)[source]
Returns a window function with the given type and size
- hyperion.torch.layers.audio_feats._get_strided_batch(waveform, window_length, window_shift, snip_edges, center=False)[source]
Given a waveform (1D tensor of size num_samples), it returns a 2D tensor (m, window_size) representing how the window is shifted along the waveform. Each row is a frame.
- Parameters
waveform (torch.Tensor) – Tensor of size num_samples
window_size (int) – Frame length
window_shift (int) – Frame shift
snip_edges (bool) – If True, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
center (bool) – If True, it puts the center of the frame at t*window_shift, starting at t=0. It overrides snip_edges and sets it to False.
- Returns
3D tensor of size (m, window_size) where each row is a frame
- Return type
torch.Tensor
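The snip_edges=True case can be sketched on a plain list, where only frames that fit completely inside the waveform are kept. The function `get_strided_frames` is a hypothetical stand-in; the reflection padding used when snip_edges=False and the center option are not covered.

```python
def get_strided_frames(waveform, window_length, window_shift):
    """Splits a 1D sequence into overlapping frames (snip_edges=True behaviour)."""
    num_samples = len(waveform)
    if num_samples < window_length:
        return []
    # Number of frames that fit entirely inside the waveform.
    num_frames = 1 + (num_samples - window_length) // window_shift
    return [
        waveform[t * window_shift : t * window_shift + window_length]
        for t in range(num_frames)
    ]


frames = get_strided_frames(list(range(10)), window_length=4, window_shift=2)
for frame in frames:
    print(frame)
```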
- hyperion.torch.layers.audio_feats._get_log_energy(x, energy_floor)[source]
Returns the log energy of size (m) for a strided_input (m,*)
- class hyperion.torch.layers.audio_feats.Wav2Win(*args: Any, **kwargs: Any)[source]
- class hyperion.torch.layers.audio_feats.Wav2FFT(*args: Any, **kwargs: Any)[source]
- __init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
- property fs
- property frame_length
- property frame_shift
- property remove_dc_offset
- property preemph_coeff
- property window_type
- property dither
- class hyperion.torch.layers.audio_feats.Wav2Spec(*args: Any, **kwargs: Any)[source]
- __init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
- property dither
- property frame_length
- property frame_shift
- property fs
- property preemph_coeff
- property remove_dc_offset
- property window_type
- class hyperion.torch.layers.audio_feats.Wav2LogSpec(*args: Any, **kwargs: Any)[source]
- __init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
- property dither
- property frame_length
- property frame_shift
- property fs
- property preemph_coeff
- property remove_dc_offset
- property window_type
- class hyperion.torch.layers.audio_feats.Wav2LogFilterBank(*args: Any, **kwargs: Any)[source]
- __init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, snip_edges=True, center=False, energy_floor=0, raw_energy=True, use_energy=True)[source]
- property dither
- property frame_length
- property frame_shift
- property fs
- property preemph_coeff
- property remove_dc_offset
- property window_type
- class hyperion.torch.layers.audio_feats.Wav2MFCC(*args: Any, **kwargs: Any)[source]
- __init__(fs=16000, frame_length=25, frame_shift=10, fft_length=512, remove_dc_offset=True, preemph_coeff=0.97, window_type='povey', use_fft_mag=False, dither=1, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False, num_ceps=13, snip_edges=True, center=False, cepstral_lifter=22, energy_floor=0, raw_energy=True, use_energy=True)[source]
- static make_lifter(N, Q)[source]
Makes the liftering function
- Parameters
N – Number of cepstral coefficients.
Q – Liftering parameter
- Returns
Liftering vector.
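This reference does not state the liftering formula. As a sketch, assuming the sinusoidal lifter commonly used in Kaldi-style MFCC pipelines, 1 + (Q/2) sin(pi n / Q), the vector could be built as follows.

```python
import math


def make_lifter(N, Q):
    """Sinusoidal liftering vector of length N (assumed Kaldi-style formula)."""
    if Q == 0:
        # No liftering: every coefficient is left unchanged.
        return [1.0] * N
    return [1.0 + 0.5 * Q * math.sin(math.pi * n / Q) for n in range(N)]


lifter = make_lifter(13, 22)
print([round(v, 3) for v in lifter])
```

With the default cepstral_lifter=22, the weights grow from 1 at n=0 to 12 at n=11, boosting the higher-order cepstral coefficients.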
- property dither
- property frame_length
- property frame_shift
- property fs
- property preemph_coeff
- property remove_dc_offset
- property window_type
- class hyperion.torch.layers.audio_feats.Wav2KanBayashiLogFilterBank(*args: Any, **kwargs: Any)[source]
Class to replicate log-filter-banks used in Kan Bayashi’s ParallelWaveGAN repository: https://github.com/kan-bayashi/ParallelWaveGAN
- __init__(fs=16000, frame_length=64, frame_shift=16, fft_length=1024, remove_dc_offset=True, window_type='hanning', low_freq=80, high_freq=7600, num_filters=80, snip_edges=False, center=True)[source]
- property dither
- property frame_length
- property frame_shift
- property fs
- property preemph_coeff
- property remove_dc_offset
- property window_type
- class hyperion.torch.layers.audio_feats.Spec2LogFilterBank(fs=16000, fft_length=512, fb_type='mel_kaldi', low_freq=20, high_freq=0, num_filters=23, norm_filters=False)[source]
Feature Normalization Layers
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.mvn.MeanVarianceNorm(*args: Any, **kwargs: Any)[source]
- static filter_args(**kwargs)[source]
Filters ST-CMVN args from arguments dictionary.
- Parameters
kwargs – Arguments dictionary.
- Returns
Dictionary with ST-CMVN options.
- static add_class_args(parser, prefix=None)[source]
Adds ST-CMVN options to parser.
- Parameters
parser – Arguments parser
prefix – Options prefix.
- static add_argparse_args(parser, prefix=None)
Adds ST-CMVN options to parser.
- Parameters
parser – Arguments parser
prefix – Options prefix.
Feature Augmentation Layers
Copyright 2021 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.spec_augment.AxisMasker(*args: Any, **kwargs: Any)[source]
Applies a mask to the spectrogram along time or freq dimension. Implementation based on espnet.
- mask_width_range
range for the width of the masks
- mask_num_range
range for the number of masks
- dim
axis where we apply the mask
- fill_value
masking value
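The sketch below shows how the four attributes could interact on a time x freq spectrogram stored as nested lists. It assumes both ranges are inclusive and uses a string for `dim` for readability; the real layer operates on tensors and its exact sampling rules are not described here, so `mask_along_axis` is a hypothetical helper.

```python
import random


def mask_along_axis(spec, mask_width_range=(0, 3), mask_num_range=(1, 2),
                    dim="time", fill_value=0.0, rng=random):
    """Returns a copy of spec (time x freq) with random bands set to fill_value."""
    out = [row[:] for row in spec]
    length = len(out) if dim == "time" else len(out[0])
    num_masks = rng.randint(*mask_num_range)  # inclusive range (assumption)
    for _ in range(num_masks):
        width = min(rng.randint(*mask_width_range), length)
        start = rng.randint(0, length - width)
        for pos in range(start, start + width):
            if dim == "time":
                out[pos] = [fill_value] * len(out[pos])
            else:
                for row in out:
                    row[pos] = fill_value
    return out


spec = [[1.0] * 3 for _ in range(5)]  # 5 frames, 3 frequency bins
masked = mask_along_axis(spec, (2, 2), (1, 1), "time", 0.0, random.Random(0))
for row in masked:
    print(row)
```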
- class hyperion.torch.layers.spec_augment.SpecWarper(*args: Any, **kwargs: Any)[source]
Warps the spectrogram along time or freq dimension. Implementation based on espnet.
- window
time warp parameter
- class hyperion.torch.layers.spec_augment.SpecAugment(*args: Any, **kwargs: Any)[source]
Implementation of SpecAugment.
- Reference: Daniel S. Park et al. “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
Attributes:
- __init__(time_warp_prob=0, time_warp_window=5, time_warp_mode='bicubic', time_mask_prob=0, time_mask_min_width=0, time_mask_max_width=100, time_mask_min_num_masks=1, time_mask_max_num_masks=2, freq_mask_prob=0, freq_mask_min_width=0, freq_mask_max_width=20, freq_mask_min_num_masks=1, freq_mask_max_num_masks=2, fill_value=0)[source]
Large Margin Losses Layers
These are output layers that are used to create large margin cross-entropy losses.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Probability Density Function Layers
These are layers related to probability density functions used in VAEs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.pdf_storage.StdNormal(*args: Any, **kwargs: Any)[source]
Storage for Standard Normal distribution
- property pdf
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.tensor2pdf.Tensor2PDF(*args: Any, **kwargs: Any)[source]
Base class for layers that create a prob distribution from an input tensor
- class hyperion.torch.layers.tensor2pdf.Tensor2NormalICov(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into a Normal distribution with identity variance
- class hyperion.torch.layers.tensor2pdf.Tensor2NormalGlobDiagCov(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into Normal distribution
Input tensor will be the mean of the distribution and the standard deviation is a global trainable parameter.
- class hyperion.torch.layers.tensor2pdf.Tensor2NormalDiagCov(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into Normal distribution
Applies two linear transformations to the tensors to obtain the mean and the log-variance.
- class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalICovGivenNormalPrior(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into a Normal distribution with identity variance
Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation
- class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalGlobDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into Normal distribution
Input tensor will be the ML mean of the distribution and the ML standard deviation is a global trainable parameter.
Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation
- class hyperion.torch.layers.tensor2pdf.Tensor2BayNormalDiagCovGivenNormalPrior(*args: Any, **kwargs: Any)[source]
Transforms a Tensor into Normal distribution
Applies two linear transformations to the tensors to obtain the maximum likelihood mean and the log-variance.
Uses Bayesian interpolation between Gaussian prior and Maximum Likelihood estimation
Vector Quantization Layers
These are vector quantization layers like the ones used in VQ-VAEs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.vq.KMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
- class hyperion.torch.layers.vq.MultiKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
- __init__(num_groups, num_embed, embed_feats, commitment_cost=0.25, project=True, in_feats=None, in_dim=None)[source]
- property commitment_cost
- class hyperion.torch.layers.vq.EMAKMeansVectorQuantizer(*args: Any, **kwargs: Any)[source]
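The core operation shared by k-means style quantizers is replacing each input vector with its nearest codebook embedding. The sketch below illustrates only that lookup, using squared Euclidean distance. It does not cover the commitment cost, input projection, grouping or EMA updates of the classes above, and `quantize` is a hypothetical name.

```python
def quantize(x, codebook):
    """Maps each vector in x to its nearest codebook entry.

    Returns (indices, quantized), where quantized[i] is codebook[indices[i]].
    """
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    indices = [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in x
    ]
    return indices, [codebook[k] for k in indices]


codebook = [[0.0, 0.0], [1.0, 1.0]]  # num_embed=2, embed_feats=2
x = [[0.1, -0.2], [0.9, 0.8]]
indices, quantized = quantize(x, codebook)
print(indices, quantized)
```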
Upsampling Layers
These include layers related to upsampling operations.
Copyright 2021 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.subpixel_convs.SubPixelConv1d(*args: Any, **kwargs: Any)[source]
- class hyperion.torch.layers.subpixel_convs.SubPixelConv2d(*args: Any, **kwargs: Any)[source]
- hyperion.torch.layers.subpixel_convs.ICNR2d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]
Initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”
- Parameters
tensor – torch.Tensor containing the conv weights
stride – subpixel conv stride
initializer – initializer to be used for sub_kernel initialization
Examples
>>> conv = SubPixelConv2d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR2d(conv.weight, stride=upscale)
- hyperion.torch.layers.subpixel_convs.ICNR1d(tensor, stride=2, initializer=torch.nn.init.kaiming_normal)[source]
1d version of the initialization method “Initialization to Convolution Nearest neighbours Resize (ICNR)” for subpixel convolutions described in “Andrew Aitken et al. (2017) Checkerboard artifact free sub-pixel convolution”
- Parameters
tensor – torch.Tensor containing the conv weights
stride – subpixel conv stride
initializer – initializer to be used for sub_kernel initialization
Examples
>>> conv = SubPixelConv1d(in_channels, out_channels, kernel_size=3, stride=upscale)
>>> ICNR1d(conv.weight, stride=upscale)
Positional Encoders
These include layers that implement positional encoders used in transformers.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layers.pos_encoder.PosEncoder(*args: Any, **kwargs: Any)[source]
Positional encoding.
- num_feats
embedding dim
- dropout_rate
dropout rate
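The encoding formula is not given in this reference. Assuming the standard sinusoidal encoding from the original Transformer paper, a table of positional encodings with num_feats columns could be generated as in this sketch; `sinusoidal_pos_encoding` is a hypothetical name.

```python
import math


def sinusoidal_pos_encoding(num_frames, num_feats):
    """Returns a num_frames x num_feats table of sinusoidal encodings.

    Even feature indices use sine and odd ones cosine, with wavelengths
    growing geometrically up to 10000 * 2 * pi.
    """
    pe = []
    for pos in range(num_frames):
        row = []
        for i in range(num_feats):
            angle = pos / (10000 ** (2 * (i // 2) / num_feats))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = sinusoidal_pos_encoding(num_frames=3, num_feats=4)
for row in pe:
    print([round(v, 4) for v in row])
```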
- class hyperion.torch.layers.pos_encoder.RelPosEncoder(*args: Any, **kwargs: Any)[source]
Relative positional encoding as defined in https://arxiv.org/pdf/1901.02860.pdf
It returns the input and the positional encoder separately so they are mixed in the attention block later.
- num_feats
embedding dim
- dropout_rate
dropout rate
- forward(x)[source]
Add positional encoding.
- Parameters
x – Input with shape=(batch, time, C)
- Returns
x-scaled, pos-encoding
- _pe(x, relative=False)
Reset the positional encodings.
Calibration
These are layers that are used to simulate the calibration block after the speaker recognition back-end.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Layer Blocks
These are Torch modules that combine several layers. They are the building blocks used to create more complex architectures like ResNets, Transformers or EfficientNets.
Fully Connected Blocks
These are fully connected blocks used to create simple feed forward networks, classification heads, etc.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.fc_blocks.FCBlock(*args: Any, **kwargs: Any)[source]
Fully connected block
- in_feats
input feature dimension
- out_feats
output feature dimension
- activation
str/dict indicating the type of activation function
- norm_layer
normalization layer constructor, if None it uses batch-norm
- use_norm
if True, it applies the normalization layer, if False no normalization is applied
- norm_before
if True normalization layer is applied before the activation function, if False after
Deep Convolutional Blocks
Deep Convolutional 1d Blocks
These are blocks to create deep convolutional networks 1d without residuals.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.dc1d_blocks.DC1dEncBlock(*args: Any, **kwargs: Any)[source]
Deep Convolutional 2d Blocks
These are blocks to create deep convolutional networks 2d without residuals.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.dc2d_blocks.DC2dEncBlock(*args: Any, **kwargs: Any)[source]
TDNN Blocks
TDNN Blocks
TDNN blocks used to create TDNN x-vectors
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Extended TDNN Blocks
Extended TDNN blocks used to create E-TDNN x-vectors
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Residual Extended TDNN Blocks
Extended TDNN blocks with residual connections
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Squeeze-Excitation Blocks
Squeeze-Excitation blocks 1d and 2d, which are added at the output of ResNet blocks and other blocks to create squeeze-excitation networks.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.se_blocks.TSEBlock2D(*args: Any, **kwargs: Any)[source]
From https://arxiv.org/abs/1709.01507, modified to do pooling only in the time dimension.
- class hyperion.torch.layer_blocks.se_blocks.SEBlock1d(*args: Any, **kwargs: Any)[source]
1d Squeeze Excitation version of https://arxiv.org/abs/1709.01507
- hyperion.torch.layer_blocks.se_blocks.SEBlock2d
- hyperion.torch.layer_blocks.se_blocks.TSEBlock2d
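A minimal plain-Python sketch of the squeeze-excitation idea for a 1d feature map follows: average each channel over time, pass the result through a bottleneck, and use the sigmoid output to re-weight the channels. The function se_block_1d, its weight layout, and the omission of biases are assumptions for illustration only.

```python
import math

def se_block_1d(x, w1, w2):
    """Squeeze-excitation over a (channels x time) feature map.

    x  : list of channels, each a list of time samples
    w1 : bottleneck weights, shape (channels // r) x channels
    w2 : expansion weights, shape channels x (channels // r)
    """
    # Squeeze: average each channel over time.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck + ReLU, then expansion + sigmoid gate.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every channel by its gate.
    return [[v * g for v in ch] for ch, g in zip(x, s)]

x = [[1.0, 3.0], [2.0, 2.0]]   # 2 channels, 2 frames
w1 = [[0.5, 0.5]]              # compress 2 channels to 1 (ratio r = 2)
w2 = [[0.0], [0.0]]            # zero weights give gates of exactly 0.5
y = se_block_1d(x, w1, w2)
```

The ratio r plays the role of the se_r arguments listed elsewhere in this reference: a larger ratio means a narrower bottleneck.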
Canonical ResNet Blocks
These are blocks used to create canonical ResNet, SE-ResNet, Res2Nets, etc.
ResNet Blocks
These blocks are used to create canonical ResNets.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.resnet_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]
3x3 convolution with padding
- hyperion.torch.layer_blocks.resnet_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
1x1 convolution
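The phrase "with padding" can be made concrete with the usual convolution output-size formula. The sketch below assumes the padding is chosen as dilation * (kernel_size - 1) // 2, which keeps the size unchanged at stride 1 for odd kernels; conv_out_size is a hypothetical helper, not part of Hyperion.

```python
def conv_out_size(in_size, kernel_size=3, stride=1, dilation=1, padding=None):
    """Output size of a convolution along one axis."""
    if padding is None:
        # Assumed "same"-style padding for odd kernel sizes.
        padding = dilation * (kernel_size - 1) // 2
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv_out_size(40))               # stride 1 keeps the size
print(conv_out_size(40, stride=2))     # stride 2 halves it
print(conv_out_size(40, dilation=2))   # dilation is compensated by padding
```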
- class hyperion.torch.layer_blocks.resnet_blocks.ResNetInputBlock(*args: Any, **kwargs: Any)[source]
Input block for ResNet architecture
- Parameters
in_channels – input channels
out_channels – output channels
kernel_size – kernel size for conv
stride – stride for conv
activation – str/dict indicating activation type and arguments
norm_layer – norm_layer object constructor, if None it uses BatchNorm2d
norm_before – if True it applies the norm_layer before the activation, if False, after the activation
do_maxpool – apply maxpooling 2x2 at the output
- class hyperion.torch.layer_blocks.resnet_blocks.ResNetBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet_blocks.ResNetBNBlock(*args: Any, **kwargs: Any)[source]
- expansion = 4
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True)[source]
- property out_channels
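The expansion attribute and the out_channels property are typically related as in the sketch below, where a block's output width is channels * expansion and feeds the next block's in_channels. This relation is the common ResNet convention and is assumed here; BasicSketch, BottleneckSketch and stack_blocks are hypothetical stand-ins that only track channel counts.

```python
class BasicSketch:
    """Toy stand-in for a basic residual block (channel bookkeeping only)."""
    expansion = 1

    def __init__(self, in_channels, channels):
        self.in_channels = in_channels
        self.channels = channels

    @property
    def out_channels(self):
        return self.channels * self.expansion


class BottleneckSketch(BasicSketch):
    """Toy stand-in for a bottleneck block, which widens its output 4x."""
    expansion = 4


def stack_blocks(block_cls, in_channels, channels, num_blocks):
    """Chain blocks so each one consumes the previous block's output width."""
    blocks = []
    for _ in range(num_blocks):
        block = block_cls(in_channels, channels)
        blocks.append(block)
        in_channels = block.out_channels
    return blocks

blocks = stack_blocks(BottleneckSketch, in_channels=64, channels=64, num_blocks=3)
print([(b.in_channels, b.out_channels) for b in blocks])
```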
SE-ResNet Blocks
These blocks are used to create canonical Squeeze-Excitation ResNets.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBasicBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]
- expansion = 1
- property out_channels
- class hyperion.torch.layer_blocks.seresnet_blocks.SEResNetBNBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=16, time_se=False, num_feats=None)[source]
- expansion = 4
- property out_channels
Res2Net Blocks
These blocks are used to create canonical Res2Nets.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.res2net_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]
3x3 convolution with padding
- hyperion.torch.layer_blocks.res2net_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
1x1 convolution
- class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
- property out_channels
- class hyperion.torch.layer_blocks.res2net_blocks.Res2NetBNBlock(*args: Any, **kwargs: Any)[source]
- expansion = 4
- __init__(in_channels, channels, activation={'inplace': True, 'name': 'relu'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
- property out_channels
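The scale argument refers to the number of channel groups in the Res2Net hierarchical connection. The sketch below follows the formulation in the Res2Net paper (y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_{i-1}) for i > 2) on a flat list of channel values; res2net_forward and the toy "convolutions" are hypothetical, and Hyperion's blocks may differ in detail.

```python
def res2net_forward(x, scale, convs):
    """Apply the hierarchical Res2Net connection to a flat channel list.

    x     : list of channel values, length divisible by scale
    scale : number of channel groups
    convs : scale - 1 callables, one per group after the first
    """
    width = len(x) // scale
    groups = [x[i * width:(i + 1) * width] for i in range(scale)]
    outputs = [groups[0]]                 # y_1 = x_1 (passed through)
    prev = None
    for i in range(1, scale):
        inp = groups[i]
        if prev is not None:
            # x_i + y_{i-1}: reuse the previous group's output
            inp = [a + b for a, b in zip(inp, prev)]
        prev = convs[i - 1](inp)          # y_i = K_i(...)
        outputs.append(prev)
    return [v for g in outputs for v in g]  # concatenate the groups


def double(group):
    """Toy replacement for a 3x3 convolution."""
    return [2 * v for v in group]

y = res2net_forward([1, 1, 1, 1], scale=4, convs=[double] * 3)
```

Each later group sees a progressively larger receptive field because it accumulates the outputs of the groups before it.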
SpineNet Blocks
These are some extra blocks needed to build SpineNet and Spine2Net.
Copyright 2020 Magdalena Rybicka Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.spine_blocks._conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, bias=False)[source]
3x3 convolution with padding
- hyperion.torch.layer_blocks.spine_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
1x1 convolution
- hyperion.torch.layer_blocks.spine_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise subpixel convolution
- class hyperion.torch.layer_blocks.spine_blocks.SpineConv(*args: Any, **kwargs: Any)[source]
- class hyperion.torch.layer_blocks.spine_blocks.BlockSpec(level, block_fn, input_offsets, is_output)[source]
A container class that specifies the block configuration for SpineNet.
- class hyperion.torch.layer_blocks.spine_blocks.SpineEndpoints(*args: Any, **kwargs: Any)[source]
- class hyperion.torch.layer_blocks.spine_blocks.SpineResample(*args: Any, **kwargs: Any)[source]
MobileNet Blocks
These are blocks needed to build EfficientNet networks.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.mbconv_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
1x1 convolution
- hyperion.torch.layer_blocks.mbconv_blocks._dwconvkxk(channels, kernel_size=3, stride=1, bias=False)[source]
kxk depth-wise convolution with padding
Generic ResNet Blocks
ResNet 1d Blocks
These are blocks used to build flexible ResNets based on 1d convs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.resnet1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k convolution with padding
- hyperion.torch.layer_blocks.resnet1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise convolution
- hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_conv1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise subpixel convolution
- hyperion.torch.layer_blocks.resnet1d_blocks._subpixel_convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k subpixel convolution with padding
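Subpixel convolution is commonly implemented as a regular convolution that produces extra channels, followed by a rearrangement of those channels into the time axis. The sketch below shows only the rearrangement step for a 1d signal; the channel ordering and the helper name pixel_shuffle_1d are assumptions for illustration.

```python
def pixel_shuffle_1d(x, upscale):
    """Rearrange (channels * upscale, time) into (channels, time * upscale)."""
    out_channels = len(x) // upscale
    num_frames = len(x[0])
    y = []
    for c in range(out_channels):
        row = []
        for t in range(num_frames):
            for r in range(upscale):
                # Interleave the upscale sub-channels along time.
                row.append(x[c * upscale + r][t])
        y.append(row)
    return y

x = [[1, 2, 3], [10, 20, 30]]        # 2 channels, 3 frames
y = pixel_shuffle_1d(x, upscale=2)   # 1 channel, 6 frames
```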
- class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBasicDecBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, drop_connect_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNBlock(*args: Any, **kwargs: Any)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.SEResNet1dBNDecBlock(*args: Any, **kwargs: Any)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet1d_blocks.ResNet1dEndpoint(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, in_scale, scale, upsampling_mode='nearest', activation={'inplace': True, 'name': 'relu6'}, norm_layer=None, norm_before=True)[source]
Class that connects the outputs of the ResNet1d to the rest of the network when using multilevel feature aggregation.
It converts the features of all the levels that we are going to aggregate to the same temporal scale.
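Bringing several levels to a common temporal scale can be sketched with nearest-neighbour upsampling, matching the default upsampling_mode='nearest'. The sketch assumes that in_scale and scale are downsampling factors with an integer ratio and covers only the upsampling case; align_levels and upsample_nearest are hypothetical helpers.

```python
def upsample_nearest(seq, factor):
    """Repeat every frame `factor` times."""
    return [v for v in seq for _ in range(factor)]

def align_levels(levels, target_scale):
    """levels: list of (in_scale, sequence) pairs from different depths."""
    aligned = []
    for in_scale, seq in levels:
        factor = in_scale // target_scale   # assumed integer ratio
        aligned.append(upsample_nearest(seq, factor))
    return aligned

levels = [(1, [1, 2, 3, 4]), (2, [5, 6]), (4, [7])]
aligned = align_levels(levels, target_scale=1)
```

Once all levels have the same number of frames they can be summed or concatenated.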
Res2Net 1d Blocks
These are blocks used to build flexible Res2Nets based on 1d convs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.res2net1d_blocks._convk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k convolution with padding
- hyperion.torch.layer_blocks.res2net1d_blocks._conv1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise convolution
- class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None)[source]
- property out_channels
- class hyperion.torch.layer_blocks.res2net1d_blocks.Res2Net1dBNBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, drop_connect_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, num_feats=None)[source]
- property out_channels
- property expansion
ResNet 2d Blocks
These are blocks used to build flexible ResNets based on 2d convs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.resnet2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k convolution with padding
- hyperion.torch.layer_blocks.resnet2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise convolution
- hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
point-wise subpixel convolution
- hyperion.torch.layer_blocks.resnet2d_blocks._subpixel_convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k subpixel convolution with padding
- class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.ResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, expansion=4, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBasicDecBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation='relu6', stride=1, dropout_rate=0, groups=1, dilation=1, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNBlock(*args: Any, **kwargs: Any)[source]
- property out_channels
- class hyperion.torch.layer_blocks.resnet2d_blocks.SEResNet2dBNDecBlock(*args: Any, **kwargs: Any)[source]
- property out_channels
Res2Net 2d Blocks
These are blocks used to build flexible Res2Nets based on 2d convs.
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.res2net2d_blocks._convkxk(in_channels, out_channels, kernel_size=3, stride=1, groups=1, dilation=1, bias=False)[source]
kernel k convolution with padding
- hyperion.torch.layer_blocks.res2net2d_blocks._conv1x1(in_channels, out_channels, stride=1, bias=False)[source]
1x1 convolution
- class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBasicBlock(*args: Any, **kwargs: Any)[source]
- expansion = 1
- __init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
- property out_channels
- class hyperion.torch.layer_blocks.res2net2d_blocks.Res2Net2dBNBlock(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, channels, kernel_size=3, activation={'inplace': True, 'name': 'relu6'}, stride=1, dropout_rate=0, width_factor=1, scale=4, groups=1, dilation=1, use_norm=True, norm_layer=None, norm_before=True, se_r=None, time_se=False, num_feats=None)[source]
- property out_channels
- property expansion
Transformer Blocks
These are blocks used to build Transformers.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.transformer_conv2d_subsampler.TransformerConv2dSubsampler(*args: Any, **kwargs: Any)[source]
Convolutional 2D subsampling (to 1/4 length) for Transformer
- in_feats
input feature dimension
- out_feats
Transformer d_model
- hid_act
activation layer object
- pos_enc
positional encoder layer
- time_dim
indicates which is the time dimension in the input tensor
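The "1/4 length" figure can be checked with a short calculation. The sketch assumes two unpadded convolutions with kernel size 3 and stride 2 along time, a layout used by some speech toolkits; whether this class uses exactly that layout is not stated here, and the helper names are hypothetical.

```python
def conv_len(length, kernel_size=3, stride=2):
    """Output length of one unpadded convolution along time."""
    return (length - kernel_size) // stride + 1

def subsampled_len(num_frames, num_convs=2):
    """Length after applying the assumed stack of strided convolutions."""
    for _ in range(num_convs):
        num_frames = conv_len(num_frames)
    return num_frames

def subsample_mask(mask):
    """Drop mask entries so the mask stays aligned with the output frames."""
    return mask[:-2:2][:-2:2]

print(subsampled_len(100))                # roughly 100 / 4
print(len(subsample_mask([1] * 100)))     # same length as the features
```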
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.transformer_encoder_v1.TransformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]
Building block for transformer encoder.
- num_feats
input/output feat. dimension (aka d_model)
- self_attn
attention nn.Module or string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
- num_heads
number of heads
- feed_forward
position-wise feed-forward nn.Module or string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
- d_ff
dimension of middle layer in feed_forward block
- ff_kernel_size
kernel size for convolutional versions of ff block
- ff_act
ff block hidden activation
- ff_dropout_rate
dropout rate for ff block
- att_context
maximum context range for local attention
- att_dropout_rate
dropout rate for attention block
- rel_pos_enc
if True, use relative positional encodings, absolute encodings otherwise.
- causal_pos_enc
if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i
- norm_before
if True, use layer norm before layers, otherwise after
- concat_after
- if True, it concatenates the attention input and output and applies a linear transform, i.e.,
y = x + linear(concat(x, att(x)))
if False, y = x + att(x)
- __init__(num_feats, self_attn, num_heads, feed_forward, d_ff, ff_kernel_size, ff_act='relu6', ff_dropout_rate=0, att_context=25, att_dropout_rate=0, rel_pos_enc=False, causal_pos_enc=False, norm_before=True, concat_after=False)[source]
- static _make_att(att_type, num_feats, num_heads, context, dropout_rate, rel_pos_enc, causal_pos_enc)[source]
Creates multihead attention block from att_type string
- Parameters
att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
num_feats – input/output feat. dimension (aka d_model)
num_heads – number of heads
dropout_rate – dropout rate for attention block
rel_pos_enc – if True, use relative positional encodings, absolute encodings otherwise.
causal_pos_enc – if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i
- Returns
Attention nn.Module
- static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]
Creates position-wise feed forward block from ff_type string
- Parameters
ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
num_feats – input/output feat. dimension (aka d_model)
hid_feats – dimension of middle layer in feed_forward block
kernel_size – kernel size for convolutional versions of ff block
dropout_rate – dropout rate for ff block
activation – activation function for ff block
- Returns
Position-wise feed-forward nn.Module
- forward(x, pos_emb=None, mask=None)[source]
Forward pass function
- Parameters
x – input tensor with size=(batch, time, num_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None
mask – mask to indicate valid time steps for x (batch, time)
- Returns
Tensor with output features, tensor with mask
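The effect of norm_before and concat_after on the attention sub-layer can be sketched on a single feature vector. The wiring follows the two formulas given for concat_after; attention_sublayer and the toy modules are hypothetical, and dropout and masking are left out.

```python
def attention_sublayer(x, att, norm, linear=None,
                       norm_before=True, concat_after=False):
    """Wire one attention sub-layer around a feature vector x (list of floats)."""
    residual = x
    if norm_before:
        x = norm(x)                   # pre-norm variant
    a = att(x)
    if concat_after:
        update = linear(x + a)        # linear(concat(x, att(x)))
    else:
        update = a                    # plain att(x)
    y = [r + u for r, u in zip(residual, update)]
    if not norm_before:
        y = norm(y)                   # post-norm variant
    return y


def toy_att(v):
    return [2.0 * t for t in v]

def identity(v):
    return list(v)

def toy_linear(v):
    """Maps a 2n-dim vector back to n dims by adding its two halves."""
    n = len(v) // 2
    return [v[i] + v[i + n] for i in range(n)]

y_plain = attention_sublayer([1.0, 2.0], toy_att, identity)
y_concat = attention_sublayer([1.0, 2.0], toy_att, identity,
                              linear=toy_linear, concat_after=True)
```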
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.transformer_feedforward.PositionwiseFeedForward(*args: Any, **kwargs: Any)[source]
Positionwise feed forward layer for transformer.
- num_feats
input/output dimension
- hid_feats
number of hidden units
- activation
activation function for hidden layers
- dropout_rate
dropout rate
- time_dim
time dimension in the input tensor
- class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dx2(*args: Any, **kwargs: Any)[source]
Two layer Conv1d for transformer feed-forward block
Introduced in "FastSpeech: Fast, Robust and Controllable Text to Speech".
- num_channels
input/output channels.
- hid_channels
hidden channels
- kernel_size
conv kernel size
- activation
activation function for hidden layers
- dropout_rate
dropout rate
- time_dim
indicates what is the time dimension in the input tensor.
- class hyperion.torch.layer_blocks.transformer_feedforward.Conv1dLinear(*args: Any, **kwargs: Any)[source]
Conv1D + Linear for Transformer block.
- num_channels
input/output channels.
- hid_channels
hidden channels
- kernel_size
conv kernel size
- activation
activation function for hidden layers
- dropout_rate
dropout rate
- time_dim
indicates what is the time dimension in the input tensor.
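"Position-wise" means that the same two-layer network is applied to every time step independently. The plain-Python sketch below illustrates this with ReLU as the hidden activation; biases and dropout are omitted, and positionwise_ff, matvec and relu are hypothetical helpers. Judging by their names, the convolutional variants replace one or both linear maps with convolutions over time, so neighbouring frames also contribute.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def positionwise_ff(x, w1, w2):
    """x: time x num_feats; w1: hid_feats x num_feats; w2: num_feats x hid_feats."""
    # The same weights are reused for every frame.
    return [matvec(w2, relu(matvec(w1, frame))) for frame in x]

x = [[1.0, -1.0], [2.0, 0.0]]              # 2 frames, num_feats = 2
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hid_feats = 3
w2 = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
y = positionwise_ff(x, w1, w2)
```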
Conformer Blocks
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.layer_blocks.conformer_encoder_v1.ConformerEncoderBlockV1(*args: Any, **kwargs: Any)[source]
- Building block for conformer encoder introduced in
https://arxiv.org/pdf/2005.08100.pdf
This includes some optional extra features not included in the original paper:
Choose local-attention (attending only to close frames instead of all the frames in the sequence)
Choose number of conv blocks
Squeeze-Excitation after depthwise-conv
Allows downsampling in time dimension
Allows choosing activation and layer normalization type
We call this Conformer+
- num_feats
input/output feat. dimension (aka d_model)
- self_attn
attention module in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
- num_heads
number of heads
- conv_repeats
number of conv blocks
- conv_kernel_size
kernel size for conv blocks
- conv_stride
stride for depth-wise conv in first conv block
- feed_forward
position-wise feed-forward string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
- d_ff
dimension of middle layer in feed_forward block
- ff_kernel_size
kernel size for convolutional versions of ff block
- hid_act
ff and conv block hidden activation
- dropout_rate
dropout rate for ff and conv blocks
- att_context
maximum context range for local attention
- att_dropout_rate
dropout rate for attention block
- causal_pos_enc
if True, use causal positional encodings (when relative positional encodings are used); it assumes that query q_i only attends to key k_j when j<=i
- conv_norm_layer
norm layer constructor for conv block, if None it uses BatchNorm
- se_r
Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation
- ff_macaron
if True, it uses macaron-net style ff layers, otherwise transformer style.
- out_lnorm
if True, use a LayerNorm layer at the output as in the Conformer paper. We think that this layer is redundant, so it is False by default.
- concat_after
- if True, it concatenates the attention input and output and applies a linear transform, i.e.,
y = x + linear(concat(x, att(x)))
if False, y = x + att(x)
- __init__(num_feats, self_attn, num_heads, conv_repeats=1, conv_kernel_size=31, conv_stride=1, feed_forward='linear', d_ff=2048, ff_kernel_size=3, hid_act='swish', dropout_rate=0, att_context=25, att_dropout_rate=0, pos_enc_type='rel', causal_pos_enc=False, conv_norm_layer=None, se_r=None, ff_macaron=True, out_lnorm=False, concat_after=False)[source]
- static _make_att(att_type, num_feats, num_heads, context, dropout_rate, pos_enc_type, causal_pos_enc)[source]
Creates multihead attention block from att_type string
- Parameters
att_type – string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
num_feats – input/output feat. dimension (aka d_model)
num_heads – number of heads
dropout_rate – dropout rate for attention block
pos_enc_type – type of positional encoder
causal_pos_enc – if True, use causal positional encodings (when relative positional encodings are used); it assumes that query q_i only attends to key k_j when j<=i
- Returns
Attention nn.Module
- static _make_ff(ff_type, num_feats, hid_feats, kernel_size, activation, dropout_rate)[source]
Creates position-wise feed forward block from ff_type string
- Parameters
ff_type – string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
num_feats – input/output feat. dimension (aka d_model)
hid_feats – dimension of middle layer in feed_forward block
kernel_size – kernel size for convolutional versions of ff block
dropout_rate – dropout rate for ff block
activation – activation function for ff block
- Returns
Position-wise feed-forward nn.Module
- forward(x, pos_emb=None, mask=None)[source]
Forward pass function
- Parameters
x – input tensor with size=(batch, time, num_feats)
pos_emb – positional embedding size=(batch, time2, in_feats) as R_{L-1}, …, R_0, when using relative positional encoder, otherwise None
mask – mask to indicate valid time steps for x (batch, time)
- Returns
Tensor with output features, tensor with mask
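The residual wiring of a Conformer block, as described in the Conformer paper, can be sketched on scalars. With ff_macaron=True, two half-weighted feed-forward modules sandwich the attention and convolution modules. The non-macaron branch is read here as a single full-weight feed-forward at the end, which is an assumption; conformer_block and the toy modules are hypothetical, and internal normalization and dropout are omitted.

```python
def conformer_block(x, ff1, att, conv, ff2, norm,
                    ff_macaron=True, out_lnorm=False):
    """Scalar sketch of the residual wiring; every module is a callable."""
    ff_scale = 0.5 if ff_macaron else 1.0
    if ff_macaron:
        x = x + ff_scale * ff1(x)   # first half-step feed-forward
    x = x + att(x)                  # self-attention module
    x = x + conv(x)                 # convolution module
    x = x + ff_scale * ff2(x)       # second (or only) feed-forward
    if out_lnorm:
        x = norm(x)                 # optional output normalization
    return x


def same(v):
    """Toy module that returns its input unchanged."""
    return v

y_macaron = conformer_block(1.0, same, same, same, same, same)
y_plain = conformer_block(1.0, same, same, same, same, same, ff_macaron=False)
```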
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.layer_blocks.conformer_conv._conv1(in_channels, out_channels, bias=False)[source]
1x1 convolution
- hyperion.torch.layer_blocks.conformer_conv._dwconvk(channels, kernel_size, stride=1, bias=False)[source]
kxk depth-wise convolution with padding
- class hyperion.torch.layer_blocks.conformer_conv.ConformerConvBlock(*args: Any, **kwargs: Any)[source]
- Convolutional block for conformer introduced in
https://arxiv.org/pdf/2005.08100.pdf
This includes some optional extra features not included in the original paper:
Squeeze-Excitation after depthwise-conv
Allows downsampling in time dimension
Allows choosing activation and layer normalization type
- num_channels
number of input/output channels
- kernel_size
kernel_size for depth-wise conv
- stride
stride for depth-wise conv
- activation
activation function str or object
- norm_layer
norm layer constructor, if None it uses BatchNorm
- dropout_rate
dropout rate
- se_r
Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation
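The depth-wise convolution at the core of this block filters each channel independently, and a stride larger than 1 downsamples the time axis. The plain-Python sketch below illustrates both points with zero padding and odd kernel lengths; depthwise_conv1d is a hypothetical helper, not Hyperion's implementation.

```python
def depthwise_conv1d(x, kernels, stride=1):
    """x: channels x time; kernels: one odd-length kernel per channel."""
    y = []
    for ch, k in zip(x, kernels):
        pad = len(k) // 2
        padded = [0.0] * pad + list(ch) + [0.0] * pad
        out = []
        # One output per stride step; channels never mix.
        for start in range(0, len(ch), stride):
            out.append(sum(w * padded[start + j] for j, w in enumerate(k)))
        y.append(out)
    return y

x = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
kernels = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
y_full = depthwise_conv1d(x, kernels)             # same length as the input
y_half = depthwise_conv1d(x, kernels, stride=2)   # half the frames
```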
Torch Models and Model Loader
All PyTorch neural architectures and models in Hyperion derive from the same base class:
- class hyperion.torch.TorchModel(*args: Any, **kwargs: Any)[source]
- property device
- __init__(*args: Any, **kwargs: Any) None
The TorchModelLoader can load any model or network architecture from file.
Neural Architectures
All neural architectures derive from the NetArch class.
- class hyperion.torch.narchs.net_arch.NetArch(*args: Any, **kwargs: Any)[source]
- __init__(*args: Any, **kwargs: Any) None
- copy()
- property device
- freeze()
- get_config()
- get_loss()
- get_reg_loss()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- save(file_path)
- unfreeze()
The TorchNALoader can load any network architecture from file.
Acoustic Features
- class hyperion.torch.narchs.audio_feats_mvn.AudioFeatsMVN(*args: Any, **kwargs: Any)[source]
Acoustic feature extractor + ST-MVN, with optional SpecAugment.
- property fs
- property frame_length
- property frame_shift
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
- save(file_path)
- unfreeze()
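ST-MVN (short-time mean and variance normalization) normalizes each frame with statistics computed over a local window around it. The sketch below shows the idea in plain Python; the window parameters left_context and right_context, the eps value, and the function name st_mvn are assumptions for illustration and do not reflect this class's configuration.

```python
import math

def st_mvn(feats, left_context, right_context, norm_var=True, eps=1e-5):
    """feats: time x dim. Normalize each frame with local window statistics."""
    num_frames = len(feats)
    dim = len(feats[0])
    out = []
    for t in range(num_frames):
        # The window is truncated at the utterance boundaries.
        lo = max(0, t - left_context)
        hi = min(num_frames, t + right_context + 1)
        window = feats[lo:hi]
        frame = []
        for d in range(dim):
            vals = [f[d] for f in window]
            mean = sum(vals) / len(vals)
            v = feats[t][d] - mean
            if norm_var:
                var = sum((a - mean) ** 2 for a in vals) / len(vals)
                v /= math.sqrt(var + eps)
            frame.append(v)
        out.append(frame)
    return out

feats = [[1.0], [2.0], [3.0]]
normed = st_mvn(feats, left_context=1, right_context=1, norm_var=False)
```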
Fully Connected Network
Classification Head
- class hyperion.torch.narchs.classif_head.ClassifHead(*args: Any, **kwargs: Any)[source]
Classification Head for x-vector style networks
- in_feats
input features
- num_classes
number of output classes
- embed_dim
dimension of embedding layer
- num_embed_layers
number of hidden layers
- hid_act
str or dict hidden activation type in [‘relu’, ‘relu6’, ‘swish’, … ]
- loss_type
type of loss function that will be used with the x-vector in [‘softmax’, ‘cos-softmax’, ‘arc-softmax’], corresponding to standard cross-entropy, additive margin softmax or additive angular margin softmax.
- s
scale parameter for cos-softmax and arc-softmax
- margin
margin parameter for cos-softmax and arc-softmax
- margin_warmup_epochs
number of epochs to anneal the margin from 0 to margin
- num_subcenters
number of subcenters in subcenter losses
- norm_layer
norm_layer object or str indicating type norm layer, if None it uses BatchNorm1d
- use_norm
if True, it uses layer/batch-normalization
- norm_before
if True, layer-norm is before the activation function
- __init__(in_feats, num_classes, embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0)[source]
- rebuild_output_layer(num_classes, loss_type, s, margin, margin_warmup_epochs, num_subcenters=2)[source]
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
- save(file_path)
- unfreeze()
- static add_argparse_args(parser, prefix=None)
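The roles of s, margin and margin_warmup_epochs can be illustrated with the standard margin-softmax formulas: the target-class logit becomes s * (cos θ - m) for additive margin softmax and s * cos(θ + m) for additive angular margin softmax. The linear warm-up schedule, the helper names, and the lack of special handling for θ + m > π are assumptions of this sketch.

```python
import math

def warmup_margin(epoch, margin, margin_warmup_epochs):
    """Anneal the margin from 0 to `margin` (linear schedule assumed)."""
    if margin_warmup_epochs <= 0:
        return margin
    return margin * min(1.0, epoch / margin_warmup_epochs)

def target_logit(cos_theta, loss_type, s, margin):
    """Logit of the target class after applying the margin."""
    if loss_type == "cos-softmax":
        return s * (cos_theta - margin)          # additive margin
    if loss_type == "arc-softmax":
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        return s * math.cos(theta + margin)      # additive angular margin
    raise ValueError(f"no margin defined for loss_type={loss_type!r}")

m = warmup_margin(epoch=5, margin=0.3, margin_warmup_epochs=10)
logit = target_logit(0.8, "arc-softmax", s=64, margin=m)
```

Warming up the margin lets training start as an ordinary scaled-cosine softmax and only gradually makes the target class harder to satisfy.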
Deep Convolutional Encoder/Decoders
These are Encoder/Decoders based on Deep Convolutional Networks 1d and 2d.
DC Encoder 1d
- class hyperion.torch.narchs.dc1d_encoder.DC1dEncoder(*args: Any, **kwargs: Any)[source]
- __init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
- static add_argparse_args(parser, prefix=None, head_channels=False, in_feats=False)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
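With the default arguments, three blocks with stride 2 reduce the time resolution by a factor of 8. The sketch below computes this, assuming that scalar arguments such as conv_strides are broadcast to one value per block and that padding keeps ceil(length / stride) frames at each block; expand and encoder_plan are hypothetical helpers.

```python
def expand(value, num_blocks):
    """Broadcast a scalar hyper-parameter to one value per block."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value] * num_blocks

def encoder_plan(num_frames, in_stride=1, conv_channels=(128, 64, 32),
                 conv_strides=2):
    """Return per-block strides, total stride and output length."""
    strides = expand(conv_strides, len(conv_channels))
    total_stride = in_stride
    for s in strides:
        total_stride *= s
    out_frames = -(-num_frames // total_stride)  # ceil division
    return strides, total_stride, out_frames

print(encoder_plan(100))
```

A decoder built with the mirrored channel list would undo this by upsampling by the same total factor.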
DC Decoder 1d
- class hyperion.torch.narchs.dc1d_decoder.DC1dDecoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
- static add_argparse_args(parser, prefix=None, head_channels=False)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
DC Encoder 2d
- class hyperion.torch.narchs.dc2d_encoder.DC2dEncoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=1, in_conv_channels=128, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[128, 64, 32], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
- static add_argparse_args(parser, prefix=None, head_channels=False)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
DC Decoder 2d
- class hyperion.torch.narchs.dc2d_decoder.DC2dDecoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=32, in_conv_channels=32, in_kernel_size=3, in_stride=1, conv_repeats=[1, 1, 1], conv_channels=[64, 128, 128], conv_kernel_sizes=3, conv_strides=2, conv_dilations=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, use_norm=True, norm_layer=None, norm_before=True)[source]
- static add_argparse_args(parser, prefix=None, head_channels=False)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
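The encoder and decoder signatures above default to three convolutional stages with conv_strides=2. As a rough illustration of what that implies for sequence length, the sketch below tracks the length after the input convolution and after each strided stage. It is plain arithmetic, not code from hyperion: the helper name `stage_lengths` is hypothetical, and it assumes "same"-style padding (a stride-s convolution maps length L to ceil(L / s)) and that a scalar conv_strides applies to every stage.

```python
import math


def stage_lengths(in_len, in_stride=1, conv_strides=2, num_stages=3):
    """Length after the input conv and after each strided stage.

    Assumption (not taken from the library): 'same'-style padding, so a
    stride-s convolution maps a length L to ceil(L / s).
    """
    if isinstance(conv_strides, (list, tuple)):
        strides = list(conv_strides)
    else:
        # Assumption: a scalar stride is shared by all stages.
        strides = [conv_strides] * num_stages

    lengths = [math.ceil(in_len / in_stride)]
    for s in strides:
        lengths.append(math.ceil(lengths[-1] / s))
    return lengths


print(stage_lengths(100))  # [100, 50, 25, 13]
```

Under these assumptions the default encoder settings reduce the time resolution by a factor of about 8, and a decoder with the same strides would need to upsample by the same factor to recover the original length.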
TDNN Variants
These are variants of TDNNs. There is a factory class that creates TDNN networks from config params.
- class hyperion.torch.narchs.tdnn_factory.TDNNFactory[source]
- static create(tdnn_type, num_enc_blocks, in_feats, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu6'}, out_units=0, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True)[source]
- static add_argparse_args(parser, prefix=None)
TDNN
- class hyperion.torch.narchs.tdnn.TDNNV1(*args: Any, **kwargs: Any)[source]
- __init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
- property in_context
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
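The in_context property relates to how many neighboring frames a stack of dilated convolutions needs in order to produce one output frame. The sketch below shows the usual arithmetic for centered, stride-1 convolutions with an odd kernel size. The function name `symmetric_context` is hypothetical, and the per-block dilations are passed explicitly because this page does not specify how dilation and dilation_factor combine across blocks.

```python
def symmetric_context(kernel_size, dilations):
    """(past, future) frames needed to compute one output frame.

    Assumes odd kernel sizes, stride 1 and centered (non-causal)
    convolutions, so each block adds the same context on both sides.
    """
    half = (kernel_size - 1) // 2
    ctx = sum(half * d for d in dilations)
    return ctx, ctx


# Three blocks with kernel_size=3 and no dilation growth.
print(symmetric_context(3, [1, 1, 1]))  # (3, 3)
# Same depth, but with dilations growing block by block.
print(symmetric_context(3, [1, 2, 3]))  # (6, 6)
```

Growing the dilation widens the temporal context without adding parameters, which is the main reason TDNN-style networks use it.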
E-TDNN
- class hyperion.torch.narchs.etdnn.ETDNNV1(*args: Any, **kwargs: Any)[source]
- __init__(num_blocks, in_units, hid_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
- property in_context
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
Residual E-TDNN
- class hyperion.torch.narchs.resetdnn.ResETDNNV1(*args: Any, **kwargs: Any)[source]
- __init__(num_blocks, in_units, hid_units, expand_units, out_units=0, kernel_size=3, dilation=1, dilation_factor=1, hid_act={'inplace': True, 'name': 'relu'}, out_act=None, dropout_rate=0, norm_layer=None, use_norm=True, norm_before=True, in_norm=True, pooling=None)[source]
- property in_context
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
Canonical ResNets/SE-ResNets/Res2Nets
These classes can be used to build canonical ResNets, SE-ResNets and Res2Nets. There is a factory class that creates ResNets from config params.
- class hyperion.torch.narchs.resnet_factory.ResNetFactory[source]
- static create(resnet_type, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
- static add_argparse_args(parser, prefix=None)
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.narchs.resnet.ResNet(*args: Any, **kwargs: Any)[source]
ResNet2D base class
- block
resnet basic block type in [‘basic’, ‘bn’, ‘sebasic’, ‘sebn’], meaning basic resnet block, bottleneck resnet block, basic block with squeeze-excitation, and bottleneck block with squeeze-excitation
- num_layers
list with the number of layers in each of the 4 layer blocks found in ResNets. After each layer block, feature maps are downsampled by 2 in each dimension and the number of channels is multiplied by 2.
- in_channels
number of input channels
- conv_channels
number of output channels in first conv layer (stem)
- base_channels
number of channels in the first layer block
- out_units
number of logits in the output layer, if 0 there is no output layer and resnet is used just as feature extractor, for example for x-vector encoder.
- in_kernel_size
kernel size of first conv layer
- hid_act
str or dictionary describing hidden activations.
- out_act
output activation
- zero_init_residual
initializes batchnorm weights to zero so each residual block behaves as identity at the beginning. We observed worse results when using this option in x-vectors
- groups
number of groups in convolutions
- replace_stride_with_dilation
use dilated convolutions instead of downsampling; we never tested this.
- dropout_rate
dropout rate
- norm_layer
norm_layer object or str indicating the type of normalization layer; if None, it uses BatchNorm2d
- do_maxpool
if False, removes the maxpooling layer at the stem of the network.
- in_norm
if True, adds another batch norm layer in the input
- se_r
squeeze-excitation dimension compression
- time_se
if True, the squeeze-excitation embedding is obtained by averaging only in the time dimension, instead of the time-freq or HxW dimensions
- in_feats
input feature size (number of components in dimension 2 of the input tensor); this is only required when time_se=True to calculate the size of the squeeze-excitation matrices.
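To make the num_layers and base_channels attributes more concrete, the sketch below lists the per-block layer counts from the original ResNet paper and the channel doubling described above. Whether the classes on this page use exactly these counts is an assumption, and the names `RESNET_LAYERS`, `block_channels` and `depth` are hypothetical helpers, not part of hyperion.

```python
# Layer counts per block from the original ResNet paper.
# Assumption: the classes documented here follow the same counts.
RESNET_LAYERS = {
    "ResNet18": [2, 2, 2, 2],
    "ResNet34": [3, 4, 6, 3],
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}


def block_channels(base_channels=64, num_blocks=4):
    """Base width of each layer block when channels double per block."""
    return [base_channels * 2**i for i in range(num_blocks)]


def depth(num_layers, convs_per_layer):
    """Nominal depth: convs in all residual layers + stem conv + output layer."""
    return convs_per_layer * sum(num_layers) + 2


print(block_channels(64))                 # [64, 128, 256, 512]
print(depth(RESNET_LAYERS["ResNet18"], 2))  # 18 (basic block: 2 convs)
print(depth(RESNET_LAYERS["ResNet50"], 3))  # 50 (bottleneck block: 3 convs)
```

The depth check is only a sanity test that the layer counts match the network names; it says nothing about the extra squeeze-excitation or Res2Net components.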
- __init__(block, num_layers, in_channels, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, multilevel=False, endpoint_channels=64, groups=1, replace_stride_with_dilation=None, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, time_se=False, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
- _compute_out_size(in_size)[source]
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
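The effect of the downsampling steps on one spatial dimension can be sketched with the standard convolution size formula. This is not the library's implementation: the helper names are hypothetical, and the layout (a stride-2 stem with padding of half the kernel, an optional 3x3 stride-2 max-pool, and three stride-2 layer blocks) is assumed from the canonical ResNet design.

```python
def conv_out_size(n, kernel_size, stride, padding):
    """Standard output-size formula for a convolution or pooling layer."""
    return (n + 2 * padding - kernel_size) // stride + 1


def resnet_like_out_size(in_size, in_kernel_size=7, in_stride=2,
                         do_maxpool=True, num_strided_blocks=3):
    """Size of the H or W dimension after a canonical ResNet layout.

    Assumptions: stem padding = in_kernel_size // 2, a 3x3 stride-2
    max-pool with padding 1, and 3x3 stride-2 convs with padding 1 at
    the start of the last three layer blocks.
    """
    n = conv_out_size(in_size, in_kernel_size, in_stride, in_kernel_size // 2)
    if do_maxpool:
        n = conv_out_size(n, 3, 2, 1)
    for _ in range(num_strided_blocks):
        n = conv_out_size(n, 3, 2, 1)
    return n


print(resnet_like_out_size(224))                   # 7
print(resnet_like_out_size(224, do_maxpool=False)) # 14
print(resnet_like_out_size(80))                    # 3
```

The last call shows why do_maxpool=False can matter for audio: with 80 frequency bins, the full layout leaves only 3 bins at the output under these assumptions.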
- out_shape(in_shape=None)[source]
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- _forward(x)[source]
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- forward_hid_feats(x, layers=None, return_output=False)[source]
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
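The return convention of forward_hid_feats can be illustrated with a framework-free analogue. The sketch below is not the library's code: it treats the network as a hypothetical list of callables (`blocks`) and assumes that layers holds 0-based indices into that list, which this page does not confirm.

```python
def forward_hid_feats(blocks, x, layers=None, return_output=False):
    """Run x through blocks, keeping the outputs of the selected ones.

    blocks: list of callables standing in for network layers (hypothetical).
    layers: indices of the blocks whose outputs should be returned.
    """
    wanted = set(layers or [])
    hidden = []
    for i, block in enumerate(blocks):
        x = block(x)
        if i in wanted:
            hidden.append(x)
    if return_output:
        return hidden, x
    return hidden


# Toy "layers" operating on a number instead of a tensor.
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

print(forward_hid_feats(blocks, 1, layers=[0, 2]))           # [2, 1]
print(forward_hid_feats(blocks, 1, layers=[1], return_output=True))  # ([4], 1)
```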
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNet18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNet34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNet101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNet152(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNext50_32x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.ResNext101_32x8d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.WideResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.WideResNet101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LResNet18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LResNet34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LResNext50_4x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEResNet18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEResNet34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
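The `forward_hid_feats` entry above describes a forward pass that also collects intermediate representations. The sketch below shows that general pattern in plain Python, with ordinary functions standing in for network blocks. It is not the hyperion implementation: the 0-based block indexing and the `(hid_feats, output)` tuple return are assumptions made for the example.

```python
def forward_hid_feats(x, blocks, layers=None, return_output=False):
    """Runs x through blocks in order, keeping the outputs of the requested indices."""
    wanted = set(layers or [])
    hid_feats = []
    for idx, block in enumerate(blocks):
        x = block(x)
        if idx in wanted:
            hid_feats.append(x)
    if return_output:
        # Assumed convention: hidden features first, final output second.
        return hid_feats, x
    return hid_feats


# Toy "blocks": plain functions on numbers stand in for network stages.
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
hid, out = forward_hid_feats(5, blocks, layers=[0, 1], return_output=True)
print(hid, out)
```

Collecting features inside the single pass avoids re-running the earlier blocks once per requested layer.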
- class hyperion.torch.narchs.resnet.SEResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
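The `in_context` entry above returns a (past, future) tuple with the context required to predict one frame. One common way to reason about such a value, shown here purely as a sketch, is to accumulate the half-width of each centered convolution scaled by the stride of the layers before it. The layer list is hypothetical, `stack_context` is an illustrative name, and nothing here is claimed to be how hyperion computes its context.

```python
def stack_context(layers):
    """Returns (past, future) input frames needed to compute one output frame.

    layers: sequence of (kernel, stride, dilation) tuples describing centered
    convolutions with odd kernel sizes, listed from input to output.
    """
    context = 0
    jump = 1  # spacing, in input frames, between adjacent positions at this depth
    for kernel, stride, dilation in layers:
        context += (kernel - 1) // 2 * dilation * jump
        jump *= stride
    # Centered convolutions need the same amount of past and future context.
    return (context, context)


# Hypothetical stack (an assumption for illustration): three 3-tap layers,
# the last two with stride 2.
past, future = stack_context([(3, 1, 1), (3, 2, 1), (3, 2, 1)])
print(past, future)
```

The total receptive field of the stack is then `past + future + 1` input frames.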
- class hyperion.torch.narchs.resnet.SEResNet101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEResNet152(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEResNext50_32x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEResNext101_32x8d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEWideResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEWideResNet101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELResNet18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELResNet34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELResNext50_4x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNet18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNet34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNet50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNet101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNet152(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNext50_32x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEResNext101_32x8d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEWideResNet50(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
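To make the (past, future) context concrete, here is a generic receptive-field calculation for a stack of centered convolutions along the time axis. It is only a sketch: the helper name `stack_context` and the layer tuples are assumptions, and the way Hyperion derives its own `in_context()` value is not shown in this reference.

```python
def stack_context(layers):
    """Returns (past, future) input frames needed for one output frame.

    layers is a list of (kernel_size, stride, dilation) tuples describing
    centered convolutions with odd kernel sizes. Values are illustrative.
    """
    context = 0
    jump = 1  # spacing, in input frames, between neighbouring frames at this depth
    for kernel_size, stride, dilation in layers:
        # A centered kernel reaches dilation * (kernel_size - 1) / 2 frames per side.
        context += dilation * (kernel_size - 1) // 2 * jump
        jump *= stride
    return (context, context)


print(stack_context([(3, 1, 1), (3, 2, 1), (3, 1, 1)]))
```

Because the kernels are assumed to be centered, the past and future contexts come out equal; a causal network would need a different split.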
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEWideResNet101(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
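A shape mapping of this kind can be sketched as follows, assuming a 4D input shape (batch, C, H, W) in which unknown dimensions are given as None. The helper `sketch_out_shape`, its `out_channels` and `downsample_factor` arguments, and the ceil-division rule (which holds when every stride-2 step uses an odd kernel with "same"-style padding) are assumptions made for illustration, not the library's implementation.

```python
def sketch_out_shape(in_shape, out_channels, downsample_factor):
    """Maps (batch, C, H, W) to (batch, Cout, Hout, Wout).

    Dimensions given as None (unknown) stay None in the result.
    """
    batch, _, height, width = in_shape

    def reduce(size):
        if size is None:
            return None
        # Integer ceil division: ceil(size / downsample_factor).
        return -(-size // downsample_factor)

    return (batch, out_channels, reduce(height), reduce(width))


# Audio-style input: 80 frequency bins, variable batch size and duration.
print(sketch_out_shape((None, 1, 80, None), out_channels=512, downsample_factor=8))
```

Keeping None for unknown dimensions lets the result describe variable-length inputs, such as audio of arbitrary duration.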
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELResNet18(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
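The calling pattern described above (run the network, collect selected intermediate outputs, and optionally return the final output separately) can be sketched with plain callables standing in for network blocks. The function `collect_hid_feats`, the use of 0-based block indices for `layers`, and the toy blocks are all assumptions; the real method operates on tensors and may index layers differently.

```python
def collect_hid_feats(blocks, x, layers=None, return_output=False):
    """Runs blocks in order and collects the outputs of the selected ones.

    blocks is a list of callables standing in for network blocks, and
    layers holds 0-based block indices. Both conventions are assumptions.
    """
    wanted = set(layers or [])
    hid_feats = []
    for index, block in enumerate(blocks):
        x = block(x)
        if index in wanted:
            hid_feats.append(x)
    if return_output:
        return hid_feats, x
    return hid_feats


# Toy blocks acting on a number instead of a tensor.
toy_blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
hid, out = collect_hid_feats(toy_blocks, 5, layers=[0, 1], return_output=True)
print(hid, out)
```

Returning the final output in a separate value avoids a second forward pass when both the hidden features and the output are needed.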
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELResNet34(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
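A config dictionary is typically useful for rebuilding an equivalent network later. The sketch below shows that round trip with a toy class; `ToyNet`, its constructor arguments, the `class_name` key, and the `from_config` helper are hypothetical and do not reflect the actual keys returned by Hyperion's `get_config()` or the behavior of `load()`.

```python
import json


class ToyNet:
    """Stand-in for a network whose constructor arguments form its config."""

    def __init__(self, in_channels, out_units=0):
        self.in_channels = in_channels
        self.out_units = out_units

    def get_config(self):
        # Everything needed to rebuild the object, using JSON-friendly types.
        return {
            "class_name": type(self).__name__,
            "in_channels": self.in_channels,
            "out_units": self.out_units,
        }

    @classmethod
    def from_config(cls, cfg):
        kwargs = {k: v for k, v in cfg.items() if k != "class_name"}
        return cls(**kwargs)


net = ToyNet(in_channels=1, out_units=10)
cfg = json.loads(json.dumps(net.get_config()))  # simulate writing to disk and reading back
clone = ToyNet.from_config(cfg)
print(clone.get_config())
```

Storing only constructor arguments keeps the config small; the learned weights would travel separately, for example in a state dict.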
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELResNet50(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELResNext50_4x4d(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Net18(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Net34(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Net50(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Net101(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Net152(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Next50_32x4d(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.Res2Next101_32x8d(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.WideRes2Net50(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.WideRes2Net101(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LRes2Net50(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Net18(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets the network config.
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Net34(*args: Any, **kwargs: Any)[source]
-
- _compute_out_size(in_size)
Computes the output size given the input size.
The output size differs from the input size because of the downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
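A config dictionary of this kind is typically enough to rebuild an equivalent network. The toy class below sketches that round trip; TinyNet, its in_channels argument, and the simplified load(cfg) signature are hypothetical and only illustrate the pattern, not the library's own save/load logic.

```python
class TinyNet:
    """Toy stand-in for a network that can describe and rebuild itself."""

    def __init__(self, in_channels, out_units=0):
        self.in_channels = in_channels
        self.out_units = out_units

    def get_config(self):
        # Everything needed to call the constructor again.
        return {"in_channels": self.in_channels, "out_units": self.out_units}

    @classmethod
    def load(cls, cfg):
        # Rebuild a fresh instance from a config dictionary.
        return cls(**cfg)


net = TinyNet(in_channels=1, out_units=10)
clone = TinyNet.load(net.get_config())
```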
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Net101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Net152(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEWideRes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SEWideRes2Net101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELRes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.SELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Net18(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Net34(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Net101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Net152(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Next50_32x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSERes2Next101_32x8d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEWideRes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSEWideRes2Net101(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it also returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, plus a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELRes2Net50(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.TSELRes2Next50_4x4d(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.resnet.LResNet34_345(*args: Any, **kwargs: Any)[source]
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- forward(x, use_amp=False)
- forward_hid_feats(x, layers=None, return_output=False)
forward function which also returns intermediate hidden representations
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
layers – list of hidden layers to return hidden representations
return_output – if True, it returns the output representations in a separate tensor.
- Returns
List of hidden representation tensors, and a tensor with the output representations if return_output is True
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- Returns
Tuple (past, future) context required to predict one frame.
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
SpineNets/Spine2Nets
- class hyperion.torch.narchs.spinenet_factory.SpineNetFactory[source]
- static create(spinenet_type, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, se_r=16, in_feats=None, res2net_scale=4, res2net_width_factor=1)[source]
- static add_argparse_args(parser, prefix=None)
Copyright 2020 Magdalena Rybicka Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.narchs.spinenet.SpineNet(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, block_specs=None, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, feature_output_level=None, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, out_units=0, concat=False, do_endpoint_conv=True, concat_ax=3, upsampling_type='nearest', hid_act={'inplace': True, 'name': 'relu6'}, out_act=None, in_kernel_size=7, in_stride=2, zero_init_residual=False, groups=1, dropout_rate=0, norm_layer=None, norm_before=True, do_maxpool=True, in_norm=True, in_feats=None, se_r=16, time_se=False, has_se=False, is_res2net=False, res2net_scale=4, res2net_width_factor=1)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _make_permuted_connections(block_specs)[source]
Builds the cross-scale connections between the blocks.
- _make_endpoints()[source]
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
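The resizing-and-merging step described here can be illustrated on toy data. The sketch below upsamples each output level to the size of a target level with nearest-neighbour interpolation and averages the results; it leaves out the 1x1 convolutions and works on single-channel 2-D lists. The assumption that level l is 2^(l - target) times smaller than the target level, and both function names, are illustrative and not taken from the hyperion source.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]


def merge_endpoints(feats_by_level, target_level):
    """Resize every map to the target level's size and average element-wise."""
    resized = []
    for level, fmap in feats_by_level.items():
        factor = 2 ** (level - target_level)  # assumes level >= target_level
        resized.append(upsample_nearest(fmap, factor))
    n = len(resized)
    rows, cols = len(resized[0]), len(resized[0][0])
    return [[sum(m[i][j] for m in resized) / n for j in range(cols)]
            for i in range(rows)]


# Level 3 is 2x2; level 4 is half that size (1x1).
feats = {3: [[1, 2], [3, 4]], 4: [[10]]}
merged = merge_endpoints(feats, target_level=3)
```

Averaging corresponds to the non-concatenating case; concatenating instead would join the resized maps along the chosen axis.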
- _compute_max_context(in_context)[source]
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)[source]
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _compute_channel_size()[source]
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- out_shape(in_shape=None)[source]
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- _forward(x)[source]
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet49S(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet96(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet143(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet190(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LSpineNet49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block.
- _compute_out_size(in_size)
Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, it returns a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LSpineNet49_subpixel(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether each block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to perform the concatenation (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is that of the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
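The channel count described above can be sketched in plain Python. This is only an illustration, not the library's code: `endpoint_channels` is a hypothetical helper, and the two branches for the case where the 1x1 conv is applied are assumptions (the text only states the case without the 1x1 conv).

```python
def endpoint_channels(block_channels, do_endpoint_conv, concat, endpoints_num_filters):
    """Hypothetical helper: channels of the merged output of the endpoint blocks."""
    if not do_endpoint_conv:
        # Case stated in the text: without the 1x1 conv, the channels add up.
        return sum(block_channels)
    if concat:
        # Assumption: each block is projected to the common width, then concatenated.
        return endpoints_num_filters * len(block_channels)
    # Assumption: averaging the projected blocks keeps the common width.
    return endpoints_num_filters


# Three output blocks with 64, 128 and 256 channels, no 1x1 projection.
print(endpoint_channels([64, 128, 256], False, True, 256))  # 448
```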
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LSpineNet49_bilinear(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
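To illustrate why downsampling changes the size of the H or W dimension, here is a minimal sketch that chains the standard convolution output-size formula (dilation 1 assumed). The helper names and the layer list are hypothetical and do not reflect the actual SpineNet layers or the library's implementation.

```python
def conv_out_size(in_size, kernel_size, stride, padding=0):
    # Standard output-size formula for a convolution or pooling layer (dilation 1).
    return (in_size + 2 * padding - kernel_size) // stride + 1


def out_size(in_size, layers):
    # Apply the formula layer by layer; each entry is (kernel_size, stride, padding).
    size = in_size
    for kernel_size, stride, padding in layers:
        size = conv_out_size(size, kernel_size, stride, padding)
    return size


# Hypothetical stem: a 7x7 stride-2 conv followed by a 3x3 stride-2 pooling.
stem = [(7, 2, 3), (3, 2, 1)]
print(out_size(224, stem))  # 56
```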
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LSpineNet49_5(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
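The endpoint step described above (1x1 projection to a common width, resizing to one target map, then concatenating or averaging) can be sketched in PyTorch as follows. `EndpointMerge` and its arguments are hypothetical names used for illustration, and nearest-neighbor resizing is an assumption; this is not the hyperion implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EndpointMerge(nn.Module):
    """Hypothetical sketch: project, resize and merge several feature maps."""

    def __init__(self, in_channels, num_filters, concat=True):
        super().__init__()
        self.concat = concat
        # One 1x1 conv per output block, all mapping to the same number of channels.
        self.projs = nn.ModuleList(
            [nn.Conv2d(c, num_filters, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats, target_idx=0):
        # Spatial size of the feature map chosen as the resampling target.
        target_size = tuple(feats[target_idx].shape[-2:])
        outs = []
        for proj, f in zip(self.projs, feats):
            y = proj(f)
            if tuple(y.shape[-2:]) != target_size:
                # Assumption: nearest-neighbor resizing to the target size.
                y = F.interpolate(y, size=target_size, mode="nearest")
            outs.append(y)
        if self.concat:
            return torch.cat(outs, dim=1)  # concatenate along the channel axis
        return torch.stack(outs, dim=0).mean(dim=0)  # average the endpoints


feats = [torch.randn(2, 64, 16, 16), torch.randn(2, 128, 8, 8)]
merged = EndpointMerge([64, 128], num_filters=32, concat=True)(feats)
print(merged.shape)  # torch.Size([2, 64, 16, 16])
```

With `concat=False` the same inputs would give a map with 32 channels, since averaging keeps the common width.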
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LSpine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SELSpine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.TSELSpine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.Spine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SESpine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.TSESpine2Net49(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
Dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.Spine2Net49S(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – nbr of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level that the output feature map sizes are resampled to (by default, the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
Computes output size given input size. Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
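As a rough illustration of the shape bookkeeping described for `_compute_out_size` and `_compute_channel_size`, the sketch below computes an output size after a chain of downsampling steps and the number of channels produced by the endpoints. It is not hyperion's implementation: the function names, the `strides` argument, the ceil-division rule, and the behavior when the 1x1 conv is enabled are assumptions made for the example. Only the "sum of the output block channels when no 1x1 conv is used" case comes from the text above.

```python
def out_size_after_downsampling(in_size, strides):
    """Size of the H or W dimension after applying each stride in turn.

    Assumption of this sketch: every strided step pads its input so that
    out = ceil(in / stride).
    """
    size = in_size
    for stride in strides:
        size = (size - 1) // stride + 1  # integer form of ceil(size / stride)
    return size


def endpoint_channels(block_channels, do_endpoint_conv, endpoints_num_filters, concat=True):
    """Number of channels produced by the output endpoints."""
    if not do_endpoint_conv:
        # Documented case: without the 1x1 projection, the channels of the
        # output blocks add up.
        return sum(block_channels)
    # Assumed case: every output block is projected to endpoints_num_filters;
    # channel-wise concatenation multiplies that by the number of outputs,
    # while averaging keeps it unchanged.
    if concat:
        return endpoints_num_filters * len(block_channels)
    return endpoints_num_filters


print(out_size_after_downsampling(80, [2, 2, 2]))
print(endpoint_channels([64, 128, 256], False, 256))
```

The `(size - 1) // stride + 1` form avoids floating-point rounding, which matters when the result is used to pre-allocate buffers or to size a pooling layer.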
- class hyperion.torch.narchs.spinenet.SESpine2Net49S(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level to which the output feature map sizes are resampled (by default the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
- Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.TSESpine2Net49S(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level to which the output feature map sizes are resampled (by default the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
- Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.LR0_SP53(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level to which the output feature map sizes are resampled (by default the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
- Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.R0_SP53(*args: Any, **kwargs: Any)[source]
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level to which the output feature map sizes are resampled (by default the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
- Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- class hyperion.torch.narchs.spinenet.SpineNet49_concat_time(*args: Any, **kwargs: Any)[source]
- _compute_channel_size()
- Returns
If the 1x1 conv is not conducted in the endpoint blocks, the number of channels is equal to the sum of the nbr of channels of the output blocks.
- _compute_max_context(in_context)
Computes maximum possible context in the structure. The method may need a deeper revision.
- Parameters
in_context – context from the input residual block
- _compute_out_size(in_size)
- Computes output size given input size.
Output size is not the same as input size because of downsampling steps.
- Parameters
in_size – input size of the H or W dimensions
- Returns
output_size
- _forward(x)
forward function
- Parameters
x – input tensor of size=(batch, Cin, Hin, Win) for image or size=(batch, C, freq, time) for audio
- Returns
Tensor with output logits of size=(batch, out_units) if out_units>0; otherwise, a tensor of representations of size=(batch, Cout, Hout, Wout)
- _make_endpoints()
Builds the output endpoint blocks. In this part, the block outputs are forwarded through the 1x1 convs to the common number of channels (endpoints_num_filters) and feature maps are resized to the size of the feature_output_level.
- _make_permuted_blocks(block_specs)
Builds the blocks of the SpineNet structure.
- _make_permuted_connections(block_specs)
Builds the cross-scale connections between the blocks.
- _match_feat_shape(feat0, feat1)
Match shape between feats of the input connections.
- copy()
- property device
- forward(x, use_amp=False)
- freeze()
- get_config()
Gets network config
- Returns
dictionary with config params
- get_loss()
- get_reg_loss()
- in_context()
- in_dim()
- in_shape()
- Returns
Tuple describing input shape for the network
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- out_shape(in_shape=None)
Computes the output shape given the input shape
- Parameters
in_shape – input shape
- Returns
Tuple describing output shape for the network
- save(file_path)
- unfreeze()
- __init__(in_channels, **kwargs)[source]
Base class for the SpineNet structure. Based on the paper SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song https://arxiv.org/abs/1912.05027
- Parameters
in_channels – number of channels of the input
block_specs – specification of the building blocks: their type, input connections, and whether the block is an output
output_levels – the output levels of the blocks that are taken as outputs of the SpineNet
endpoints_num_filters – the base number of channels of the SpineNet output
resample_alpha – parameter for the resampling connections
concat – bool that decides whether the outputs are concatenated or averaged
do_endpoint_conv – bool that decides whether to project the output blocks to a common number of channels (given by endpoints_num_filters)
concat_ax – the axis along which to concatenate (if concatenation is chosen)
feature_output_level – the level to which the output feature map sizes are resampled (by default the target size is the biggest feature map)
filter_size_scale – SpineNet parameter that additionally rescales the number of channels of the SpineNet blocks (needed for bigger structures like SpineNet96 and higher, or for SpineNet49S)
ResNet Encoder/Decoders
These are encoders and decoders based on flexible 1d and 2d ResNets.
ResNet Encoder 1d
- class hyperion.torch.narchs.resnet1d_encoder.ResNet1dEncoder(*args: Any, **kwargs: Any)[source]
- __init__(in_feats, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, drop_connect_rate=0, se_r=16, res2net_width_factor=1, res2net_scale=4, multilayer=False, multilayer_concat=False, endpoint_channels=None, endpoint_layers=None, endpoint_scale_layer=-1, use_norm=True, norm_layer=None, norm_before=True, upsampling_mode='nearest')[source]
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- static add_argparse_args(parser, prefix=None, skip={'in_feats'})
ResNet Decoder 1d
- class hyperion.torch.narchs.resnet1d_decoder.ResNet1dDecoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=128, in_conv_channels=128, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[1, 1, 1], resb_channels=128, resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- static add_argparse_args(parser, prefix=None)
ResNet Encoder 2d
- class hyperion.torch.narchs.resnet2d_encoder.ResNet2dEncoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=1, in_conv_channels=64, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[64, 128, 256, 512], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, time_se=False, in_feats=None, res2net_width_factor=1, res2net_scale=4, use_norm=True, norm_layer=None, norm_before=True)[source]
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- static add_argparse_args(parser, prefix=None, skip={})
ResNet Decoder 2d
- class hyperion.torch.narchs.resnet2d_decoder.ResNet2dDecoder(*args: Any, **kwargs: Any)[source]
- __init__(in_channels=512, in_conv_channels=512, in_kernel_size=3, in_stride=1, resb_type='basic', resb_repeats=[2, 2, 2, 2], resb_channels=[512, 256, 128, 64], resb_kernel_sizes=3, resb_strides=2, resb_dilations=1, resb_groups=1, head_channels=0, hid_act='relu6', head_act=None, dropout_rate=0, se_r=16, use_norm=True, norm_layer=None, norm_before=True)[source]
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
- static add_argparse_args(parser, prefix=None)
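The `add_argparse_args(parser, prefix=None, ...)` static methods listed for the encoders and decoders above register the network options on an argument parser, with `prefix` prepended to the argument names and `skip` naming options to leave out. A minimal sketch of that pattern with the standard `argparse` module follows. The helper names `add_encoder_args` and `filter_prefixed_args`, the `--<prefix>-<name>` naming scheme, and the chosen options and defaults are hypothetical and may differ from what hyperion actually registers.

```python
import argparse


def add_encoder_args(parser, prefix=None, skip=frozenset()):
    """Registers a few encoder options on `parser`, optionally under a prefix.

    Hypothetical helper: the option names and the '--<prefix>-<name>'
    scheme are assumptions of this sketch.
    """
    p = "--" if prefix is None else f"--{prefix}-"
    if "in_feats" not in skip:
        parser.add_argument(p + "in-feats", type=int, default=80)
    parser.add_argument(p + "resb-channels", type=int, default=128)
    parser.add_argument(p + "hid-act", default="relu6")


def filter_prefixed_args(args, prefix):
    """Collects the options registered under `prefix` into a plain dict."""
    start = prefix + "_"
    return {k[len(start):]: v for k, v in vars(args).items() if k.startswith(start)}


parser = argparse.ArgumentParser()
add_encoder_args(parser, prefix="enc", skip={"in_feats"})
args = parser.parse_args(["--enc-resb-channels", "256"])
enc_cfg = filter_prefixed_args(args, "enc")
print(enc_cfg)
```

Prefixing lets one script configure several networks (for example an encoder and a decoder) from one command line without name clashes; the resulting dict can then be passed as keyword arguments to the network constructor.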
EfficientNet
Transformer
- class hyperion.torch.narchs.transformer_encoder_v1.TransformerEncoderV1(*args: Any, **kwargs: Any)[source]
Transformer encoder module.
- in_feats
input features dimension
- d_model
encoder blocks feature dimension
- num_heads
number of heads
- num_blocks
number of self attn blocks
- att_type
string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
- att_context
maximum context range for local attention
- ff_type
string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
- d_ff
dimension of middle layer in feed_forward block
- ff_kernel_size
kernel size for convolutional versions of ff block
- ff_dropout_rate
dropout rate for ff block
- pos_dropout_rate
dropout rate for positional encoder
- att_dropout_rate
dropout rate for attention block
- in_layer_type
input layer block type in [‘linear’, ‘conv2d-sub’, ‘embed’, None]
- rel_pos_enc
if True, use relative positional encodings, absolute encodings otherwise.
- causal_pos_enc
if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i
- hid_act
hidden activations in ff and input blocks
- norm_before
if True, use layer norm before layers, otherwise after
- concat_after
- if True, it concatenates the attention input and output and applies a linear transform, i.e.,
y = x + linear(concat(x, att(x)))
if False, y = x + att(x)
- padding_idx
padding idx for embed layer
- in_time_dim
time dimension in the input Tensor
- out_time_dim
dimension that we want to be time in the output tensor
- __init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, ff_type='linear', d_ff=2048, ff_kernel_size=1, ff_dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', rel_pos_enc=False, causal_pos_enc=False, hid_act='relu6', norm_before=True, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1)[source]
- _forward(x, mask=None, target_shape=None)[source]
Forward pass function
- Parameters
x – input tensor with size=(batch, time, num_feats)
mask – mask to indicate valid time steps for x (batch, time)
- Returns
Tensor with output features and tensor with mask
- out_shape(in_shape=None)[source]
Infers the network output shape given the input shape
- Parameters
in_shape – input shape tuple
- Returns
Tuple with the output shape
- static filter_args(**kwargs)[source]
Filters arguments corresponding to TransformerXVector from args dictionary
- Parameters
kwargs – args dictionary
- Returns
args dictionary
- static add_class_args(parser, prefix=None, in_feats=False)[source]
Adds Transformer config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
- static add_argparse_args(parser, prefix=None, in_feats=False)
Adds Transformer config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
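The `att_context` attribute bounds how far local attention may look, and the description of `causal_pos_enc` assumes that query q_i only attends to key k_j when j<=i. The sketch below builds the corresponding boolean mask in plain Python to make both constraints concrete. The function name `local_attention_mask` and the dense-mask formulation are assumptions of this example; the library may implement local attention differently (for instance block-wise).

```python
def local_attention_mask(num_frames, context, causal=False):
    """Returns mask[i][j] = True when query i may attend to key j.

    context: maximum distance |i - j| allowed between query and key.
    causal:  if True, additionally require j <= i (no look-ahead).
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            allowed = abs(i - j) <= context
            if causal:
                allowed = allowed and j <= i
            row.append(allowed)
        mask.append(row)
    return mask


# Print the mask for 5 frames and a context of 1 frame on each side.
for row in local_attention_mask(5, context=1):
    print("".join("x" if allowed else "." for allowed in row))
```

With a fixed context, each row of the mask has at most `2 * context + 1` allowed positions, so the attention cost grows linearly rather than quadratically with the sequence length.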
Conformer
- class hyperion.torch.narchs.conformer_encoder_v1.ConformerEncoderV1(*args: Any, **kwargs: Any)[source]
Conformer encoder introduced in https://arxiv.org/pdf/2005.08100.pdf
This includes some optional extra features not included in the original paper:
- Choose local attention (attending only to nearby frames instead of all the frames in the sequence)
- Choose the number of conv blocks in each conformer layer
- Squeeze-Excitation after the depthwise conv
- Allows downsampling in the time dimension
- Allows choosing the activation and layer normalization type
We call this Conformer+.
This becomes a standard Transformer by setting conv_repeats=0, pos_enc_type=‘abs’, ff_macaron=False.
- in_feats
input features dimension
- d_model
encoder blocks feature dimension
- num_heads
number of heads
- num_blocks
number of self attn blocks
- att_type
string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
- att_context
maximum context range for local attention
- conv_repeats
number of conv blocks in each conformer block
- conv_kernel_sizes
kernel size for conv blocks
- conv_strides
stride for depth-wise conv in the first conv block of each conformer block
- ff_type
string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
- d_ff
dimension of middle layer in feed_forward block
- ff_kernel_size
kernel size for convolutional versions of ff block
- dropout_rate
dropout rate for ff and conv blocks
- pos_dropout_rate
dropout rate for positional encoder
- att_dropout_rate
dropout rate for attention block
- in_layer_type
input layer block type in [‘linear’, ‘conv2d-sub’, ‘embed’, None]
- pos_enc_type
type of positional encoder [‘no’, ‘abs’, ‘rel’]
- causal_pos_enc
if True, use causal positional encodings (when rel_pos_enc=True); it assumes that query q_i only attends to key k_j when j<=i
- no_pos_enc
if True, it doesn’t use positional encoder.
- hid_act
hidden activations in ff and input blocks
- conv_norm_layer
norm layer constructor or str for conv block, if None it uses BatchNorm1d
- se_r
Squeeze-Excitation compression ratio, if None it doesn’t use Squeeze-Excitation
- ff_macaron
if True, it uses macaron-net style ff layers, otherwise transformer style.
- red_lnorms
if True, use redundant LNorm layers at the output of the conformer blocks as in the paper
- concat_after
- if True, it concatenates the attention input and output and applies a linear transform, i.e.,
y = x + linear(concat(x, att(x)))
if False, y = x + att(x)
- padding_idx
padding idx for embed layer
- in_time_dim
time dimension in the input Tensor
- out_time_dim
dimension that we want to be time in the output tensor
- rel_pos_enc
if True, use relative positional encodings, absolute encodings otherwise. (deprecated)
- red_lnorm
(deprecated)
- __init__(in_feats, d_model=256, num_heads=4, num_blocks=6, att_type='scaled-dot-prod-v1', att_context=25, conv_repeats=1, conv_kernel_sizes=31, conv_strides=1, ff_type='linear', d_ff=2048, ff_kernel_size=1, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, in_layer_type='conv2d-sub', pos_enc_type='rel', causal_pos_enc=False, hid_act='swish', conv_norm_layer=None, se_r=None, ff_macaron=True, red_lnorms=False, concat_after=False, padding_idx=-1, in_time_dim=-1, out_time_dim=1, rel_pos_enc=True, red_lnorm=False)[source]
- forward(x, mask=None, target_shape=None)[source]
Forward pass function
- Parameters
x – input tensor with size=(batch, time, num_feats)
mask – mask to indicate valid time steps for x (batch, time)
- Returns
Tensor with output features and tensor with mask
- out_shape(in_shape=None)[source]
Infers the network output shape given the input shape
- Parameters
in_shape – input shape tuple
- Returns
Tuple with the output shape
- static filter_args(**kwargs)[source]
Filters arguments corresponding to TransformerXVector from args dictionary
- Parameters
kwargs – args dictionary
- Returns
args dictionary
- static add_class_args(parser, prefix=None, in_feats=False)[source]
Adds Conformer config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
- static add_argparse_args(parser, prefix=None, in_feats=False)
Adds Conformer config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- in_dim()
- classmethod load(file_path=None, cfg=None, state_dict=None)
- out_dim()
- save(file_path)
- unfreeze()
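The two residual forms listed under `concat_after`, y = x + linear(concat(x, att(x))) and y = x + att(x), can be compared on a single frame with a few lines of plain Python. This is only a sketch of the formulas: the names `linear` and `attention_residual` and the hand-picked weights are assumptions for illustration, and the attention output is supplied directly instead of being computed.

```python
def linear(vec, weight, bias):
    """Plain dense layer: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def attention_residual(x, att_out, concat_after, weight=None, bias=None):
    """Combines a frame `x` with its attention output `att_out`."""
    if concat_after:
        # concat(x, att(x)) has 2 * d_model entries; the linear layer maps
        # it back to d_model before the residual sum.
        update = linear(x + att_out, weight, bias)
    else:
        update = att_out
    return [xi + ui for xi, ui in zip(x, update)]


x = [1.0, 2.0]
att_out = [0.5, -1.0]
# Toy weights that add the x half and the att(x) half of the concatenation.
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.0]
print(attention_residual(x, att_out, concat_after=False))
print(attention_residual(x, att_out, concat_after=True, weight=weight, bias=bias))
```

The concatenated form costs an extra d_model x 2*d_model projection per block, in exchange for letting the network learn how to mix the input and the attention output.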
Models
These include complex models created by connecting several network architectures.
x-Vectors
There are several variants of x-vector embeddings. They all derive from the same base class.
- class hyperion.torch.models.xvectors.xvector.XVector(*args: Any, **kwargs: Any)[source]
x-Vector base class
- __init__(encoder_net, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, dropout_rate=0, embed_layer=0, in_feats=None, proj_feats=None)[source]
- property pool_feats
- property num_classes
- property embed_dim
- property num_embed_layers
- property s
- property margin
- property margin_warmup_epochs
- property num_subcenters
- property loss_type
- _make_pool_net(pool_net, enc_feats=None)[source]
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
- update_loss_margin(epoch)[source]
Updates the value of the margin in AAM/AM-softmax losses given the epoch number
- Parameters
epoch – epoch which is about to start
- forward_output(x, y=None)[source]
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)[source]
forwards hidden representations in the x-vector network
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)[source]
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)[source]
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)[source]
- static add_argparse_args(parser, prefix=None, skip={})
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- save(file_path)
- unfreeze()
- static add_argparse_finetune_args(parser, prefix=None)
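`update_loss_margin(epoch)` adjusts the margin of the AAM/AM-softmax loss from the epoch number, with `margin_warmup_epochs` setting the length of the warm-up. The schedule itself is not spelled out above; the sketch below assumes a linear ramp from 0 up to `margin`, which is a common choice. The function name `warmed_up_margin` is hypothetical and the schedule is an assumption, not a description of the library's code.

```python
def warmed_up_margin(epoch, margin=0.3, margin_warmup_epochs=10):
    """Margin to use during `epoch` under an assumed linear warm-up.

    With margin_warmup_epochs <= 0 the full margin is used from the start.
    """
    if margin_warmup_epochs <= 0 or epoch >= margin_warmup_epochs:
        return margin
    # Assumed schedule: grow linearly from 0 at epoch 0 to `margin`.
    return margin * epoch / margin_warmup_epochs


for epoch in (0, 5, 10, 20):
    print(epoch, warmed_up_margin(epoch))
```

Starting with a small margin tends to make the early epochs easier to optimize, since a large angular margin applied to randomly initialized embeddings can stall training.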
TDNN x-Vector
x-Vectors with TDNN, E-TDNN, Residual E-TDNN Encoders.
- class hyperion.torch.models.xvectors.tdnn_xvector.TDNNXVector(*args: Any, **kwargs: Any)[source]
- __init__(tdnn_type, num_enc_blocks, in_feats, num_classes, enc_hid_units, enc_expand_units=None, kernel_size=3, dilation=1, dilation_factor=1, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]
- property num_enc_blocks
- property enc_hid_units
- property enc_expand_units
- property kernel_size
- property dilation
- property dilation_factor
- property in_norm
- static add_argparse_args(parser, prefix=None)
- _make_pool_net(pool_net, enc_feats=None)
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
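The default pool_net value in the constructors is 'mean+stddev'. The following pure-Python sketch illustrates what mean plus standard deviation pooling computes for a single utterance: per-feature statistics over the time axis, concatenated into one vector. It is a stand-in for illustration, not the GlobalPool1d implementation, and the use of the population (biased) variance is an assumption:

```python
import math

def mean_stddev_pool(frames):
    """Pool a (feats, time) nested list into a vector of length 2 * feats."""
    means, stds = [], []
    for channel in frames:  # one row per feature dimension
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        means.append(mu)
        stds.append(math.sqrt(var))
    # Means first, then standard deviations.
    return means + stds

pooled = mean_stddev_pool([[1.0, 3.0], [2.0, 2.0]])
```

The pooled vector has twice the encoder feature dimension, which is why the pooling block needs to know enc_feats.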
- static add_argparse_finetune_args(parser, prefix=None)
- static add_finetune_args(parser, prefix=None)
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- copy()
- property device
- property embed_dim
- extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
- static filter_finetune_args(**kwargs)
- forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)
forwards hidden representations in the x-vector network
- forward_output(x, y=None)
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- freeze()
- freeze_preembed_layers()
- get_loss()
- get_reg_loss()
- property loss_type
- property margin
- property margin_warmup_epochs
- property num_classes
- property num_embed_layers
- property num_subcenters
- property pool_feats
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
- property s
- save(file_path)
- train_mode(mode='ft-embed-affine')
- unfreeze()
- update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number.
- Parameters
epoch – epoch which is about to start
ResNet x-Vector
x-Vectors with Canonical ResNet, Res2Net Encoders.
- class hyperion.torch.models.xvectors.resnet_xvector.ResNetXVector(*args: Any, **kwargs: Any)[source]
- __init__(resnet_type, in_feats, num_classes, in_channels, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, replace_stride_with_dilation=None, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]
- property in_channels
- property conv_channels
- property base_channels
- property in_kernel_size
- property in_stride
- property zero_init_residual
- property groups
- property replace_stride_with_dilation
- property do_maxpool
- property in_norm
- property se_r
- property res2net_scale
- property res2net_width_factor
- static add_argparse_args(parser, prefix=None)
- _make_pool_net(pool_net, enc_feats=None)
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
- static add_argparse_finetune_args(parser, prefix=None)
- static add_finetune_args(parser, prefix=None)
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- copy()
- property device
- property embed_dim
- extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
- static filter_finetune_args(**kwargs)
- forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)
forwards hidden representations in the x-vector network
- forward_output(x, y=None)
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- freeze()
- freeze_preembed_layers()
- get_loss()
- get_reg_loss()
- property loss_type
- property margin
- property margin_warmup_epochs
- property num_classes
- property num_embed_layers
- property num_subcenters
- property pool_feats
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
- property s
- save(file_path)
- train_mode(mode='ft-embed-affine')
- unfreeze()
- update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number.
- Parameters
epoch – epoch which is about to start
SpineNet x-Vector
x-Vectors with SpineNet, Spine2Net Encoders.
- class hyperion.torch.models.xvectors.spinenet_xvector.SpineNetXVector(*args: Any, **kwargs: Any)[source]
- __init__(spinenet_type, in_feats, num_classes, in_channels, output_levels=[3, 4, 5, 6, 7], endpoints_num_filters=256, resample_alpha=0.5, block_repeats=1, filter_size_scale=1.0, conv_channels=64, base_channels=64, in_kernel_size=7, in_stride=1, zero_init_residual=False, groups=1, do_maxpool=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None, se_r=16, res2net_scale=4, res2net_width_factor=1)[source]
- property in_channels
- property output_levels
- property endpoints_num_filters
- property resample_alpha
- property block_repeats
- property filter_size_scale
- property conv_channels
- property base_channels
- property in_kernel_size
- property in_stride
- property zero_init_residual
- property groups
- property do_maxpool
- property in_norm
- property se_r
- property res2net_scale
- property res2net_width_factor
- static add_argparse_args(parser, prefix=None)
- _make_pool_net(pool_net, enc_feats=None)
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
- static add_argparse_finetune_args(parser, prefix=None)
- static add_finetune_args(parser, prefix=None)
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- copy()
- property device
- property embed_dim
- extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
- static filter_finetune_args(**kwargs)
- forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)
forwards hidden representations in the x-vector network
- forward_output(x, y=None)
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- freeze()
- freeze_preembed_layers()
- get_loss()
- get_reg_loss()
- property loss_type
- property margin
- property margin_warmup_epochs
- property num_classes
- property num_embed_layers
- property num_subcenters
- property pool_feats
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
- property s
- save(file_path)
- train_mode(mode='ft-embed-affine')
- unfreeze()
- update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number.
- Parameters
epoch – epoch which is about to start
ResNet 1d x-Vector
x-Vectors with ResNet, Res2Net 1d Encoders. It can be configured as ECAPA-TDNN.
- class hyperion.torch.models.xvectors.resnet1d_xvector.ResNet1dXVector(*args: Any, **kwargs: Any)[source]
- __init__(resnet_enc, num_classes, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=True, in_norm=False, embed_layer=0, proj_feats=None)[source]
- static add_argparse_args(parser, prefix=None)
- _make_pool_net(pool_net, enc_feats=None)
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
- static add_argparse_finetune_args(parser, prefix=None)
- static add_finetune_args(parser, prefix=None)
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- copy()
- property device
- property embed_dim
- extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
- static filter_finetune_args(**kwargs)
- forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)
forwards hidden representations in the x-vector network
- forward_output(x, y=None)
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- freeze()
- freeze_preembed_layers()
- get_loss()
- get_reg_loss()
- property loss_type
- property margin
- property margin_warmup_epochs
- property num_classes
- property num_embed_layers
- property num_subcenters
- property pool_feats
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
- property s
- save(file_path)
- train_mode(mode='ft-embed-affine')
- unfreeze()
- update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number.
- Parameters
epoch – epoch which is about to start
Transformer x-Vector
x-Vectors based on Transformer Encoder
- class hyperion.torch.models.xvectors.transformer_xvector_v1.TransformerXVectorV1(*args: Any, **kwargs: Any)[source]
x-Vector with Transformer encoder.
- in_feats
input features dimension
- num_classes
number of training classes
- enc_d_model
encoder blocks feature dimension
- num_enc_heads
number of heads
- num_enc_blocks
number of self attn blocks
- enc_att_type
string in [‘scaled-dot-prod-att-v1’, ‘local-scaled-dot-prod-att-v1’]
- enc_att_context
maximum context range for local attention
- enc_ff_type
string in [‘linear’, ‘conv1dx2’, ‘conv1d-linear’]
- enc_d_ff
dimension of middle layer in feed_forward block
- enc_ff_kernel_size
kernel size for convolutional versions of ff block
- in_layer_type
input layer block type in ['linear', 'conv2d-sub', 'embed', None]
- enc_concat_after
if True, concatenates the attention input and output and applies a linear transform, i.e.,
y = x + linear(concat(x, att(x)))
if False, y = x + att(x)
- pool_net
pooling block configuration string or dictionary of params
- embed_dim
x-vector dimension
- num_embed_layers
number of hidden layers in classification head
- hid_act
hidden activation configuration string or dictionary
- loss_type
softmax loss type string in [‘softmax’, ‘arc-softmax’, ‘cos-softmax’]
- s
s parameter in arc/cos-softmax losses
- margin
margin in arc/cos-softmax losses
- margin_warmup_epochs
number of epochs until we reach the maximum value for margin
- dropout_rate
dropout rate for ff block and classification head
- pos_dropout_rate
dropout rate for positional encoder
- att_dropout_rate
dropout rate for attention block
- use_norm
if True use batch/layer norm
- norm_before
if True, use layer norm before layers, otherwise after
- in_norm
add batchnorm at the input
- embed_layer
which layer to use to extract x-vectors
- proj_feats
add linear projection layer after the encoder to project feature dimension to proj_feats
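The two residual forms listed for enc_concat_after can be sketched on plain Python lists. Here att and linear are toy callables standing in for the attention block and the linear layer; all names in this block are illustrative and not hyperion APIs:

```python
def residual_attention(x, att, linear=None, concat_after=False):
    """Combine one frame x (list of floats) with its attention output."""
    a = att(x)
    if concat_after:
        # y = x + linear(concat(x, att(x)))
        update = linear(x + a)  # `+` on lists is concatenation
    else:
        # y = x + att(x)
        update = a
    return [xi + ui for xi, ui in zip(x, update)]

def toy_att(v):
    # Stand-in for self-attention: doubles every element.
    return [2.0 * vi for vi in v]

def toy_linear(v):
    # Stand-in for a linear layer mapping 2*d inputs to d outputs:
    # adds the first half of the vector to the second half.
    d = len(v) // 2
    return [v[i] + v[i + d] for i in range(d)]

y_plain = residual_attention([1.0, 2.0], toy_att)
y_concat = residual_attention([1.0, 2.0], toy_att, toy_linear, concat_after=True)
```

With concatenation, the linear layer sees both the input and the attention output, so it must map from twice the model dimension back to the model dimension.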
- __init__(in_feats, num_classes, enc_d_model=512, num_enc_heads=4, num_enc_blocks=6, enc_att_type='scaled-dot-prod-v1', enc_att_context=25, enc_ff_type='linear', enc_d_ff=2048, enc_ff_kernel_size=1, in_layer_type='conv2d-sub', enc_concat_after=False, pool_net='mean+stddev', embed_dim=256, num_embed_layers=1, hid_act={'inplace': True, 'name': 'relu6'}, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=0, num_subcenters=2, dropout_rate=0.1, pos_dropout_rate=0.1, att_dropout_rate=0.0, norm_layer=None, head_norm_layer=None, use_norm=True, norm_before=False, in_norm=False, embed_layer=0, proj_feats=None)[source]
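The s and margin arguments control the margin-based softmax losses. As a sketch of the standard formulations (additive angular margin for 'arc-softmax', additive cosine margin for 'cos-softmax'), the logit of one class can be computed from the cosine between the embedding and the class weight. margin_logit is a hypothetical helper; practical implementations also handle the case where the angle plus the margin exceeds pi, which is omitted here:

```python
import math

def margin_logit(cos_theta, is_target, loss_type="arc-softmax", s=64.0, margin=0.3):
    """Scaled logit for one class under a margin-based softmax loss."""
    if loss_type not in ("arc-softmax", "cos-softmax"):
        raise ValueError(f"unsupported loss_type: {loss_type}")
    if not is_target:
        # Non-target classes get no margin, only the scale s.
        return s * cos_theta
    if loss_type == "arc-softmax":
        # Additive angular margin: s * cos(theta + m)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        return s * math.cos(theta + margin)
    # Additive cosine margin: s * (cos(theta) - m)
    return s * (cos_theta - margin)
```

In both cases the margin lowers the target-class logit, so the network has to separate classes by a wider angle to reach the same loss.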
- property enc_d_model
- property num_enc_heads
- property num_enc_blocks
- property enc_att_type
- property enc_att_context
- property enc_d_ff
- property enc_ff_kernel_size
- property pos_dropout_rate
- property att_dropout_rate
- property in_layer_type
- property enc_concat_after
- property enc_ff_type
- static filter_args(**kwargs)[source]
Filters arguments corresponding to TransformerXVector from the args dictionary
- Parameters
prefix – prefix string
kwargs – args dictionary
- Returns
args dictionary
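A minimal sketch of this kind of filtering, assuming it simply keeps the entries whose key is a known constructor argument. filter_by_keys and the key tuple are hypothetical and do not reproduce the actual implementation:

```python
def filter_by_keys(valid_args, **kwargs):
    """Keep only the entries of kwargs whose key is listed in valid_args."""
    return {k: v for k, v in kwargs.items() if k in valid_args}

# Illustrative subset of constructor arguments.
VALID = ("enc_d_model", "num_enc_heads", "num_enc_blocks")
model_args = filter_by_keys(VALID, enc_d_model=512, num_enc_heads=4, lr=0.1)
```

Filtering like this lets a script pass its whole parsed-argument dictionary while the model only receives the options it understands.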
- static add_class_args(parser, prefix=None)[source]
Adds TransformerXVector config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
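The effect of a prefix on argument names can be sketched with the standard argparse module. add_toy_args registers two illustrative options; joining the prefix with a dash is an assumption, and the real method may build the names differently:

```python
import argparse

def add_toy_args(parser, prefix=None):
    """Register two encoder options, optionally namespaced by a prefix."""
    p = "" if prefix is None else f"{prefix}-"
    parser.add_argument(f"--{p}enc-d-model", type=int, default=512)
    parser.add_argument(f"--{p}num-enc-heads", type=int, default=4)

parser = argparse.ArgumentParser()
add_toy_args(parser, prefix="xvec")
# argparse turns dashes into underscores in the attribute names.
args = parser.parse_args(["--xvec-enc-d-model", "256"])
```

A prefix keeps option names unique when several models or components add their arguments to the same parser.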
- _make_pool_net(pool_net, enc_feats=None)
Makes the pooling block
- Parameters
pool_net – str or dict to pass to the pooling factory create function
enc_feats – dimension of the features coming from the encoder
- Returns
GlobalPool1d object
- static add_argparse_args(parser, prefix=None)
Adds TransformerXVector config parameters to argparser
- Parameters
parser – argparse object
prefix – prefix string to add to the argument names
- static add_argparse_finetune_args(parser, prefix=None)
- static add_finetune_args(parser, prefix=None)
- compute_slidwin_left_padding(win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- compute_slidwin_timestamps(num_windows, win_length, win_shift, snip_edges=False, feat_frame_length=25, feat_frame_shift=10, feat_snip_edges=False)
- copy()
- property device
- property embed_dim
- extract_embed(x, chunk_length=0, embed_layer=None, detach_chunks=False)
- extract_embed_slidwin(x, win_length, win_shift, snip_edges=False, feat_frame_length=None, feat_frame_shift=None, chunk_length=0, embed_layer=None, detach_chunks=False)
- static filter_finetune_args(**kwargs)
- forward(x, y=None, enc_layers=None, classif_layers=None, return_output=True)
- forward_hid_feats(x, y=None, enc_layers=None, classif_layers=None, return_output=False)
forwards hidden representations in the x-vector network
- forward_output(x, y=None)
Forward function
- Parameters
x – input features tensor with shape=(batch, in_feats, time)
y – target classes torch.long tensor with shape=(batch,)
- Returns
class posteriors tensor with shape=(batch, num_classes)
- freeze()
- freeze_preembed_layers()
- get_loss()
- get_reg_loss()
- property loss_type
- property margin
- property margin_warmup_epochs
- property num_classes
- property num_embed_layers
- property num_subcenters
- property pool_feats
- rebuild_output_layer(num_classes=None, loss_type='arc-softmax', s=64, margin=0.3, margin_warmup_epochs=10)
- property s
- save(file_path)
- train_mode(mode='ft-embed-affine')
- unfreeze()
- update_loss_margin(epoch)
Updates the value of the margin in AAM/AM-softmax losses given the epoch number.
- Parameters
epoch – epoch which is about to start
Auto-Encoder
- class hyperion.torch.models.ae.ae.AE(*args: Any, **kwargs: Any)[source]
Basic Autoencoder class
- encoder_net
NArch encoder network object
- decoder_net
NArch decoder network object
- z_dim
latent variable dimension (inferred from encoder_net output shape)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- save(file_path)
- unfreeze()
Variational Auto-Encoders
- class hyperion.torch.models.vae.vae.VAE(*args: Any, **kwargs: Any)[source]
Variational Autoencoder class
- encoder_net
NArch encoder network object
- decoder_net
NArch decoder network object
- z_dim
latent variable dimension
- kldiv_weight
weight of the KL divergence when computing the ELBO
- qz_pdf
type of prob distribution of the approx. latent posterior
- pz_pdf
type of prob distribution of the latent prior
- px_pdf
type of prob distribution for the data likelihood
- flatten_spatial
if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.
- spatial_shape
shape of the data, only needed if flatten_spatial=True
- scale_invariant
for future use
- data_scale
for future use
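The role of kldiv_weight can be sketched for the common case of a diagonal Gaussian posterior and a standard normal prior, where the KL divergence has a closed form. Both helpers below are hypothetical and operate on plain lists for a single sample:

```python
import math

def gaussian_kl_to_std_normal(mean, logvar):
    """KL( N(mean, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, logvar)
    )

def weighted_elbo(log_px, mean, logvar, kldiv_weight=1.0):
    """ELBO = log p(x|z) - kldiv_weight * KL(q(z|x) || p(z))."""
    return log_px - kldiv_weight * gaussian_kl_to_std_normal(mean, logvar)
```

With kldiv_weight=1 this is the usual ELBO; larger values penalize posteriors that move away from the prior more strongly.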
- __init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, qz_pdf='normal-glob-diag-cov', pz_pdf='std-normal', px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
- property pz
- forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, return_qz=False, serialize_pdfs=True, use_amp=False)[source]
- static add_argparse_args(parser, prefix=None)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- save(file_path)
- unfreeze()
- class hyperion.torch.models.vae.vq_vae.VQVAE(*args: Any, **kwargs: Any)[source]
Vector Quantized Variational Autoencoder class
- encoder_net
NArch encoder network object
- decoder_net
NArch decoder network object
- z_dim
latent variable dimension
- kldiv_weight
weight of the KL divergence when computing the ELBO
- diversity_weight
weight for the log-perplexity of the codebook; it intends to maximize the number of codewords used.
- vq_type
type of vector quantizer
- vq_groups
number of vector quantization groups.
- vq_clusters
number of codewords in each vq group
- vq_commitment_cost
weight of the commitment loss
- vq_ema_gamma
exponential moving average decay coeff.
- vq_ema_eps
Laplace smoothing parameter
- px_pdf
type of prob distribution for the data likelihood
- flatten_spatial
if True all time/spatial dimensions are generated from a single latent vector, if False, we have multiple latents depending on the data size.
- spatial_shape
shape of the data, only needed if flatten_spatial=True
- scale_invariant
for future use
- data_scale
for future use
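The vq_ema_gamma and vq_ema_eps attributes suggest an exponential-moving-average codebook update with Laplace smoothing. The sketch below follows the usual EMA vector-quantization recipe for a scalar codebook; whether the 'multi-ema-k-means-vq' quantizer matches it exactly is an assumption, and ema_codebook_update is a hypothetical name:

```python
def ema_codebook_update(counts, sums, batch_counts, batch_sums, gamma=0.99, eps=1e-5):
    """One EMA step for a scalar codebook (one float per codeword).

    counts, sums             -- running cluster sizes and sums of assigned inputs
    batch_counts, batch_sums -- the same statistics measured on the current batch
    """
    counts = [gamma * c + (1 - gamma) * b for c, b in zip(counts, batch_counts)]
    sums = [gamma * s + (1 - gamma) * b for s, b in zip(sums, batch_sums)]
    # Laplace smoothing keeps unused codewords from causing a division by zero.
    total = sum(counts)
    k = len(counts)
    smoothed = [(c + eps) / (total + k * eps) * total for c in counts]
    codebook = [s / c for s, c in zip(sums, smoothed)]
    return counts, sums, codebook

new_counts, new_sums, codebook = ema_codebook_update(
    [1.0, 0.0], [2.0, 0.0], [1.0, 0.0], [2.0, 0.0], gamma=0.5
)
```

The second codeword receives no inputs in this example; smoothing gives it a tiny nonzero count so its update stays finite.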
- __init__(encoder_net, decoder_net, z_dim, kldiv_weight=1, diversity_weight=0.1, vq_type='multi-ema-k-means-vq', vq_groups=1, vq_clusters=64, vq_commitment_cost=0.25, vq_ema_gamma=0.99, vq_ema_eps=1e-05, px_pdf='normal-glob-diag-cov', flatten_spatial=False, spatial_shape=None, scale_invariant=False, data_scale=None)[source]
- forward(x, x_target=None, return_x_mean=False, return_x_sample=False, return_z_sample=False, return_px=False, serialize_pdfs=True, use_amp=False)[source]
- static add_argparse_args(parser, prefix=None)
- copy()
- property device
- freeze()
- get_loss()
- get_reg_loss()
- save(file_path)
- unfreeze()
Losses
Custom loss classes
- class hyperion.torch.losses.bce_with_llr.BCEWithLLR(p_tar=0.5)[source]
- __init__(p_tar=0.5)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
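The class name and the p_tar argument suggest a binary cross-entropy computed on scores interpreted as log-likelihood ratios (LLR), with p_tar as the target prior. The following pure-Python sketch shows one plausible reading: the prior log-odds are added to the LLR to obtain posterior log-odds, and the cross-entropy is evaluated on them. It is an assumption about the behavior, not the actual implementation, which may for example weight target and non-target trials differently:

```python
import math

def bce_with_llr(llr, labels, p_tar=0.5):
    """Mean binary cross-entropy of scores interpreted as log-likelihood ratios."""
    logit_prior = math.log(p_tar / (1.0 - p_tar))
    total = 0.0
    for score, y in zip(llr, labels):
        logit_post = score + logit_prior  # posterior log-odds via Bayes' rule
        # Loss is -log(sigmoid(z)), with z flipped for non-target labels.
        z = logit_post if y == 1 else -logit_post
        # Numerically stable softplus(-z).
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(llr)
```

With p_tar=0.5 the prior log-odds are zero, so the LLR is used directly as the logit.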
- forward(x, y)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- T_destination
alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])
- _get_backward_hooks()
Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.
- _load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get(“version”, None).
Note
state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
- _named_members(get_members_fn, prefix='', recurse=True)
Helper method for yielding various names + members of modules.
- _register_load_state_dict_pre_hook(hook)
These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.
- _register_state_dict_hook(hook)
These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.
- _save_to_state_dict(destination, prefix, keep_vars)
Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().
In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.
- Parameters
destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module
- add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Parameters
name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.
- apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- bfloat16() torch.nn.modules.module.T
Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
self
- Return type
Module
- buffers(recurse: bool = True) Iterator[torch.Tensor]
Returns an iterator over module buffers.
- Parameters
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
torch.Tensor – module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- children() Iterator[torch.nn.modules.module.Module]
Returns an iterator over immediate children modules.
- Yields
Module – a child module
- cpu() torch.nn.modules.module.T
Moves all model parameters and buffers to the CPU.
- Returns
self
- Return type
Module
- cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- double() torch.nn.modules.module.T
Casts all floating point parameters and buffers to double datatype.
- Returns
self
- Return type
Module
- dump_patches: bool = False
This allows better BC support for load_state_dict(). In state_dict(), the version number will be saved as in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
- eval() torch.nn.modules.module.T
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent with self.train(False).
- Returns
self
- Return type
Module
- extra_repr() str
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- float() torch.nn.modules.module.T
Casts all floating point parameters and buffers to float datatype.
- Returns
self
- Return type
Module
- half() torch.nn.modules.module.T
Casts all floating point parameters and buffers to half datatype.
- Returns
self
- Return type
Module
- load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type
NamedTuple with missing_keys and unexpected_keys fields
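The key matching described for strict can be sketched without PyTorch. check_state_dict_keys is a toy stand-in that only compares key names; it does not copy tensors and is not the torch.nn.Module implementation:

```python
from collections import namedtuple

IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])

def check_state_dict_keys(module_keys, state_dict, strict=True):
    """Compare the keys a module expects with those found in state_dict."""
    missing = [k for k in module_keys if k not in state_dict]
    unexpected = [k for k in state_dict if k not in module_keys]
    if strict and (missing or unexpected):
        # In strict mode any mismatch is an error.
        raise RuntimeError(f"missing={missing}, unexpected={unexpected}")
    return IncompatibleKeys(missing, unexpected)

result = check_state_dict_keys(["w", "b"], {"w": 1, "extra": 2}, strict=False)
```

With strict=False the mismatches are only reported, which is convenient when loading a checkpoint into a model whose output layer has been rebuilt.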
- modules() Iterator[torch.nn.modules.module.Module]
Returns an iterator over all modules in the network.
- Yields
Module – a module in the network
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
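The "returned only once" behavior can be reproduced with a small traversal that remembers which objects it has already visited. Node and iter_modules are toy stand-ins for illustration, not PyTorch classes:

```python
class Node:
    """Toy stand-in for a module that holds child modules."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def iter_modules(root, memo=None):
    """Yield root and all descendants depth-first, each object only once."""
    if memo is None:
        memo = set()
    if id(root) in memo:
        return  # this object was already yielded through another path
    memo.add(id(root))
    yield root
    for child in root.children:
        yield from iter_modules(child, memo)

shared = Node("linear")
net = Node("seq", [shared, shared])
names = [m.name for m in iter_modules(net)]
```

Deduplicating by object identity means a layer shared between two branches is visited a single time, mirroring the example above.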
- named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters
prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
(string, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())
- named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple of name and module
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
(string, Parameter) – Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
- parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
Parameter – module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
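The difference between persistent and non-persistent buffers can be checked directly. The sketch below uses a hypothetical toy module, `RunningStats`, which is not part of Hyperion or PyTorch. It registers one buffer of each kind, then compares the keys of `state_dict()` with the names returned by `named_buffers()`.

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Hypothetical toy module with one persistent and one non-persistent buffer."""

    def __init__(self, num_features):
        super().__init__()
        # Persistent buffer (default): saved in state_dict() alongside parameters.
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent buffer: still an attribute of the module,
        # but excluded from state_dict().
        self.register_buffer("scratch", torch.ones(num_features), persistent=False)


stats = RunningStats(4)
state_keys = list(stats.state_dict().keys())
buffer_names = [name for name, _ in stats.named_buffers()]
print(state_keys)    # only the persistent buffer
print(buffer_names)  # both buffers
```

Both buffers follow the module when it is moved with `to()`, but only `running_mean` is written to a checkpoint.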
- register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after forward() is called.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
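As a minimal sketch of the forward-hook mechanism, the example below records input and output shapes of a layer and then removes the hook through the returned handle. The hook name `save_output_shape` and the `captured` dictionary are illustrative choices, not part of any library.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
captured = {}


def save_output_shape(module, inputs, output):
    # `inputs` is a tuple with the positional arguments given to forward().
    captured["in_shape"] = tuple(inputs[0].shape)
    captured["out_shape"] = tuple(output.shape)
    # Returning None keeps the output unchanged.


handle = layer.register_forward_hook(save_output_shape)
layer(torch.randn(5, 3))
captured_after_first_call = dict(captured)

handle.remove()  # the hook is not called after this point
captured.clear()
layer(torch.randn(7, 3))
```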
- register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple).
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
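A pre-hook that returns a single modified value can be sketched as follows. The hook `scale_input` is a hypothetical example. It doubles the input before forward() runs, and `nn.Identity` is used so that the effect is visible in the output.

```python
import torch
from torch import nn

layer = nn.Identity()


def scale_input(module, inputs):
    # `inputs` is the tuple of positional arguments. Returning a single
    # tensor is allowed: it is wrapped back into a tuple before forward().
    return inputs[0] * 2.0


handle = layer.register_forward_pre_hook(scale_input)
doubled = layer(torch.ones(3))

handle.remove()
unchanged = layer(torch.ones(3))
```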
- register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
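The following sketch inspects the gradients seen by a full backward hook. The hook `record_grads` is hypothetical and only reads its arguments. The input tensor is created with `requires_grad=True` so that a gradient with respect to the module input exists.

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)
grad_shapes = {}


def record_grads(module, grad_input, grad_output):
    # Both arguments are tuples; entries can be None for non-Tensor arguments.
    grad_shapes["grad_input"] = [
        None if g is None else tuple(g.shape) for g in grad_input
    ]
    grad_shapes["grad_output"] = [tuple(g.shape) for g in grad_output]
    # Returning None leaves the gradients untouched.


handle = layer.register_full_backward_hook(record_grads)
x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()
handle.remove()
```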
- register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Parameters
name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
- requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T
Change if autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
- Parameters
requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
self
- Return type
Module
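Freezing part of a model for fine-tuning might look like the sketch below. The two-part `nn.Sequential` model is a hypothetical stand-in. The first linear layer plays the role of a frozen encoder and the last one the role of a trainable head.

```python
import torch
from torch import nn

# Hypothetical model: index 0 acts as the "encoder", index 2 as the "head".
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
encoder, head = model[0], model[2]

encoder.requires_grad_(False)  # in-place; returns the module itself

trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Only the trainable parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

model(torch.randn(3, 8)).sum().backward()
# Frozen parameters receive no gradient; the head does.
```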
- state_dict(destination=None, prefix='', keep_vars=False)
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.
- Returns
a dictionary containing a whole state of the module
- Return type
dict
Example:
>>> module.state_dict().keys()
['bias', 'weight']
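A common use of the state dictionary is copying weights between two modules with the same architecture. The sketch below assumes two freshly created `nn.Linear` layers and uses `load_state_dict()`, the counterpart of `state_dict()`.

```python
import torch
from torch import nn

src = nn.Linear(3, 2)
dst = nn.Linear(3, 2)  # same architecture, different random initialization

state = src.state_dict()             # maps parameter/buffer names to tensors
result = dst.load_state_dict(state)  # reports missing and unexpected keys
```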
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns
self
- Return type
Module
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- train(mode: bool = True) torch.nn.modules.module.T
Sets the module in training mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
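Dropout is one of the modules affected by the training flag. In the short sketch below, the same layer zeroes random entries in training mode and acts as the identity in evaluation mode.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()        # same as drop.train(True)
y_train = drop(x)   # random entries zeroed, the rest scaled by 1 / (1 - p)

drop.train(False)   # same as drop.eval()
y_eval = drop(x)    # identity in evaluation mode
```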
- type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T
Casts all parameters and buffers to dst_type.
- Parameters
dst_type (type or string) – the desired type
- Returns
self
- Return type
Module
- xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
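The two behaviours of `zero_grad` can be compared on a small layer. The sketch passes `set_to_none` explicitly in both calls, because the default value differs between PyTorch releases.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()  # populate the gradients

layer.zero_grad(set_to_none=False)
grad_is_zero_tensor = bool((layer.weight.grad == 0).all())  # grads kept as zeros

layer.zero_grad(set_to_none=True)
grad_is_none = layer.weight.grad is None  # grads released
```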
- training: bool
Adversarial Attacks
This module contains classes to generate adversarial attacks for speaker recognition.
Attack Generation Classes
All adversarial attacks derive from the same base class:
- class hyperion.torch.adv_attacks.adv_attack.AdvAttack(model, loss=None, targeted=True, range_min=None, range_max=None)[source]
- property attack_info
FGSM
- class hyperion.torch.adv_attacks.fgsm_attack.FGSMAttack(model, eps, loss=None, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- to(device)
- class hyperion.torch.adv_attacks.snr_fgsm_attack.SNRFGSMAttack(model, snr, loss=None, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- to(device)
- class hyperion.torch.adv_attacks.rand_fgsm_attack.RandFGSMAttack(model, eps, alpha, loss=None, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- to(device)
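For orientation, the generic one-step FGSM update can be sketched in plain PyTorch. This is not Hyperion's implementation: `fgsm_sketch` and the toy linear model are hypothetical. Treating `range_min` and `range_max` as clipping bounds for the adversarial signal is an assumption based on the parameter names.

```python
import torch
from torch import nn


def fgsm_sketch(model, x, target, eps, loss_fn, targeted=False,
                range_min=None, range_max=None):
    """One-step FGSM: move x by eps along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), target)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Untargeted: increase the loss of the true label.
    # Targeted: decrease the loss of the target label.
    step = -eps * grad.sign() if targeted else eps * grad.sign()
    x_adv = x_adv.detach() + step
    if range_min is not None or range_max is not None:
        # Assumed meaning of range_min/range_max: valid signal range.
        x_adv = torch.clamp(x_adv, min=range_min, max=range_max)
    return x_adv


torch.manual_seed(0)
model = nn.Linear(16, 3)       # toy classifier standing in for a real model
x = torch.randn(2, 16)
y = torch.tensor([0, 2])
x_adv = fgsm_sketch(model, x, y, eps=0.01, loss_fn=nn.CrossEntropyLoss())
max_perturbation = (x_adv - x).abs().max().item()
```

The perturbation is bounded by `eps` in the L-infinity norm. The SNR and random variants listed above presumably differ in how the step size and starting point are chosen.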
PGD
- class hyperion.torch.adv_attacks.pgd_attack.PGDAttack(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]
- __init__(model, eps, alpha, norm, max_iter=10, random_eps=False, num_random_init=0, loss=None, norm_time=False, time_dim=None, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- static _random_sphere(shape, eps, norm, dtype, device)[source]
Samples uniformly from l_p balls in R^n, using Theorem 1 in https://arxiv.org/pdf/math/0503650.pdf.
- to(device)
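The iterative structure behind PGD can be illustrated with a generic L-infinity version. The sketch is not Hyperion's `PGDAttack`: the function `pgd_linf_sketch` is hypothetical, it covers only the infinity norm, and its random start draws uniformly from the eps-box rather than using the l_p-ball sampler mentioned above.

```python
import torch
from torch import nn


def pgd_linf_sketch(model, x, target, eps, alpha, max_iter, loss_fn,
                    random_init=False):
    """Untargeted L-inf PGD: repeated signed-gradient steps plus projection."""
    x_adv = x.clone().detach()
    if random_init:
        # Random start inside the eps-box around x.
        x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient-ascent step of size alpha.
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Projection: keep the total perturbation within [-eps, eps].
        x_adv = x + torch.clamp(x_adv - x, min=-eps, max=eps)
    return x_adv.detach()


torch.manual_seed(0)
model = nn.Linear(16, 3)       # toy classifier standing in for a real model
x = torch.randn(2, 16)
y = torch.tensor([0, 2])
x_adv = pgd_linf_sketch(model, x, y, eps=0.01, alpha=0.004, max_iter=10,
                        loss_fn=nn.CrossEntropyLoss())
max_perturbation = (x_adv - x).abs().max().item()
```

With `alpha * max_iter` larger than `eps`, the projection step is what keeps the final perturbation inside the eps-ball.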
Carlini-Wagner
Carlini-Wagner attacks derive from the same base class:
- class hyperion.torch.adv_attacks.carlini_wagner.CarliniWagner(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
- __init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- to(device)
- class hyperion.torch.adv_attacks.carlini_wagner_l2.CarliniWagnerL2(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
- __init__(model, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10000, abort_early=True, initial_c=0.001, norm_time=False, time_dim=None, use_snr=False, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- static atanh(x, eps=1e-06)
- f(z, target)
- to(device)
- w_x(x)
- x_w(w)
- class hyperion.torch.adv_attacks.carlini_wagner_linf.CarliniWagnerLInf(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]
- __init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- static atanh(x, eps=1e-06)
- f(z, target)
- to(device)
- w_x(x)
- x_w(w)
- class hyperion.torch.adv_attacks.carlini_wagner_l0.CarliniWagnerL0(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]
- __init__(model, confidence=0.0, lr=0.01, max_iter=10000, abort_early=True, initial_c=0.001, reduce_c=False, c_incr_factor=2, indep_channels=False, targeted=False, range_min=None, range_max=None)[source]
- property attack_info
- static atanh(x, eps=1e-06)
- f(z, target)
- to(device)
- w_x(x)
- x_w(w)
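The method names `atanh`, `w_x` and `x_w` suggest the change of variables used in the original Carlini-Wagner formulation, where the attack optimizes an unconstrained variable w and maps it to a bounded signal through tanh. The sketch below shows that mapping on scalars. The function bodies are an assumption about what these methods compute, not a copy of Hyperion's code, and the default range [-1, 1] is chosen only for illustration.

```python
import math


def atanh(x, eps=1e-6):
    # Shrink x slightly so that |x| < 1 and the logarithm stays finite.
    x = (1.0 - eps) * x
    return 0.5 * math.log((1.0 + x) / (1.0 - x))


def w_x(x, range_min=-1.0, range_max=1.0):
    """Map a signal value in [range_min, range_max] to the unconstrained w."""
    scaled = 2.0 * (x - range_min) / (range_max - range_min) - 1.0
    return atanh(scaled)


def x_w(w, range_min=-1.0, range_max=1.0):
    """Map an unconstrained w back to a value in [range_min, range_max]."""
    return range_min + 0.5 * (math.tanh(w) + 1.0) * (range_max - range_min)


w = w_x(0.3)
x_back = x_w(w)  # close to 0.3, up to the eps shrinkage in atanh
```

Because `x_w` always lands inside the valid range, the optimizer can update w freely without an explicit clipping step.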
Attack Generator Factories
These are factory classes that create attack generator objects. They create attacks from Hyperion or from the Adversarial Robustness Toolbox (https://github.com/Trusted-AI/adversarial-robustness-toolbox).
- class hyperion.torch.adv_attacks.attack_factory.AttackFactory[source]
- static create(model, attack_type, eps=0, snr=100, alpha=0, norm=inf, random_eps=False, num_random_init=0, confidence=0.0, lr=0.01, binary_search_steps=9, max_iter=10, abort_early=True, c=0.001, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
- static add_argparse_args(parser, prefix=None)
- class hyperion.torch.adv_attacks.random_attack_factory.RandomAttackFactory(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
- __init__(attack_types, min_eps=1e-05, max_eps=0.1, min_snr=30, max_snr=60, min_alpha=1e-05, max_alpha=0.02, norms=[inf], random_eps=False, min_num_random_init=0, max_num_random_init=3, min_confidence=0, max_confidence=1, min_lr=0.001, max_lr=0.01, min_binary_search_steps=9, max_binary_search_steps=9, min_iter=5, max_iter=10, abort_early=True, min_c=0.001, max_c=0.01, reduce_c=False, c_incr_factor=2, tau_decr_factor=0.9, indep_channels=False, norm_time=False, time_dim=None, use_snr=False, loss=None, targeted=False, range_min=None, range_max=None, eps_scale=1)[source]
- static add_argparse_args(parser, prefix=None)
- class hyperion.torch.adv_attacks.art_attack_factory.ARTAttackFactory[source]
- static create(model, attack_type, eps=0, delta=0.01, step_adapt=0.667, num_trial=25, sample_size=20, init_size=100, norm=inf, eps_step=0.1, num_random_init=0, minimal=False, random_eps=False, min_eps=None, beta=0.001, theta=0.1, gamma=1.0, etha=0.01, confidence=0.0, lr=0.01, lr_decay=0.5, lr_num_decay=20, momentum=0.8, binary_search_steps=9, max_iter=10, overshoot=1.1, num_grads=10, c=0.001, max_halving=5, max_doubling=5, decision_rule='EN', init_eval=100, max_eval=10000, num_parallel=128, variable_h=0.0001, use_importance=False, abort_early=True, th=None, sigma=0.5, lambda_tv=0.3, labmda_c=1.0, lambda_s=0.5, reg=3000, kernel_size=5, eps_factor=1.1, eps_iter=10, conj_sinkhorn_iter=400, proj_sinkhorn_iter=400, targeted=False, num_samples=1, eps_scale=1, batch_size=1)[source]
- static add_argparse_args(parser, prefix=None)
Trainers
Generic Trainer
- class hyperion.torch.trainers.torch_trainer.TorchTrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Base Trainer class to train basic neural network models
- model
model object.
- loss
nn.Module loss class
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- fit(train_data, val_data=None)[source]
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)[source]
Creates the default loggers
- checkpoint(logs=None)[source]
Creates a checkpoint of the training state, to be saved and recovered later
- Parameters
logs – logs containing the current value of the metrics.
- save_checkpoint(logs=None)[source]
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)[source]
Saves the SWA (stochastic weight averaging) model
- Parameters
logs – logs containing the current value of the metrics.
- load_checkpoint(file_path)[source]
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- static add_argparse_args(parser, prefix=None, skip=[])
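The `grad_acc_steps`, `grad_clip` and `grad_clip_norm` options correspond to a common training-loop pattern, sketched below in plain PyTorch. This is not the body of `TorchTrainer.train_epoch`. The toy model, the synthetic batches and the counter `num_optim_steps` are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

grad_acc_steps = 4   # mini-batches accumulated per optimizer step
grad_clip = 1.0      # max gradient norm; 0 would disable clipping
grad_clip_norm = 2   # norm type used for clipping

# Eight hypothetical mini-batches with 3 examples each.
batches = [(torch.randn(3, 4), torch.randint(0, 2, (3,))) for _ in range(8)]

num_optim_steps = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(batches, start=1):
    # Scale the loss so the accumulated gradient is an average over the steps.
    loss = loss_fn(model(x), y) / grad_acc_steps
    loss.backward()  # gradients add up across calls
    if step % grad_acc_steps == 0:
        if grad_clip > 0:
            nn.utils.clip_grad_norm_(
                model.parameters(), grad_clip, norm_type=grad_clip_norm
            )
        optimizer.step()
        optimizer.zero_grad()
        num_optim_steps += 1
```

Eight mini-batches with `grad_acc_steps = 4` give two optimizer updates, each behaving roughly like a single step on a batch four times larger.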
x-Vector Trainers
- class hyperion.torch.trainers.xvector_trainer.XVectorTrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Trainer to train x-vector style models.
- model
x-Vector model object.
- optim
pytorch optimizer object or options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object or options dict
- loggers
LoggerList object, loggers write training progress to std. output and file. If None, it uses default loggers.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- loss
if None, it uses cross-entropy
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – pytorch data loader returning features and class labels.
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved and recovered later
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves the SWA (stochastic weight averaging) model
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
- validation_epoch(data_loader, swa_update_bn=False)
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- class hyperion.torch.trainers.xvector_trainer_from_wav.XVectorTrainerFromWav(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Trainer to train x-vector style models.
- model
x-Vector model object.
- feat_extractor
feature extractor nn.Module
- optim
pytorch optimizer object or options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object or options dict.
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- loss
if None, it uses cross-entropy
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, feat_extractor, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – pytorch data loader returning features and class labels.
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved and recovered later
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves the SWA (stochastic weight averaging) model
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
- class hyperion.torch.trainers.xvector_trainer_deep_feat_reg.XVectorTrainerDeepFeatReg(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Trainer to train x-vector style models.
- model
x-Vector model object that we want to fine-tune
- prior_model
x-Vector model object that we use as regularizer
- optim
pytorch optimizer object or options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- reg_layers_enc
list of encoder layer indexes that we use for regularization
- reg_layers_classif
list of classification head layer indexes that we use for regularization
- reg_weight_enc
weight of the regularization loss for encoder hidden activations
- reg_weight_classif
weight of the regularization loss for classification head hidden activations
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object or options dict.
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- loss
if None, it uses cross-entropy
- reg_loss
nn.Module loss used for regularization, if None it uses L1 loss.
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved and recovered later
- Parameters
logs – logs containing the current value of the metrics.
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves the SWA (stochastic weight averaging) model
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
- validation_epoch(data_loader, swa_update_bn=False)
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
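The idea of deep feature regularization described by `prior_model`, `reg_layers_enc`, `reg_weight_enc` and `reg_loss` can be sketched as a combined loss: the classification loss plus a weighted distance between hidden activations of the fine-tuned model and those of a frozen prior model. The code below is a simplified illustration, not the trainer's implementation. It uses a small `nn.Sequential` network instead of an x-vector model, a single layer list and weight, and a hypothetical helper `forward_with_hidden`.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical stand-in for an x-vector network.
model = nn.Sequential(
    nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 3)
)
# The prior model is a frozen copy of the model before fine-tuning.
prior_model = copy.deepcopy(model).eval().requires_grad_(False)

reg_layers = [1, 3]           # indexes of layers whose outputs are regularized
reg_weight = 0.1
loss_fn = nn.CrossEntropyLoss()
reg_loss_fn = nn.L1Loss()     # L1 is the stated default when reg_loss is None


def forward_with_hidden(net, x, layers):
    """Run net layer by layer and collect the outputs of the given layers."""
    hidden = []
    for idx, layer in enumerate(net):
        x = layer(x)
        if idx in layers:
            hidden.append(x)
    return x, hidden


x = torch.randn(5, 10)
y = torch.randint(0, 3, (5,))
logits, h = forward_with_hidden(model, x, reg_layers)
with torch.no_grad():
    _, h_prior = forward_with_hidden(prior_model, x, reg_layers)

cxe = loss_fn(logits, y)
reg = sum(reg_loss_fn(a, b) for a, b in zip(h, h_prior))
total_loss = cxe + reg_weight * reg
total_loss.backward()
```

At the start of fine-tuning both networks are identical, so the regularization term is zero. It grows as the hidden activations of the fine-tuned model drift away from those of the prior model.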
- class hyperion.torch.trainers.xvector_trainer_deep_feat_reg_from_wav.XVectorTrainerDeepFeatRegFromWav(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Trainer to train x-vector style models.
- model
x-Vector model object that we want to fine-tune
- feat_extractor
feature extractor nn.Module
- prior_model
x-Vector model object that we use as regularizer
- optim
pytorch optimizer object or options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- reg_layers_enc
list of encoder layer indexes that we use for regularization
- reg_layers_classif
list of classification head layer indexes that we use for regularization
- reg_weight_enc
weight of the regularization loss for encoder hidden activations
- reg_weight_classif
weight of the regularization loss for classification head hidden activations
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object or options dict.
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- loss
if None, it uses cross-entropy
- reg_loss
nn.Module loss used for regularization, if None it uses L1 loss.
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, feat_extractor, prior_model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, reg_layers_enc=None, reg_layers_classif=None, reg_weight_enc=0.1, reg_weight_classif=0.1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', loss=None, reg_loss=None, train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
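The regularization options above (reg_layers_enc, reg_layers_classif, reg_weight_enc, reg_weight_classif, reg_loss) suggest a loss that penalizes the distance between hidden activations of the fine-tuned model and those of the prior model. The following is a minimal sketch of that idea in plain Python, assuming an L1 penalty and a simple weighted sum. The function names and the exact way the terms are combined are illustrative assumptions, not the trainer's actual code.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized activation vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def deep_feat_reg_loss(task_loss, h_enc, h_enc_prior, h_classif, h_classif_prior,
                       reg_weight_enc=0.1, reg_weight_classif=0.1):
    """Adds weighted activation-matching penalties to the task loss.

    h_enc / h_classif: lists of activation vectors from the fine-tuned model,
    one per regularized layer; *_prior: the same layers from the prior model.
    (Hypothetical helper, assumed combination rule.)
    """
    reg_enc = sum(l1_loss(h, hp) for h, hp in zip(h_enc, h_enc_prior))
    reg_classif = sum(l1_loss(h, hp) for h, hp in zip(h_classif, h_classif_prior))
    return task_loss + reg_weight_enc * reg_enc + reg_weight_classif * reg_classif


# Toy activations: two regularized encoder layers, one classification-head layer.
total = deep_feat_reg_loss(
    task_loss=2.0,
    h_enc=[[1.0, 2.0], [0.0, 0.0]],
    h_enc_prior=[[1.0, 4.0], [0.0, 2.0]],
    h_classif=[[3.0]],
    h_classif_prior=[[1.0]],
)
```

With both weights at their default of 0.1, each penalty contributes a tenth of its summed L1 distance to the total.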
Auto-encoder Trainer
- class hyperion.torch.trainers.ae_trainer.AETrainer(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Auto-encoder trainer class
- model
model object.
- loss
nn.Module loss class
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, loss, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – pytorch data loader returning features and class labels.
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
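The grad_acc_steps option is described as simulating a larger batch size. A small sketch of why this works, assuming a mean-reduced loss and equally sized micro-batches: averaging the micro-batch gradients gives the gradient of the full batch. The toy model y = w * x and the helper grad_mse are assumptions made for illustration only.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5
grad_acc_steps = 2
micro_batch_size = len(data) // grad_acc_steps

# Accumulate: each micro-batch gradient is scaled by 1/grad_acc_steps,
# and the optimizer would step only once every grad_acc_steps micro-batches.
acc_grad = 0.0
for i in range(grad_acc_steps):
    micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
    acc_grad += grad_mse(w, micro) / grad_acc_steps

# Gradient computed in one pass over the whole batch, for comparison.
full_grad = grad_mse(w, data)
```

Here acc_grad and full_grad agree up to floating point rounding, so memory use is that of one micro-batch while the update matches the larger batch.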
VAE Trainers
- class hyperion.torch.trainers.vae_trainer.VAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Variational Auto-encoder trainer class
- model
model object.
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
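The grad_clip and grad_clip_norm options describe clipping gradients by their norm, with 0 meaning no clipping. The sketch below shows the usual norm-clipping rule in plain Python, similar in spirit to torch.nn.utils.clip_grad_norm_. The function name clip_grad_norm and the flat list of gradient values are simplifications assumed here for illustration.

```python
def clip_grad_norm(grads, max_norm, norm_type=2.0):
    """Rescales grads so that their p-norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    max_norm == 0 disables clipping, matching the grad_clip description.
    """
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    if max_norm <= 0 or total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# A gradient of L2 norm 5 is rescaled to norm 1; its direction is preserved.
clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0, norm_type=2.0)
# With max_norm=0 the gradient is returned unchanged.
unclipped, _ = clip_grad_norm([3.0, 4.0], max_norm=0)
```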
- class hyperion.torch.trainers.dvae_trainer.DVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Denoising VAE trainer class
- model
model object.
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – pytorch data loader returning noisy and clean features
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
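The swa_start, swa_lr and swa_anneal_epochs options refer to stochastic weight averaging (SWA). As a rough sketch of the averaging step only, the following keeps an equal-weight running average of the model weights from epoch swa_start onwards, the same update rule that torch.optim.swa_utils.AveragedModel uses by default. The per-epoch weights are made-up values, and treating swa_start as inclusive is an assumption.

```python
def swa_update(avg_weights, weights, num_averaged):
    """Running equal-weight average: avg += (w - avg) / (n + 1)."""
    if num_averaged == 0:
        return list(weights)
    return [a + (w - a) / (num_averaged + 1) for a, w in zip(avg_weights, weights)]


swa_start = 2
# Hypothetical per-epoch weights of a 2-parameter model.
weights_per_epoch = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0]]

avg, n = None, 0
for epoch, w in enumerate(weights_per_epoch):
    if epoch >= swa_start:
        # Only epochs at or after swa_start contribute to the average.
        avg = swa_update(avg, w, n)
        n += 1
```

After the loop, avg holds the mean of the last three weight vectors. Batch-norm statistics of the averaged model usually need to be recomputed afterwards, which is presumably the role of bn_update_epoch.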
VQ-VAE Trainers
- class hyperion.torch.trainers.vq_vae_trainer.VQVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Vector Quantized Variational Auto-encoder trainer class
- model
model object.
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
- class hyperion.torch.trainers.vq_dvae_trainer.VQDVAETrainer(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
Denoising Vector Quantized Variational Auto-encoder trainer class
- model
model object.
- optim
pytorch optimizer object or optimizer options dict
- epochs
max. number of epochs
- exp_path
experiment output path
- cur_epoch
current epoch
- grad_acc_steps
gradient accumulation steps to simulate larger batch size.
- device
cpu/gpu device
- metrics
extra metrics to compute besides cxe.
- lrsched
learning rate scheduler object
- loggers
LoggerList object, loggers write training progress to std. output and file.
- ddp
if True use distributed data parallel training
- ddp_type
type of distributed data parallel in (ddp, oss_ddp, oss_shared_ddp)
- train_mode
training mode in [‘train’, ‘ft-full’, ‘ft-last-layer’]
- use_amp
uses mixed precision training.
- log_interval
number of optim. steps between log outputs
- use_tensorboard
use tensorboard logger
- use_wandb
use wandb logger
- wandb
wandb dictionary of options
- grad_clip
norm to clip gradients, if 0 there is no clipping
- grad_clip_norm
norm type to clip gradients
- swa_start
epoch to start doing swa
- swa_lr
SWA learning rate
- swa_anneal_epochs
SWA learning rate anneal epochs
- cpu_offload
CPU offload of gradients when using fully sharded ddp
- __init__(model, optim={}, epochs=100, exp_path='./train', cur_epoch=0, grad_acc_steps=1, device=None, metrics=None, lrsched=None, loggers=None, ddp=False, ddp_type='ddp', train_mode='train', use_amp=False, log_interval=10, use_tensorboard=False, use_wandb=False, wandb={}, grad_clip=0, grad_clip_norm=2, swa_start=0, swa_lr=0.001, swa_anneal_epochs=10, cpu_offload=False)[source]
- train_epoch(data_loader)[source]
Training epoch loop
- Parameters
data_loader – pytorch data loader returning noisy and clean features
- validation_epoch(data_loader, swa_update_bn=False)[source]
Validation epoch loop
- Parameters
data_loader – PyTorch data loader returning input/output pairs
- _default_loggers(log_interval, use_tensorboard, use_wandb, wandb)
Creates the default loggers
- _get_lr()
Returns the current learning rate to show in the loggers
- static add_argparse_args(parser, prefix=None, skip=[])
- static add_class_args(parser, prefix=None, skip=[])
- bn_update_epoch(data_loader)
- checkpoint(logs=None)
Creates a checkpoint of the training state, to be saved for later recovery
- Parameters
logs – logs containing the current value of the metrics.
- static filter_args(**kwargs)
- fit(train_data, val_data=None)
Training function, it performs the training and validation epochs
- Parameters
train_data – PyTorch data loader for the training loop
val_data – PyTorch data loader for the validation loop
- load_checkpoint(file_path)
Loads a training checkpoint from file.
- Parameters
file_path – checkpoint file path
- load_last_checkpoint()
Loads the last training checkpoint in the experiment dir.
- save_checkpoint(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- save_swa_model(logs=None)
Saves a checkpoint of the training status
- Parameters
logs – logs containing the current value of the metrics.
- set_train_mode()
- update_model()
Datasets, Data Loaders and Samplers
Datasets
Audio Datasets
- class hyperion.torch.data.audio_dataset.AudioDataset(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]
- __init__(audio_path, key_file, class_file=None, time_durs_file=None, min_chunk_length=1, max_chunk_length=None, aug_cfg=None, return_fullseqs=False, return_class=True, return_clean_aug_pair=False, transpose_input=False, wav_scale=32767, is_val=False)[source]
- property wav_scale
- property num_seqs
- property seq_lengths
- property total_length
- property min_chunk_length
- property max_chunk_length
- property min_seq_length
- property max_seq_length
- property var_chunk_length
- static add_argparse_args(parser, prefix=None)
Feature Sequence Datasets
- class hyperion.torch.data.feat_seq_dataset.FeatSeqDataset(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
- __init__(rspecifier, key_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
- property num_seqs
- property seq_lengths
- property total_length
- property min_chunk_length
- property max_chunk_length
- property min_seq_length
- property max_seq_length
- property var_chunk_length
- static add_argparse_args(parser, prefix=None)
- class hyperion.torch.data.paired_feat_seq_dataset.PairedFeatSeqDataset(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
- __init__(rspecifier, key_file, pairs_file, class_file=None, num_frames_file=None, path_prefix=None, min_chunk_length=1, max_chunk_length=None, return_fullseqs=False, return_class=True, transpose_input=True, is_val=False)[source]
- static add_argparse_args(parser, prefix=None)
- static add_class_args(parser, prefix=None)
- static filter_args(**kwargs)
- get_random_chunk_length()
- property max_chunk_length
- property max_seq_length
- property min_chunk_length
- property min_seq_length
- property num_seqs
- property seq_lengths
- property total_length
- property var_chunk_length
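The dataset classes above take min_chunk_length and max_chunk_length and expose get_random_chunk_length(). One plausible reading, sketched below, is that each training example is a randomly placed chunk whose length is drawn uniformly between the two bounds and capped by the sequence length. The function sample_chunk, the uniform distribution and the unit of length (seconds or frames) are assumptions for illustration, not the library's documented behavior.

```python
import random


def sample_chunk(seq_length, min_chunk_length, max_chunk_length, rng):
    """Returns (offset, length) of a random chunk inside a sequence."""
    # Cap the maximum by the sequence length so the chunk always fits.
    max_len = min(max_chunk_length, seq_length)
    min_len = min(min_chunk_length, max_len)
    length = rng.uniform(min_len, max_len)
    offset = rng.uniform(0.0, seq_length - length)
    return offset, length


rng = random.Random(0)
# Five chunks of length 2 to 4 from a sequence of length 10.
chunks = [sample_chunk(10.0, 2.0, 4.0, rng) for _ in range(5)]
# A sequence shorter than min_chunk_length is returned whole.
short = sample_chunk(1.5, 2.0, 4.0, random.Random(1))
```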
Embedding Datasets
Samplers
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.data.weighted_seq_sampler.ClassWeightedSeqSampler(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]
- __init__(dataset, batch_size=1, iters_per_epoch='auto', num_egs_per_class=1, num_egs_per_utt=1, var_batch_size=False)[source]
- static add_argparse_args(parser, prefix=None)
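The sampler's arguments (batch_size, num_egs_per_class, num_egs_per_utt) suggest a two-stage draw: pick classes according to their weights, then pick utterances within each class. The sketch below illustrates that idea with the standard library only. The function sample_batch, the dictionary layout and the exact batch composition are hypothetical and may differ from what ClassWeightedSeqSampler does.

```python
import random


def sample_batch(utts_by_class, class_weights, batch_size,
                 num_egs_per_class=1, num_egs_per_utt=1, rng=None):
    """Draws utterance keys: classes by weight, then utterances within class."""
    rng = rng or random.Random()
    classes = list(utts_by_class)
    weights = [class_weights[c] for c in classes]
    num_classes = batch_size // (num_egs_per_class * num_egs_per_utt)
    batch = []
    for c in rng.choices(classes, weights=weights, k=num_classes):
        for utt in rng.choices(utts_by_class[c], k=num_egs_per_class):
            # Repeat the utterance so several chunks can be cut from it.
            batch.extend([utt] * num_egs_per_utt)
    return batch


utts_by_class = {"spk1": ["u1", "u2"], "spk2": ["u3"], "spk3": ["u4", "u5", "u6"]}
class_weights = {"spk1": 1.0, "spk2": 1.0, "spk3": 1.0}
batch = sample_batch(utts_by_class, class_weights, batch_size=8,
                     num_egs_per_class=2, num_egs_per_utt=2, rng=random.Random(0))
```

With batch_size=8 and two examples per class and per utterance, two classes are drawn and the batch holds eight utterance keys.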
Data Transformations
Optimizers
These are custom optimizers and a factory class to create optimizers from config params.
Custom Optimizers
- class hyperion.torch.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]
Implements the Rectified Adam optimizer (RAdam) from
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. “On the Variance of the Adaptive Learning Rate and Beyond.” arXiv preprint arXiv:1908.03265 (2019).
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]
- step(closure=None)[source]
Performs a single optimization step (parameter update).
- Parameters
closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
- add_param_group(param_group)
Add a param group to the Optimizer's param_groups.
This can be useful when fine tuning a pre-trained network as frozen layers can be made trainable and added to the Optimizer as training progresses.
- Parameters
param_group (dict) – Specifies what Tensors should be optimized along with group-specific optimization options.
- load_state_dict(state_dict)
Loads the optimizer state.
- Parameters
state_dict (dict) – optimizer state. Should be an object returned from a call to state_dict().
- state_dict()
Returns the state of the optimizer as a dict.
It contains two entries:
- state - a dict holding current optimization state. Its content differs between optimizer classes.
- param_groups - a dict containing all parameter groups
- zero_grad(set_to_none: bool = False)
Sets the gradients of all optimized torch.Tensors to zero.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example: 1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently. 2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient. 3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
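For readers unfamiliar with RAdam, the core idea of the cited paper is a rectification term that scales the adaptive step while the variance of the adaptive learning rate is still unreliable. The sketch below computes that term as defined in the paper. The function name radam_rectification is made up for illustration, and this is not a copy of the hyperion implementation, which may use a slightly different threshold or fallback.

```python
import math


def radam_rectification(step, beta2=0.999):
    """Returns the variance rectification term r_t, or None when rho_t <= 4."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Variance is intractable: the paper falls back to an unadapted
        # (SGD-with-momentum style) update for these early steps.
        return None
    return math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


early = radam_rectification(step=1)       # too early, no adaptive update
mid = radam_rectification(step=10)        # small rectification factor
late = radam_rectification(step=100000)   # approaches 1, i.e. plain Adam
```

The term grows from a small value towards 1 as training proceeds, which acts like an automatic warmup of the adaptive learning rate.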
Optimizer Factory
- class hyperion.torch.optim.factory.OptimizerFactory[source]
- static create(params, opt_type, lr, momentum=0, beta1=0.9, beta2=0.99, rho=0.9, eps=1e-08, weight_decay=0, amsgrad=False, nesterov=False, lambd=0.0001, asgd_alpha=0.75, t0=1000000.0, rmsprop_alpha=0.99, centered=False, lr_decay=0, init_acc_val=0, max_iter=20, oss=False)[source]
- static add_argparse_args(parser, prefix=None)
Learning Rate Schedulers
These are custom learning rate schedulers and a factory class to create schedulers from config params.
Custom LR Schedulers
- class hyperion.torch.lr_schedulers.red_lr_on_plateau.ReduceLROnPlateau(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]
Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.
- optimizer
optimizer.
- Type
Optimizer
- mode
One of min, max. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’.
- Type
str
- factor
Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.
- Type
float
- patience
Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10.
- Type
int
- threshold
Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.
- Type
float
- threshold_mode
One of rel, abs. In rel mode, dynamic_threshold = best * ( 1 + threshold ) in ‘max’ mode or best * ( 1 - threshold ) in min mode. In abs mode, dynamic_threshold = best + threshold in max mode or best - threshold in min mode. Default: ‘rel’.
- Type
str
- cooldown
Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.
- Type
int
- min_lr
A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0.
- Type
float or list
- eps
Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8.
- Type
float
- __init__(optimizer, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, warmup_steps=0, eps=1e-08)[source]
- property in_cooldown
- load_state_dict(state_dict)[source]
Loads the scheduler's state.
- Parameters
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
- get_lr()
- get_warmup_lr()
- property in_warmup
- state_dict()
Returns the state of the scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer.
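The threshold, threshold_mode and patience rules described above can be traced with a short sketch. It follows the documented formulas for the dynamic threshold and the documented patience behavior, but it leaves out cooldown and warmup, handles only a single learning rate, and uses made-up function names (is_better, plateau_schedule), so it should be read as an illustration and not as the scheduler's code.

```python
def is_better(value, best, mode="min", threshold=1e-4, threshold_mode="rel"):
    """True when value beats best by more than the dynamic threshold."""
    if threshold_mode == "rel":
        dyn = best * (1.0 - threshold) if mode == "min" else best * (1.0 + threshold)
    else:  # "abs"
        dyn = best - threshold if mode == "min" else best + threshold
    return value < dyn if mode == "min" else value > dyn


def plateau_schedule(losses, lr=0.1, factor=0.1, patience=2, min_lr=0.0, eps=1e-8):
    """Returns the learning rate in effect after each epoch (mode='min')."""
    best, bad_epochs, lrs = float("inf"), 0, []
    for loss in losses:
        if is_better(loss, best):
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            new_lr = max(lr * factor, min_lr)
            if lr - new_lr > eps:  # ignore negligible updates
                lr = new_lr
            bad_epochs = 0
        lrs.append(lr)
    return lrs


# The loss stalls at 0.9 for three epochs; with patience=2 the learning rate
# drops after the third epoch without improvement.
lrs = plateau_schedule([1.0, 0.9, 0.9, 0.9, 0.9, 0.8])
```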
- class hyperion.torch.lr_schedulers.exp_lr.ExponentialLR(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
Exponential learning rate scheduler.
- __init__(optimizer, decay_rate, decay_steps, hold_steps, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
- load_state_dict(state_dict)[source]
Loads the scheduler's state.
- Parameters
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
- get_warmup_lr()
- property in_warmup
- on_epoch_begin(epoch=None, **kwargs)
- on_epoch_end(metrics=None)
- on_opt_step()
- state_dict()
Returns the state of the scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer.
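The ExponentialLR entry does not spell out its formula. The sketch below shows one common way to combine the listed parameters (warmup_steps, hold_steps, decay_rate, decay_steps, min_lr): linear warmup, a constant hold phase, then a decay by a factor decay_rate every decay_steps steps. The formula, the phase boundaries and the function name exp_lr are all assumptions, so the class's real schedule may differ.

```python
def exp_lr(step, base_lr, decay_rate, decay_steps, hold_steps=0,
           min_lr=0.0, warmup_steps=0):
    """Learning rate at a given step under warmup -> hold -> exponential decay."""
    if step < warmup_steps:
        # Linear warmup from near zero up to base_lr (assumed shape).
        return base_lr * (step + 1) / warmup_steps
    if step < hold_steps:
        # Hold phase: keep the base learning rate unchanged.
        return base_lr
    lr = base_lr * decay_rate ** ((step - hold_steps) / decay_steps)
    return max(lr, min_lr)


held = exp_lr(0, base_lr=1.0, decay_rate=0.5, decay_steps=100, hold_steps=10)
halved = exp_lr(110, base_lr=1.0, decay_rate=0.5, decay_steps=100, hold_steps=10)
floored = exp_lr(10**6, base_lr=1.0, decay_rate=0.5, decay_steps=100, min_lr=1e-3)
```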
- class hyperion.torch.lr_schedulers.invpow_lr.InvPowLR(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
Inverse power learning rate scheduler.
- __init__(optimizer, power=0.5, hold_steps=0, min_lr=0, warmup_steps=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
- load_state_dict(state_dict)[source]
Loads the scheduler's state.
- Parameters
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
- get_warmup_lr()
- property in_warmup
- on_epoch_begin(epoch=None, **kwargs)
- on_epoch_end(metrics=None)
- on_opt_step()
- state_dict()
Returns the state of the scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer.
- class hyperion.torch.lr_schedulers.cos_lr.CosineLR(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR:
\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)\]
When epoch=-1, sets initial lr as lr.
It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.
- Parameters
optimizer (Optimizer) – Wrapped optimizer.
T (int) – Maximum number of iterations (\(T_{max}\) in the formula).
min_lr (float) – Minimum learning rate (\(\eta_{min}\) in the formula). Default: 0.
epoch (int) – The index of last epoch. Default: 0.
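The annealing formula can be checked numerically with a small standalone function. This is a sketch of the formula only: the function name cosine_lr and its arguments are chosen for illustration, and it deliberately ignores the warmup, restart and gamma options of CosineLR.

```python
import math


def cosine_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Evaluate eta_t for t_cur steps since the last restart."""
    cos_term = 1 + math.cos(math.pi * t_cur / t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * cos_term


# The rate starts at eta_max, passes the midpoint at t_max / 2,
# and reaches eta_min at t_max.
for t in (0, 5, 10):
    print(t, round(cosine_lr(t, t_max=10, eta_max=0.1), 6))
```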
- __init__(optimizer, T, T_mul=1, min_lr=0, warmup_steps=0, warm_restarts=False, gamma=1, last_restart=0, num_restarts=0, epoch=0, step=0, update_lr_on_opt_step=False)[source]
- get_warmup_lr()
- property in_warmup
- load_state_dict(state_dict)
Loads the scheduler's state.
- Parameters
state_dict (dict) – scheduler state. Should be an object returned from a call to
state_dict().
- on_epoch_end(metrics=None)
- on_opt_step()
- state_dict()
Returns the state of the scheduler as a dict. It contains an entry for every variable in self.__dict__ which is not the optimizer.
LR Scheduler Factory
- class hyperion.torch.lr_schedulers.factory.LRSchedulerFactory[source]
- create(lrsch_type, decay_rate=0.01, decay_steps=100, power=0.5, hold_steps=10, t=10, t_mul=1, warm_restarts=False, gamma=1, monitor='val_loss', mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, eps=1e-08, min_lr=0, warmup_steps=0, update_lr_on_opt_step=False)[source]
- static add_argparse_args(parser, prefix=None)
Metrics
These are metric classes and functions that cannot be used as loss functions.
Metric Classes
- class hyperion.torch.metrics.metrics.TorchMetric(weight=None, reduction='mean')[source]
Base class for metrics that cannot be objective functions
- __init__(weight=None, reduction='mean')[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- T_destination
alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])
- _get_backward_hooks()
Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.
- _load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).
Note
state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
- _named_members(get_members_fn, prefix='', recurse=True)
Helper method for yielding various names + members of modules.
- _register_load_state_dict_pre_hook(hook)
These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.
- _register_state_dict_hook(hook)
These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.
- _save_to_state_dict(destination, prefix, keep_vars)
Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().
In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.
- Parameters
destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module
- add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Parameters
name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.
- apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- bfloat16() torch.nn.modules.module.T
Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
self
- Return type
Module
- buffers(recurse: bool = True) Iterator[torch.Tensor]
Returns an iterator over module buffers.
- Parameters
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
torch.Tensor – module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- children() Iterator[torch.nn.modules.module.Module]
Returns an iterator over immediate children modules.
- Yields
Module – a child module
- cpu() torch.nn.modules.module.T
Moves all model parameters and buffers to the CPU.
- Returns
self
- Return type
Module
- cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- double() torch.nn.modules.module.T
Casts all floating point parameters and buffers to double datatype.
- Returns
self
- Return type
Module
- dump_patches: bool = False
This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
- eval() torch.nn.modules.module.T
Sets the module in evaluation mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
- Returns
self
- Return type
Module
- extra_repr() str
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- float() torch.nn.modules.module.T
Casts all floating point parameters and buffers to float datatype.
- Returns
self
- Return type
Module
- forward(*input: Any) None
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- half() torch.nn.modules.module.T
Casts all floating point parameters and buffers to half datatype.
- Returns
self
- Return type
Module
- load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type
NamedTuple with missing_keys and unexpected_keys fields
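The effect of strict and the returned fields can be seen in a short sketch. The key name "extra" and the deliberately damaged checkpoint are made up for the illustration; only torch.nn calls are used.

```python
import torch.nn as nn

src = nn.Linear(2, 2)
state = src.state_dict()
del state["bias"]                             # simulate a missing key
state["extra"] = src.weight.detach().clone()  # simulate an unexpected key

dst = nn.Linear(2, 2)
# strict=False loads what matches and reports the rest instead of raising.
result = dst.load_state_dict(state, strict=False)
print(result.missing_keys)
print(result.unexpected_keys)
```

With strict=True (the default) the same call would raise a RuntimeError listing both problems.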
- modules() Iterator[torch.nn.modules.module.Module]
Returns an iterator over all modules in the network.
- Yields
Module – a module in the network
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
>>>     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters
prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
(string, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple of name and module
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
>>>     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
(string, Parameter) – Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
Parameter – module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
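The difference between persistent and non-persistent buffers can be made concrete with a small module. RunningStats and the buffer name "scratch" are hypothetical names used only for this sketch.

```python
import torch
import torch.nn as nn


class RunningStats(nn.Module):
    """Hypothetical module holding two buffers and no parameters."""

    def __init__(self, num_features):
        super().__init__()
        # Saved in state_dict (persistent by default).
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Kept on the module but left out of state_dict.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


m = RunningStats(4)
state_keys = list(m.state_dict().keys())
buffer_names = [name for name, _ in m.named_buffers()]
print(state_keys)
print(buffer_names)
print(list(m.parameters()))
```

Both buffers still follow the module through calls such as to() or cuda(); only their presence in the saved state differs.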
- register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after forward() is called.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
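A forward hook and its removable handle might be used as follows. The hook name record_shape and the shapes list are illustrative choices, not part of any library.

```python
import torch
import torch.nn as nn

shapes = []


def record_shape(module, inputs, output):
    # Returning None leaves the module output unchanged.
    shapes.append(tuple(output.shape))


layer = nn.Linear(3, 2)
handle = layer.register_forward_hook(record_shape)

layer(torch.ones(5, 3))  # hook fires
handle.remove()
layer(torch.ones(7, 3))  # hook no longer registered
print(shapes)
```

The hook must be triggered by calling the module instance; calling layer.forward(...) directly would bypass it.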
- register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple).
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Parameters
name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
- requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T
Change if autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
- Parameters
requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
self
- Return type
Module
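Freezing part of a model for finetuning can be sketched like this. The three-layer Sequential is an arbitrary toy model chosen for the example.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer; only the last layer keeps receiving gradients.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Passing only the parameters that still require gradients to the optimizer avoids keeping optimizer state for the frozen layer.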
- state_dict(destination=None, prefix='', keep_vars=False)
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.
- Returns
a dictionary containing a whole state of the module
- Return type
dict
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns
self
- Return type
Module
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j, 0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- train(mode: bool = True) torch.nn.modules.module.T
Sets the module in training mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T
Casts all parameters and buffers to dst_type.
- Parameters
dst_type (type or string) – the desired type
- Returns
self
- Return type
Module
- xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- training: bool
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.metrics.accuracy.CategoricalAccuracy(weight=None, reduction='mean')[source]
- __init__(weight=None, reduction='mean')[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input, target)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
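The reference does not spell out what forward(input, target) computes. As a rough sketch, assuming that categorical accuracy means the fraction of samples whose highest-scoring class equals the target label, it could look like the function below. The function categorical_accuracy is hypothetical, is not hyperion's implementation, and ignores the weight and reduction arguments.

```python
import torch


def categorical_accuracy(input, target):
    """Assumed definition: share of rows whose argmax equals the label."""
    pred = input.argmax(dim=-1)
    return (pred == target).float().mean()


# Four samples, three classes; the third sample is misclassified.
scores = torch.tensor(
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
)
target = torch.tensor([0, 1, 1, 2])
acc = categorical_accuracy(scores, target)
print(acc.item())
```

Because argmax is not differentiable, a quantity like this can be monitored during training but not optimized directly, which is consistent with it being listed among the metrics rather than the losses.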
- T_destination
alias of TypeVar('T_destination', bound=Mapping[str, torch.Tensor])
- _get_backward_hooks()
Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.
- _load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).
Note
state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
- _named_members(get_members_fn, prefix='', recurse=True)
Helper method for yielding various names + members of modules.
- _register_load_state_dict_pre_hook(hook)
These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.
- _register_state_dict_hook(hook)
These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.
- _save_to_state_dict(destination, prefix, keep_vars)
Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().
In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.
- Parameters
destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module
- add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Parameters
name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.
- apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
        [ 1., 1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- bfloat16() torch.nn.modules.module.T
Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
self
- Return type
Module
- buffers(recurse: bool = True) Iterator[torch.Tensor]
Returns an iterator over module buffers.
- Parameters
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
torch.Tensor – module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- children() Iterator[torch.nn.modules.module.Module]
Returns an iterator over immediate children modules.
- Yields
Module – a child module
- cpu() torch.nn.modules.module.T
Moves all model parameters and buffers to the CPU.
- Returns
self
- Return type
Module
- cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- double() torch.nn.modules.module.T
Casts all floating point parameters and buffers to double datatype.
- Returns
self
- Return type
Module
- dump_patches: bool = False
This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
- eval() torch.nn.modules.module.T
Sets the module in evaluation mode.
This only has an effect on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
- Returns
self
- Return type
Module
- extra_repr() str
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- float() torch.nn.modules.module.T
Casts all floating point parameters and buffers to float datatype.
- Returns
self
- Return type
Module
- half() torch.nn.modules.module.T
Casts all floating point parameters and buffers to half datatype.
- Returns
self
- Return type
Module
- load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type
NamedTuple with missing_keys and unexpected_keys fields
- modules() Iterator[torch.nn.modules.module.Module]
Returns an iterator over all modules in the network.
- Yields
Module – a module in the network
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
>>>     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters
prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
(string, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
- named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple of name and module
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
(string, Parameter) – Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
- parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
Parameter – module parameter
Example:
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
- register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
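The difference between persistent and non-persistent buffers can be checked directly. The sketch below uses a hypothetical RunningStats module, defined only for this illustration.

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Hypothetical module used only to illustrate register_buffer."""

    def __init__(self, num_features):
        super().__init__()
        # Persistent buffer: saved in state_dict().
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent buffer: still a buffer, but left out of state_dict().
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


stats = RunningStats(3)
state_keys = list(stats.state_dict().keys())
buffer_names = [name for name, _ in stats.named_buffers()]
param_count = len(list(stats.parameters()))
print(state_keys, buffer_names, param_count)
```

Both buffers are reachable as attributes (stats.running_mean, stats.scratch) and both move with the module on device changes. Only the persistent one is serialized.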
- register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the output. It can modify the input in place, but this has no effect on forward because the hook is called after forward() has run.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
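A minimal sketch of a forward hook that records tensor shapes and is then removed through its handle. The hook name save_output and the captured dictionary are illustrative choices.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
captured = {}


def save_output(module, inputs, output):
    # inputs is a tuple holding the positional arguments passed to the module.
    captured["in_shape"] = tuple(inputs[0].shape)
    captured["out_shape"] = tuple(output.shape)
    # Returning None leaves the output unchanged.


handle = layer.register_forward_hook(save_output)
layer(torch.randn(5, 4))
captured_after_first_call = dict(captured)

# After removal the hook no longer fires.
handle.remove()
captured.clear()
layer(torch.randn(5, 4))
```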
- register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the input. The hook can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
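The following sketch shows a pre-hook that replaces the input before forward() runs. The layer weights are set to one so the effect is easy to read. The hook name double_input is made up for this example.

```python
import torch
from torch import nn

layer = nn.Linear(3, 1, bias=False)
with torch.no_grad():
    layer.weight.fill_(1.0)  # output is then the sum of the input features


def double_input(module, inputs):
    # Return a single tensor; PyTorch wraps it into a tuple for forward().
    return inputs[0] * 2


handle = layer.register_forward_pre_hook(double_input)
x = torch.ones(2, 3)
hooked = layer(x)   # forward() sees 2 * x

handle.remove()
plain = layer(x)    # forward() sees x again
```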
- register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
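A short sketch of a full backward hook that only observes gradients. The names record_grads and grads are illustrative, and the input is created with requires_grad=True so that a gradient with respect to the input exists.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
grads = {}


def record_grads(module, grad_input, grad_output):
    # Both arguments are tuples; keep the first entry of each.
    grads["grad_input"] = grad_input[0]
    grads["grad_output"] = grad_output[0]
    # Returning None keeps grad_input as computed by autograd.


handle = layer.register_full_backward_hook(record_grads)

x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()
handle.remove()
```

Because the loss is a plain sum, grad_output is a tensor of ones, and the recorded grad_input matches x.grad.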
- register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Parameters
name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
- requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T
Changes whether autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
- Parameters
requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
self
- Return type
Module
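Freezing part of a model for finetuning might look like the sketch below. The three-layer model is a stand-in chosen for this example, not a hyperion architecture.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze the first layer; only the last layer stays trainable.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

model(torch.randn(3, 8)).sum().backward()
print(trainable)
```

After the backward pass the frozen layer has no gradient, while the last layer does.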
- state_dict(destination=None, prefix='', keep_vars=False)
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.
- Returns
a dictionary containing a whole state of the module
- Return type
dict
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns
self
- Return type
Module
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- train(mode: bool = True) torch.nn.modules.module.T
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
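Dropout is one of the modules whose behavior depends on the mode. The sketch below compares its output in training and evaluation mode. The tensor size and seed are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()           # training mode: elements are randomly zeroed
train_out = drop(x)    # survivors are scaled by 1 / (1 - p) = 2

drop.train(False)      # evaluation mode, same effect as drop.eval()
eval_out = drop(x)     # dropout is a no-op here
```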
- type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T
Casts all parameters and buffers to dst_type.
- Parameters
dst_type (type or string) – the desired type
- Returns
self
- Return type
Module
- xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
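The two modes of zero_grad can be compared as in the sketch below. The flag is passed explicitly in both calls because its default value differs between PyTorch releases.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)

# First backward pass, then reset gradients to zero tensors.
layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=False)
zeroed = layer.weight.grad.clone()

# Second backward pass, then drop the gradient tensors entirely.
layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=True)
print(zeroed, layer.weight.grad)
```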
- training: bool
- class hyperion.torch.metrics.accuracy.BinaryAccuracy(weight=None, reduction='mean', thr=0.5)[source]
- __init__(weight=None, reduction='mean', thr=0.5)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input, target)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
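The difference between calling the module and calling forward() directly can be seen with a registered hook. The sketch below uses a hypothetical Identity module rather than BinaryAccuracy itself, whose internals are not shown here.

```python
import torch
from torch import nn


class Identity(nn.Module):
    """Hypothetical module that returns its input unchanged."""

    def forward(self, x):
        return x


calls = []
m = Identity()
m.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.zeros(1)
m(x)                        # goes through __call__, so the hook runs
calls_after_call = len(calls)

m.forward(x)                # bypasses __call__, so the hook is skipped
calls_after_forward = len(calls)
```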
- T_destination
alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])
- _get_backward_hooks()
Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.
- _load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get(“version”, None).
Note
state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
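One way a subclass might use the version number is sketched below. The Scale module, its rename from "gain" to "scale", and the version history are all invented for illustration. The override relies on the note above that state_dict is a copy and may be modified.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical module whose parameter was renamed from 'gain' to 'scale'."""

    _version = 2  # assumed history: version 1 stored the parameter as 'gain'

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        version = local_metadata.get("version", None)
        old_key = prefix + "gain"
        # Checkpoints without metadata, or older than version 2, use the old name.
        if (version is None or version < 2) and old_key in state_dict:
            state_dict[prefix + "scale"] = state_dict.pop(old_key)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)


old_checkpoint = {"gain": torch.tensor([3.0])}  # plain dict, no _metadata
module = Scale()
result = module.load_state_dict(old_checkpoint)
print(module.scale, result)
```

The caller's dictionary is left untouched because the rename happens on the internal copy.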
- _named_members(get_members_fn, prefix='', recurse=True)
Helper method for yielding various names + members of modules.
- _register_load_state_dict_pre_hook(hook)
These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.
- _register_state_dict_hook(hook)
These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.
- _save_to_state_dict(destination, prefix, keep_vars)
Saves module state to destination dictionary, containing a state of the module, but not its descendants. This is called on every submodule in state_dict().
In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.
- Parameters
destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module
- add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Parameters
name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.
- apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
Example:
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if type(m) == nn.Linear:
...         m.weight.fill_(1.0)
...         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- bfloat16() torch.nn.modules.module.T
Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
self
- Return type
Module
- buffers(recurse: bool = True) Iterator[torch.Tensor]
Returns an iterator over module buffers.
- Parameters
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
torch.Tensor – module buffer
Example:
>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])
- children() Iterator[torch.nn.modules.module.Module]
Returns an iterator over immediate children modules.
- Yields
Module – a child module
- cpu() torch.nn.modules.module.T
Moves all model parameters and buffers to the CPU.
- Returns
self
- Return type
Module
- cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
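The ordering advice above (move the module first, then build the optimizer) can be sketched in a device-agnostic way. The example uses to(device) with a CPU fallback so that it also runs without a GPU. On a CUDA machine, model.cuda() would take the place of that call.

```python
import torch
from torch import nn

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)  # move parameters before the optimizer captures them

# The optimizer now references the parameters that live on `device`.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = model.weight.detach().clone()
loss = model(torch.ones(1, 4, device=device)).sum()
loss.backward()
optimizer.step()
```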
- double() torch.nn.modules.module.T
Casts all floating point parameters and buffers to double datatype.
- Returns
self
- Return type
Module
- dump_patches: bool = False
This allows better backward-compatibility (BC) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
- eval() torch.nn.modules.module.T
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
- Returns
self
- Return type
Module
- extra_repr() str
Sets the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
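A minimal sketch of overriding extra_repr follows. The Threshold module and its thr attribute are hypothetical and only echo the kind of threshold argument seen in the accuracy classes of this section.

```python
import torch
from torch import nn


class Threshold(nn.Module):
    """Hypothetical module that binarizes its input at a threshold."""

    def __init__(self, thr=0.5):
        super().__init__()
        self.thr = thr

    def forward(self, x):
        return (x > self.thr).float()

    def extra_repr(self):
        # Shown inside the parentheses when the module is printed.
        return f"thr={self.thr}"


text = repr(Threshold(0.25))
print(text)
```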
- float() torch.nn.modules.module.T
Casts all floating point parameters and buffers to float datatype.
- Returns
self
- Return type
Module
- half() torch.nn.modules.module.T
Casts all floating point parameters and buffers to half datatype.
- Returns
self
- Return type
Module
- load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type
NamedTuple with missing_keys and unexpected_keys fields
- modules() Iterator[torch.nn.modules.module.Module]
Returns an iterator over all modules in the network.
- Yields
Module – a module in the network
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters
prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
(string, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
- named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)
- named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple of name and module
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
(string, Parameter) – Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
- parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
Parameter – module parameter
Example:
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.nn.parameter.Parameter'> torch.Size([20])
<class 'torch.nn.parameter.Parameter'> torch.Size([20, 1, 5, 5])
- register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
- register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the output. It can modify the input in place, but this has no effect on forward because the hook is called after forward() has run.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks, only to forward. The hook can modify the input. The hook can return either a tuple or a single modified value. A single returned value is wrapped into a tuple (unless that value is already a tuple).
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Parameters
name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
- requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T
Changes whether autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
- Parameters
requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
self
- Return type
Module
- state_dict(destination=None, prefix='', keep_vars=False)
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.
- Returns
a dictionary containing a whole state of the module
- Return type
dict
Example:
>>> module.state_dict().keys()
['bias', 'weight']
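To make the statement about parameters and persistent buffers concrete, the sketch below lists the state dict keys of a standard nn.BatchNorm1d layer next to its parameter names.

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(4)

# state_dict() holds the parameters first, then the persistent buffers.
keys = list(bn.state_dict().keys())

# named_parameters() holds only the learnable tensors.
param_names = [name for name, _ in bn.named_parameters()]
print(keys, param_names)
```

The running statistics appear in the state dict even though an optimizer never sees them.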
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns
self
- Return type
Module
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
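The rule that integral buffers keep their dtype can be checked on the CPU alone. The Counter module below is hypothetical and exists only to pair a float parameter with an integer buffer.

```python
import torch
from torch import nn


class Counter(nn.Module):
    """Hypothetical module with one float parameter and one integer buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(2))
        self.register_buffer("steps", torch.zeros(1, dtype=torch.long))


m = Counter()
returned = m.to(torch.float64)  # casts floating point tensors only

print(m.weight.dtype, m.steps.dtype)
```

The call returns the same module object, which matches the note that to() modifies the module in place.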
- train(mode: bool = True) torch.nn.modules.module.T
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T
Casts all parameters and buffers to dst_type.
- Parameters
dst_type (type or string) – the desired type
- Returns
self
- Return type
Module
- xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- training: bool
- class hyperion.torch.metrics.accuracy.BinaryAccuracyWithLogits(weight=None, reduction='mean', thr=0.0)[source]
- __init__(weight=None, reduction='mean', thr=0.0)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input, target)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- T_destination
alias of TypeVar(‘T_destination’, bound=Mapping[str, torch.Tensor])
- _get_backward_hooks()
Returns the backward hooks for use in the call function. It returns two lists, one with the full backward hooks and one with the non-full backward hooks.
- _load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
Copies parameters and buffers from state_dict into only this module, but not its descendants. This is called on every submodule in load_state_dict(). Metadata saved for this module in the input state_dict is provided as local_metadata. For state dicts without metadata, local_metadata is empty. Subclasses can achieve class-specific backward compatible loading using the version number at local_metadata.get("version", None).
Note
state_dict is not the same object as the input state_dict to load_state_dict(). So it can be modified.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
prefix (str) – the prefix for parameters and buffers used in this module
local_metadata (dict) – a dict containing the metadata for this module. See
strict (bool) – whether to strictly enforce that the keys in state_dict with prefix match the names of parameters and buffers in this module
missing_keys (list of str) – if strict=True, add missing keys to this list
unexpected_keys (list of str) – if strict=True, add unexpected keys to this list
error_msgs (list of str) – error messages should be added to this list, and will be reported together in load_state_dict()
- _named_members(get_members_fn, prefix='', recurse=True)
Helper method for yielding various names + members of modules.
- _register_load_state_dict_pre_hook(hook)
These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self. These arguments are exactly the same as those of _load_from_state_dict.
- _register_state_dict_hook(hook)
These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set. Note that only parameters and buffers of self or its children are guaranteed to exist in state_dict. The hooks may modify state_dict inplace or return a new one.
- _save_to_state_dict(destination, prefix, keep_vars)
Saves module state to the destination dictionary, containing the state of the module, but not its descendants. This is called on every submodule in state_dict().
In rare cases, subclasses can achieve class-specific behavior by overriding this method with custom logic.
- Parameters
destination (dict) – a dict where state will be stored
prefix (str) – the prefix for parameters and buffers used in this module
- add_module(name: str, module: Optional[torch.nn.modules.module.Module]) None
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
- Parameters
name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.
- apply(fn: Callable[[torch.nn.modules.module.Module], None]) torch.nn.modules.module.T
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
Example:
>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
- bfloat16() torch.nn.modules.module.T
Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
self
- Return type
Module
- buffers(recurse: bool = True) Iterator[torch.Tensor]
Returns an iterator over module buffers.
- Parameters
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
torch.Tensor – module buffer
Example:
>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- children() Iterator[torch.nn.modules.module.Module]
Returns an iterator over immediate children modules.
- Yields
Module – a child module
- cpu() torch.nn.modules.module.T
Moves all model parameters and buffers to the CPU.
- Returns
self
- Return type
Module
- cuda(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- double() torch.nn.modules.module.T
Casts all floating point parameters and buffers to double datatype.
- Returns
self
- Return type
Module
- dump_patches: bool = False
This allows better BC support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
- eval() torch.nn.modules.module.T
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
- Returns
self
- Return type
Module
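As a small illustration of the training/evaluation switch, the sketch below uses nn.Dropout, one of the modules the text names as affected. The variable names are only for illustration.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.eval()                      # same as drop.train(False)
eval_out = drop(x)               # dropout is a no-op in evaluation mode
is_identity = torch.equal(eval_out, x)

drop.train()                     # back to training mode
train_out = drop(x)              # entries are zeroed at random, the rest scaled by 1 / (1 - p)
values_ok = all(v in (0.0, 2.0) for v in train_out.tolist())
```

Both calls return the module itself, so they can be chained, e.g. model.eval().cpu().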
- extra_repr() str
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- float() torch.nn.modules.module.T
Casts all floating point parameters and buffers to float datatype.
- Returns
self
- Return type
Module
- half() torch.nn.modules.module.T
Casts all floating point parameters and buffers to half datatype.
- Returns
self
- Return type
Module
- load_state_dict(state_dict: OrderedDict[str, Tensor], strict: bool = True)
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type
NamedTuple with missing_keys and unexpected_keys fields
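The returned named tuple is most useful with strict=False, where mismatched keys are reported instead of raising an error. A minimal sketch, assuming a plain nn.Linear and a hypothetical extra key named "extra":

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model = nn.Linear(2, 2)

# Build a state dict that lacks "bias" and carries one key the module does not know.
partial = OrderedDict(
    (key, value) for key, value in model.state_dict().items() if key != "bias"
)
partial["extra"] = torch.zeros(1)

result = model.load_state_dict(partial, strict=False)
missing = list(result.missing_keys)        # keys the module expects but the dict lacks
unexpected = list(result.unexpected_keys)  # keys in the dict that the module does not have
```

With strict=True the same call would raise a RuntimeError listing both kinds of keys.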
- modules() Iterator[torch.nn.modules.module.Module]
Returns an iterator over all modules in the network.
- Yields
Module – a module in the network
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
- named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.Tensor]]
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
- Parameters
prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
- Yields
(string, torch.Tensor) – Tuple containing the name and buffer
Example:
>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
- named_children() Iterator[Tuple[str, torch.nn.modules.module.Module]]
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple containing a name and child module
Example:
>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
- named_modules(memo: Optional[Set[torch.nn.modules.module.Module]] = None, prefix: str = '')
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- Yields
(string, Module) – Tuple of name and module
Note
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example:
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
- named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, torch.nn.parameter.Parameter]]
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- Parameters
prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
(string, Parameter) – Tuple containing the name and parameter
Example:
>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
- parameters(recurse: bool = True) Iterator[torch.nn.parameter.Parameter]
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields
Parameter – module parameter
Example:
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- register_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
This function is deprecated in favor of nn.Module.register_full_backward_hook() and the behavior of this function will change in future versions.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_buffer(name: str, tensor: Optional[torch.Tensor], persistent: bool = True) None
Adds a buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.
Buffers can be accessed as attributes using given names.
- Parameters
name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.
persistent (bool) – whether the buffer is part of this module’s state_dict.
Example:
>>> self.register_buffer('running_mean', torch.zeros(num_features))
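The difference between persistent and non-persistent buffers can be checked directly. The sketch below uses a hypothetical RunningStats module with one buffer of each kind; the class and buffer names are illustrative only.

```python
import torch
import torch.nn as nn


class RunningStats(nn.Module):
    """Hypothetical module holding one persistent and one non-persistent buffer."""

    def __init__(self, num_features):
        super().__init__()
        # Saved in state_dict (the default).
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Kept on the module, but left out of state_dict.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


stats = RunningStats(4)
saved_keys = list(stats.state_dict().keys())
buffer_names = [name for name, _ in stats.named_buffers()]
```

Both buffers still follow the module through calls such as to() or cpu(); only the serialized state differs.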
- register_forward_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward hook on the module.
The hook will be called every time after forward() has computed an output. It should have the following signature:
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after forward() is called.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
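A short sketch of the hook lifecycle follows. It also shows why the module instance should be called rather than forward(): only the former runs registered hooks. The hook name record_shape is an illustrative choice.

```python
import torch
import torch.nn as nn

linear = nn.Linear(2, 2)
calls = []


def record_shape(module, inputs, output):
    # `inputs` is the tuple of positional arguments; returning None keeps the output as is.
    calls.append(tuple(output.shape))


handle = linear.register_forward_hook(record_shape)
x = torch.ones(3, 2)

linear(x)          # calling the module runs the hook
linear.forward(x)  # calling forward() directly skips it
handle.remove()
linear(x)          # the hook no longer fires after removal
```

After these three calls, calls holds a single entry, recorded by the first one.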
- register_forward_pre_hook(hook: Callable[[...], None]) torch.utils.hooks.RemovableHandle
Registers a forward pre-hook on the module.
The hook will be called every time before forward() is invoked. It should have the following signature:
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. The user can either return a tuple or a single modified value in the hook. The value is wrapped into a tuple if a single value is returned (unless that value is already a tuple).
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_full_backward_hook(hook: Callable[[torch.nn.modules.module.Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[None, torch.Tensor]]) torch.utils.hooks.RemovableHandle
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.
Warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
- Returns
a handle that can be used to remove the added hook by calling handle.remove()
- Return type
torch.utils.hooks.RemovableHandle
- register_parameter(name: str, param: Optional[torch.nn.parameter.Parameter]) None
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
- Parameters
name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.
- requires_grad_(requires_grad: bool = True) torch.nn.modules.module.T
Changes whether autograd should record operations on parameters in this module.
This method sets the parameters’ requires_grad attributes in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
- Parameters
requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
self
- Return type
Module
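The freezing use case mentioned above can be sketched as follows, assuming a small nn.Sequential model whose first layer is kept fixed during fine-tuning.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer; its parameters stop receiving gradients.
net[0].requires_grad_(False)

# Only the parameters that still require gradients would be handed to an optimizer.
trainable = [name for name, param in net.named_parameters() if param.requires_grad]
```

Passing only the trainable parameters to the optimizer avoids keeping optimizer state for frozen weights.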
- state_dict(destination=None, prefix='', keep_vars=False)
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.
- Returns
a dictionary containing a whole state of the module
- Return type
dict
Example:
>>> module.state_dict().keys()
['bias', 'weight']
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns
self
- Return type
Module
Examples:
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- train(mode: bool = True) torch.nn.modules.module.T
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- type(dst_type: Union[torch.dtype, str]) torch.nn.modules.module.T
Casts all parameters and buffers to dst_type.
- Parameters
dst_type (type or string) – the desired type
- Returns
self
- Return type
Module
- xpu(device: Optional[Union[int, torch.device]] = None) torch.nn.modules.module.T
Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized.
- Parameters
device (int, optional) – if specified, all parameters will be copied to that device
- Returns
self
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See the similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
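The two behaviors of set_to_none can be compared on a tiny module. This is a minimal sketch; the argument is passed explicitly each time so that the result does not depend on the library's default.

```python
import torch
import torch.nn as nn

linear = nn.Linear(2, 1)
linear(torch.ones(1, 2)).sum().backward()   # populate the .grad attributes

linear.zero_grad(set_to_none=False)         # keep the grad tensors, fill them with zeros
grad_is_zero = bool((linear.weight.grad == 0).all())

linear.zero_grad(set_to_none=True)          # drop the grad tensors entirely
grad_is_none = linear.weight.grad is None
```

Code that reads param.grad after zero_grad(set_to_none=True) has to handle None.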
- training: bool
Metric Functions
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.metrics.accuracy_functional.categorical_accuracy(input, target, weight=None, reduction='mean')[source]
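The signature above does not describe the computation. As a rough illustration only, assume the metric compares the argmax of input over the class dimension with integer labels in target and averages the matches. The weight and reduction arguments are not modeled, and categorical_accuracy_sketch is a hypothetical name, not the hyperion function.

```python
import torch


def categorical_accuracy_sketch(input, target):
    """Hypothetical stand-in: fraction of rows whose argmax equals the label."""
    pred = input.argmax(dim=-1)            # predicted class index per example
    return (pred == target).float().mean()


logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.4]])
target = torch.tensor([0, 1, 1, 1])
acc = categorical_accuracy_sketch(logits, target)   # 3 of the 4 predictions match
```

For the binary variant with logits, a threshold of thr=0.0 on the logit corresponds to a probability of 0.5 after a sigmoid, which is the usual decision boundary.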
Loggers
The logger classes are used to write information to standard output, log files, TensorBoard or WandB.
The LoggerList class contains a set of loggers. When we log something to the LoggerList, the same information is written to all the loggers contained in it. The loggers support multi-GPU training with DistributedDataParallel.
Individual Loggers
- class hyperion.torch.loggers.logger.Logger[source]
Base class for logger objects
- params
training params dictionary
- on_epoch_begin(epoch, logs, **kwargs)[source]
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs, **kwargs)[source]
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_end(logs, **kwargs)[source]
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- class hyperion.torch.loggers.prog_logger.ProgLogger(metrics=None, interval=10)[source]
Logger that prints training progress to stdout
- metrics
list of metrics
- interval
number of batches between prints
- on_train_begin(logs=None, **kwargs)[source]
At the start of training
- Parameters
logs – dictionary of logs
- on_epoch_begin(epoch, logs=None, **kwargs)[source]
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs=None, **kwargs)[source]
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_end(logs=None, **kwargs)[source]
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_epoch_end(logs=None, **kwargs)[source]
At the end of an epoch
- Parameters
logs – dictionary of logs
- on_train_end(logs, **kwargs)
At the end of training
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- class hyperion.torch.loggers.csv_logger.CSVLogger(file_path, sep=',', append=False)[source]
Logger that prints metrics to a CSV file at the end of each epoch
- file_path
filename of the CSV file.
- sep
column separator for the CSV file
- append
if False, overwrites the existing file; if True, appends to it.
- on_train_begin(logs=None, **kwargs)[source]
At the start of training
- Parameters
logs – dictionary of logs
- on_epoch_end(logs=None, **kwargs)[source]
At the end of an epoch
- Parameters
logs – dictionary of logs
- on_train_end(logs=None, **kwargs)[source]
At the end of training
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs, **kwargs)
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_end(logs, **kwargs)
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_epoch_begin(epoch, logs, **kwargs)
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
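To make the roles of sep and append concrete, here is a minimal standard-library sketch of writing one row of metrics per epoch. It is not the CSVLogger implementation; write_epoch_row and the metric names are hypothetical.

```python
import csv
import os
import tempfile


def write_epoch_row(file_path, logs, sep=",", append=False):
    """Hypothetical helper: write one dictionary of metrics as a CSV row."""
    # A header is needed when starting a new file or appending to an empty one.
    needs_header = (
        not append
        or not os.path.exists(file_path)
        or os.path.getsize(file_path) == 0
    )
    with open(file_path, "a" if append else "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(logs.keys()), delimiter=sep)
        if needs_header:
            writer.writeheader()
        writer.writerow(logs)


path = os.path.join(tempfile.mkdtemp(), "train.csv")
write_epoch_row(path, {"epoch": 0, "loss": 0.9}, append=False)  # overwrite and write header
write_epoch_row(path, {"epoch": 1, "loss": 0.7}, append=True)   # add a row, no new header

with open(path) as f:
    rows = f.read().splitlines()
```

Appending is what allows a resumed training run to continue the same file instead of discarding earlier epochs.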
- class hyperion.torch.loggers.tensorboard_logger.TensorBoardLogger(tb_path, interval=10)[source]
Logger that sends training progress to tensorboard
- tb_path
tensorboard output directory
- on_train_begin(logs=None, **kwargs)[source]
At the start of training
- Parameters
logs – dictionary of logs
- on_epoch_begin(epoch, logs=None, **kwargs)[source]
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_batch_end(logs=None, **kwargs)[source]
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_epoch_end(logs=None, **kwargs)[source]
At the end of an epoch
- Parameters
logs – dictionary of logs
- on_train_end(logs=None, **kwargs)[source]
At the end of training
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs, **kwargs)
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- class hyperion.torch.loggers.wandb_logger.WAndBLogger(project=None, group=None, name=None, path=None, mode='online', interval=10)[source]
Logger that sends training progress to Weights & Biases (wandb)
- tb_path
tensorboard output directory
- on_train_begin(logs=None, **kwargs)[source]
At the start of training
- Parameters
logs – dictionary of logs
- on_epoch_begin(epoch, logs=None, **kwargs)[source]
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_batch_end(logs=None, **kwargs)[source]
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_epoch_end(logs=None, **kwargs)[source]
At the end of an epoch
- Parameters
logs – dictionary of logs
- on_train_end(logs=None, **kwargs)[source]
At the end of training
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs, **kwargs)
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
Logger List
- class hyperion.torch.loggers.logger_list.LoggerList(loggers=None)[source]
Container for a list of logger callbacks
- loggers
list of Logger objects
- property tensorboard_logger
- property tensorboard_writer
- on_epoch_begin(epoch, logs=None, **kwargs)[source]
At the start of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_epoch_end(logs=None, **kwargs)[source]
At the end of an epoch
- Parameters
epoch – index of the epoch
logs – dictionary of logs
- on_batch_begin(batch, logs=None, **kwargs)[source]
At the start of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
- on_batch_end(logs=None, **kwargs)[source]
At the end of a batch
- Parameters
batch – batch index within the epoch
logs – dictionary of logs
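The fan-out behavior of a logger list can be sketched in plain Python. The classes below are hypothetical stand-ins that only mirror the callback names documented above; they are not the hyperion classes.

```python
class RecordingLogger:
    """Hypothetical logger that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        self.events.append(("epoch_begin", epoch))

    def on_batch_end(self, logs=None, **kwargs):
        self.events.append(("batch_end", dict(logs or {})))


class FanOutLoggers:
    """Hypothetical container that forwards every callback to all its loggers."""

    def __init__(self, loggers=None):
        self.loggers = list(loggers or [])

    def on_epoch_begin(self, epoch, logs=None, **kwargs):
        for logger in self.loggers:
            logger.on_epoch_begin(epoch, logs, **kwargs)

    def on_batch_end(self, logs=None, **kwargs):
        for logger in self.loggers:
            logger.on_batch_end(logs, **kwargs)


first, second = RecordingLogger(), RecordingLogger()
loggers = FanOutLoggers([first, second])
loggers.on_epoch_begin(0, {})
loggers.on_batch_end({"loss": 0.5})
```

A training loop then talks to a single object, and adding or removing an output needs no change to the loop.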
Utils
Device Handling Utils
Utilities to handle GPU devices, like finding a free GPU in a shared server.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
Distributed Data Parallel Utils
This module contains utilities to perform multi-GPU training with Distributed Data Parallel.
Copyright 2021 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.utils.ddp.ddp_init(gpu_id, num_gpus, node_id=0, num_nodes=1, master_addr='localhost', master_port=None)[source]
- class hyperion.torch.utils.ddp.TorchDDP(*args: Any, **kwargs: Any)[source]
- __init__(*args: Any, **kwargs: Any) None
- class hyperion.torch.utils.ddp.FairShardedDDP(*args: Any, **kwargs: Any)[source]
- __init__(module: torch.nn.Module, sharded_optimizer: Union[fairscale.optim.oss.OSS, List[fairscale.optim.oss.OSS]], process_group: Optional[Any] = None, broadcast_buffers: bool = True, sync_models_at_startup: bool = True, reduce_buffer_size: int = 8388608, auto_refresh_trainable: bool = True, reduce_fp16: bool = False)
- _clear_counters() None
Reset all the grad reduce and call counters
- _consume_work_handles() None
Consume all the futures which are tied to this optimizer’s buckets. We start from the first/older ones, since they are the most likely to be ready and non-blocking
- _get_reduce_fn(index: int, param: torch.Tensor, dst_rank: int) Callable
Two possible backward hooks for a given parameter: either directly reduce to the appropriate rank, or contribute to a bucket and reduce when the bucket is full.
Either way a delayed action is necessary and is passed as a callback.
- _passing_sync_batchnorm_handle(module: torch.nn.Module) None
Passes handle required for torch.nn.modules.SyncBatchNorm. Adapted from torch.nn.distributed.DistributedDataParallel.
- _setup_backward_hooks() None
Attach a reduce function to each grad-requiring parameter. This makes the gradient reduction automatic whenever there’s a backward pass
- _setup_bucket_strategy() None
Devise a bucketing strategy on a per-rank ownership level. These buckets will not be sharded, since the gradients would be re-allocated during the backward in that case. This method can be slow for big models, but it is not typically called often (not for every forward, for instance)
- _sync_params_and_buffers() None
Sync the complete model states in between the ranks
- _try_consume_work_handle() None
Try to consume the oldest future. This is non-blocking; if it is not ready, we pass
- forward(*inputs: Any, **kwargs: Any) Any
Module forward pass, handles any DDP-specific work in the background. Primes the backward pass for gradient reduction to the proper ranks.
- no_sync() Generator
A context manager to disable gradient synchronization.
- reduce() None
This does not need to be called; the gradient reduction is done automatically during the backward pass. Use this method to reduce the gradients manually
- refresh_trainable() None
If the module trainability has changed, update all the assumptions
- sync_buffers(blocking: bool = False) None
Sync all the param buffers in between ranks (including for instance batch norm statistics).
- Parameters
blocking (bool) – wait for the operation to conclude.
- to(device: Optional[torch.device], dtype: Optional[torch.dtype] = None, non_blocking: bool = False) fairscale.nn.data_parallel.sharded_ddp.ShardedDataParallel
Moves and/or casts the parameters and buffers.
Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
Note
This method modifies the module in-place.
- Parameters
device (torch.device) – the desired device of the parameters and buffers in this module.
dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers.
non_blocking (bool) – make it an asynchronous call.
- Returns
self.
- Return type
Module
- zero_grad(set_to_none: bool = False) None
Sets gradients of all model parameters to zero. See the similar function under torch.optim.Optimizer for more context.
- Parameters
set_to_none (bool) – instead of setting to zero, set the grads to None. See torch.optim.Optimizer.zero_grad() for details.
- class hyperion.torch.utils.ddp.FairFullyShardedDDP(*args: Any, **kwargs: Any)[source]
- __getstate__() Dict[str, str]
Serialize the state of the current FullyShardedDataParallel instance.
Some properties are not serializable (e.g., process groups, streams), so we remove them and try to reconstruct them in __setstate__().
- __init__(module: torch.nn.Module, process_group: Optional[torch.distributed.ProcessGroup] = None, reshard_after_forward: bool = True, mixed_precision: bool = False, fp32_reduce_scatter: bool = False, flatten_parameters: bool = True, move_params_to_cpu: bool = False, compute_dtype: Optional[torch.dtype] = None, buffer_dtype: Optional[torch.dtype] = None, move_grads_to_cpu: Optional[bool] = None, bucket_cap_mb: int = 25, compute_device: Optional[torch.device] = None, no_broadcast_optim_state: Optional[bool] = False, state_dict_device: Optional[torch.device] = None, clear_autocast_cache: bool = False, force_input_to_fp32: bool = False, verbose: bool = False, cpu_offload: bool = False)
- __setstate__(state: Dict[str, Any]) None
Intercept state setting and perform needed changes on params.
- _broadcast_pad_info_to_r0() List[List[List[int]]]
Collect [x.numel_padded_per_param for x in self._fsdp_instances] from each rank.
- _cast_buffers(device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = None, memo: Optional[Set] = None) None
Move all buffers to the given device and dtype.
If device or dtype are not given, then they will default to self.compute_device and self.buffer_dtype, respectively. In the case of nested FSDP instances, we will respect the child instance’s compute_device and buffer_dtype configuration.
- Parameters
device (torch.device, Optional) – device to cast buffers to (defaults to compute_device)
dtype (torch.dtype, Optional) – dtype to cast buffers to (defaults to buffer_dtype)
memo (Set, Optional) – set of modules that have already been processed
- _cast_fp32_param_shards_to_fp16(params: Optional[List[torch.nn.Parameter]] = None) None
Cast FP32 param shard to FP16 for a list of params.
- _free_fp16_param_shard(params: Optional[List[torch.nn.Parameter]] = None) None
Free storage for FP16 shards for a list of params.
- _free_full_params(params: Optional[List[torch.nn.Parameter]] = None) None
Free up storage for full parameters.
- _gather_optim_state(sd_state: Dict[int, Dict[str, Any]]) Tuple[Dict[int, Dict[str, List]], Dict[int, Dict[str, List]]]
For each value in state[i], if the value is a tensor, collect it from the world. Else use rank 0’s entry.
- _get_shard(tensor: torch.Tensor) Tuple[torch.Tensor, int]
Return the local shard of a full tensor.
- _init_param_attributes(p: torch.nn.Parameter) None
We manage several attributes on each Parameter instance. The first two are set by _shard_parameters_():
_is_sharded: True if the Parameter is sharded or False if the Parameter is intentionally not sharded (in which case we will all-reduce grads for this param).
_orig_size: the size of the original Parameter (before sharding)
The remaining attributes are set here:
_fp32_shard: a single shard of the parameters in full precision (typically FP32, but this is dependent on the dtype of the model as it’s passed in by the user). This can be on CPU or GPU depending on the value of cpu_offload.
_fp16_shard: if mixed_precision is True, this will be a single shard of the parameters in FP16, used for all-gather.
_full_param_padded: the full weight (padded to be evenly divisible by world_size), used for computation in the forward and backward pass. This will be resized in place and only materialized (via all-gather) as needed.
- _lazy_init() None
Initialization steps that should happen lazily, typically right before the first forward pass.
- _load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple
Load a whole (unsharded) state_dict.
Warning
This needs to be called on all ranks, since synchronization primitives will be used.
- _post_backward_hook(param: torch.nn.Parameter, *unused: Any) None
At the start of _post_backward_hook(), param.grad contains the full gradient for the local batch. The reduce-scatter op will replace param.grad with a single shard of the summed gradient across all GPUs. This shard will align with the current GPU rank. For example:
before reduce_scatter:
    param.grad (GPU #0): [1, 2, 3, 4]
    param.grad (GPU #1): [5, 6, 7, 8]
after reduce_scatter:
    param.grad (GPU #0): [6, 8]    # 1+5, 2+6
    param.grad (GPU #1): [10, 12]  # 3+7, 4+8
The local GPU’s optim.step is responsible for updating a single shard of params, also corresponding to the current GPU’s rank. This alignment is created by _shard_parameters_(), which ensures that the local optimizer only sees the relevant parameter shard.
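The reduce-scatter step described for _post_backward_hook() can be reproduced with plain Python lists. This is only an illustrative sketch, not the FSDP implementation: the helper name reduce_scatter is hypothetical, and a real run would use a collective operation across GPUs rather than a single-process loop.

```python
def reduce_scatter(grads_per_rank):
    """Simulate a sum reduce-scatter over equally sized per-rank gradient lists."""
    world_size = len(grads_per_rank)
    numel = len(grads_per_rank[0])
    assert numel % world_size == 0, "gradient must be padded to a multiple of world_size"
    shard_size = numel // world_size
    # Element-wise sum of the full gradients held by every rank.
    summed = [sum(vals) for vals in zip(*grads_per_rank)]
    # Rank r keeps only the r-th contiguous slice of the summed gradient.
    return [summed[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


full_grads = [[1, 2, 3, 4], [5, 6, 7, 8]]  # param.grad on GPU #0 and GPU #1
shards = reduce_scatter(full_grads)
print(shards)  # [[6, 8], [10, 12]]
```

Each rank ends up with the slice of the summed gradient that matches the parameter shard its local optimizer updates.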
- _post_reduction_hook(param: torch.nn.Parameter, reduced_grad: torch.Tensor) None
Hook to call on each param after the reduce-scatter.
- _prep_grads_for_backward() None
Make sure p.grad has the correct size/device, otherwise set it to None.
- _print_r0(msg: str, restart: bool = False) None
Debugging utility to print memory usage stats nicely on rank 0
- _queue_wait_for_post_backward() None
Try to queue a wait_for_post_backward callback.
Only called on the root instance, and only one callback is queued. However, it can be called by child FSDP instances via a closure in case the root instance doesn’t own any params.
- _rebuild_full_params(force_full_precision: bool = False) Optional[List[Tuple[torch.Tensor, bool]]]
Gather all shards of params.
- Parameters
force_full_precision (bool, Optional) – by default params will be gathered in compute_dtype (e.g., FP16), unless force_full_precision is True, in which case they will be gathered in full precision (e.g., FP32), possibly in fresh storage. The parameter that’s being rebuilt will end up in full precision as well.
- Returns
A list of tuples, where the first element is the full-sized param and the second element is a bool indicating if it’s safe for the caller to free the full-sized param. This will be None if force_full_precision=False and the full params are already gathered.
- _register_post_backward_hooks() None
Register backward hooks to reshard params and reduce-scatter grads.
This is called during the forward pass. The goal is to attach a hook on each parameter’s gradient-generating function (grad_acc below) so that the hook is called after all gradients for that param are computed.
Goals:
1. We want the hook to fire once and only once after all gradients are accumulated for a param.
2. If it fires more than once, we end up incorrectly sharding the grad multiple times. (could lead to dimension too small)
3. If it fires once but too early or doesn’t fire, we leave gradients unsharded. (could lead to dimension too large)
Due to multiple-pass forward, this function can be called on the same parameter multiple times in a single forward pass. If we register the hook multiple times, we end up getting called multiple times. We could try to get a new hook every time and delete the previous one registered. However, for an unknown reason (I have debugged it for a long time!), in mixed precision mode, we get two different grad_acc objects below during different calls of this function (in the same forward pass). If we keep the last one, the hook ends up firing too early. In full precision mode, we luckily get the same grad_acc object, so deleting and re-registering still ensures the hook fires once after all gradients are generated.
Empirically, keeping the first hook registered per forward pass seems to work the best. We do need to remove the hook at the end of the backward pass. Otherwise, the next forward pass will not register a new hook, which is needed for a new forward pass.
- _register_pre_backward_hooks(outputs: Any) Any
Register pre-backward hook to run before the wrapped module’s backward. Hooks should be attached to all outputs from the forward.
- Returns
new outputs with hooks registered if they require gradients.
- Return type
outputs
- _reset_lazy_init() None
Reset instance so _lazy_init() will run on the next forward.
- _set_is_root() None
If True, implies that no other FullyShardedDataParallel instance wraps this one. Called once by _lazy_init(). Also sets self.children_share_process_group = True if all child instances share the same process group. If some child instances use a different process group, self.clip_grad_norm_ will raise an error.
- _setup_streams() None
Create streams to overlap data transfer and computation.
- _shard_parameters_() None
At initialization we wrap a module with full parameters and shard the parameters in-place. Sharding is implemented by viewing each parameter as a 1D Tensor and retaining only a single slice, where the slice size is determined by the number of data parallel workers.
Wrapping modules with many small parameters (or with a very large data parallel world size) will result in many small parameter shards and slow performance. In this case it’s better to set flatten_parameters to True, so that all of the small parameters in the module are combined into a single contiguous Tensor and sharded once.
After this initial sharding is complete, the user can initialize a torch.optim.Optimizer in the usual way, i.e.:
    optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)
The optimizer will see only a single slice of parameters and will thus allocate less memory for optimizer state, avoiding redundancy across data parallel workers.
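The slicing described for _shard_parameters_() can be illustrated without any distributed setup. The sketch below is a simplification using plain lists: the helper shard_flat_param is hypothetical, and the padding scheme (equal-sized slices, with trailing slices zero-padded) is an assumption made so that every rank holds a shard of the same length.

```python
def shard_flat_param(flat_param, rank, world_size, pad_value=0.0):
    """Return (shard, num_padded) for one rank of a flattened parameter."""
    # Ceil-divide so that every rank holds a slice of the same size.
    shard_size = -(-len(flat_param) // world_size)
    start = rank * shard_size
    shard = list(flat_param[start:start + shard_size])
    # Trailing slices may be short (or empty); pad them up to shard_size.
    num_padded = shard_size - len(shard)
    return shard + [pad_value] * num_padded, num_padded


weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # a 3x3 "parameter"
flat = [x for row in weight for x in row]                      # viewed as 1D, 9 elements
world_size = 4
shards = [shard_flat_param(flat, r, world_size) for r in range(world_size)]
for rank, (shard, num_padded) in enumerate(shards):
    print(rank, shard, num_padded)
```

With 9 elements and 4 workers, ranks 0 to 2 each hold 3 real values and rank 3 holds only padding, which is why an optimizer built on one rank allocates state for a fraction of the model.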
- _use_fp32_param_shard(params: Optional[List[torch.nn.Parameter]] = None) None
Use FP32 shard for a list of params.
- _use_full_params() None
Switch p.data pointers to use the full params.
Note: this assumes full params are already gathered.
- _wait_for_post_backward() None
Wait for post-backward to finish. Only called on root instance.
- _wait_for_previous_optim_step() None
The outer-most FullyShardedDataParallel instance (i.e., the root instance) needs to synchronize with the default stream to ensure the previous optimizer step is done.
- apply(fn: Callable[[torch.nn.Module], None]) fairscale.nn.data_parallel.fully_sharded_data_parallel.FullyShardedDataParallel
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model.
Compared to torch.nn.Module.apply, this version additionally gathers the full parameters before applying fn. It should not be called from within another summon_full_params context.
- Parameters
fn (Module -> None) – function to be applied to each submodule
- Returns
self
- Return type
Module
- assert_state(state: Union[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState, List[fairscale.nn.data_parallel.fully_sharded_data_parallel.TrainingState]]) None
Assert we are in the given state.
- clip_grad_norm_(max_norm: Union[float, int], norm_type: Union[float, int] = 2.0) torch.Tensor
Clip all gradients at this point in time. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
- Parameters
max_norm (float or int) – max norm of the gradients
norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.
- Returns
Total norm of the parameters (viewed as a single vector).
Note
This is analogous to torch.nn.utils.clip_grad_norm_ but handles the partitioning and multiple devices per rank under the hood. The default torch util is not applicable here, because each rank only has a partial view of all the grads in the model, so calling it in the OSS context would lead to different scaling being applied per subset of model parameters.
Warning
This needs to be called on all ranks, since synchronization primitives will be used.
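The note above explains why per-rank clipping would scale each subset of gradients differently. A minimal single-process sketch of the alternative follows, assuming each rank holds one gradient shard. The helper clip_sharded_grads is hypothetical, and the Python sum and max over ranks stand in for the all-reduce that a real implementation would perform.

```python
def clip_sharded_grads(grad_shards, max_norm, norm_type=2.0):
    """Clip per-rank gradient shards using the norm of the concatenated gradient."""
    if norm_type == float("inf"):
        # Max over ranks of the local max (stands in for an all-reduce MAX).
        total_norm = max(abs(g) for shard in grad_shards for g in shard)
    else:
        # Each rank computes sum(|g|^p) locally; summing over ranks stands in
        # for an all-reduce SUM, after which the p-th root is taken.
        local_sums = [sum(abs(g) ** norm_type for g in shard) for shard in grad_shards]
        total_norm = sum(local_sums) ** (1.0 / norm_type)
    # Every rank applies the same coefficient, so the gradient direction is preserved.
    clip_coef = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * clip_coef for g in shard] for shard in grad_shards]
    return total_norm, clipped


shards = [[3.0, 0.0], [0.0, 4.0]]  # two ranks, each holding half of the gradient
total_norm, clipped = clip_sharded_grads(shards, max_norm=1.0)
print(total_norm)  # 5.0
```

Clipping each shard against its own local norm (3.0 and 4.0 here) would rescale the two halves by different factors, whereas the shared norm of 5.0 rescales both by the same one.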
- static consolidate_shard_weights(shard_weights: List[Dict[str, torch.Tensor]], shard_metadata: List[Dict[str, Any]], with_module_buffers: bool = True) Dict[str, torch.Tensor]
Given a list of weights and metadata associated with N shards, reconstruct the weights of an equivalent consolidated (non-sharded) model.
Module parameters are consolidated using the shard metadata.
Module buffers are taken from shard 0: this assumes that module buffers are either synchronized or that the shard 0 value is valid for all shards. If this behavior is not correct for your module (for instance if buffers need to be reduced instead), you can disable it with with_module_buffers=False.
This method is used to re-assemble checkpoints of shards without having to instantiate FSDP wrappers with the world size originally used to save the shards.
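The idea behind consolidating shards can be sketched for a single parameter. This is a simplified illustration, not the actual method: the helper consolidate_flat_shards and its arguments are hypothetical, and the per-shard padding counts and original shape are assumed to be the kind of information the shard metadata carries.

```python
def consolidate_flat_shards(shards, num_padded, shape):
    """Rebuild a 2D weight (list of rows) from 1D shards listed in rank order."""
    flat = []
    for shard, pad in zip(shards, num_padded):
        # Drop the padding that was appended to this shard, then concatenate.
        flat.extend(shard[:len(shard) - pad])
    rows, cols = shape
    assert len(flat) == rows * cols, "shards do not match the original shape"
    # Restore the original (rows, cols) layout.
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.0, 0.0, 0.0]]
num_padded = [0, 0, 0, 3]  # the last rank holds only padding
weight = consolidate_flat_shards(shards, num_padded, shape=(3, 3))
print(weight)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```

Because only the saved shards and their metadata are needed, this kind of reconstruction can run offline in a single process.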
- property cpu_offload: bool
- extra_repr() str
- forward(*args: Any, **kwargs: Any) torch.Tensor
- gather_full_optim_state_dict(optim: torch.optim.Optimizer, **ignored: Dict) Optional[Dict[str, Any]]
Return the last known global optimizer state. The returned state is compatible with PyTorch, in that the sharded properties are not exposed. Multiple parameter groups are not yet supported.
This should be called only on the root FSDP instance. Nested FSDP instances are supported as long as they have the same world_size as the parent or world_size=1.
- Parameters
optim (Optimizer) – an optimizer instance for this FSDP rank. Its state_dict is used in the consolidation. However, its state is not modified.
- Returns
A dict with four entries (on rank zero; other workers return None):
state - a dict holding gathered optimization state, 1 entry per unflat parameter
param_groups - a dict containing the 1 parameter group
param_id_map - global (unflat) to local (flat) id mapping
uncollected_local_ids - keys in the state dict that were not broadcast
- get_shard_from_optim_state_dict(full_optim_state_dict: Dict[str, Any]) Dict[str, Any]
Get the portion of the optimizer state dict associated with the shard
This can be used to get the right sharded optimizer state to be loaded into the sharded optimizer for this FSDP rank.
- Parameters
full_optim_state_dict (dict) – consolidated optimizer state returned by gather_full_optim_state_dict(), or loaded from a checkpoint.
- Returns
a shard of the optimizer state.
- Return type
(dict)
- load_local_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple
Load a local (sharded) state_dict.
- load_state_dict(state_dict: Union[Dict[str, torch.Tensor], OrderedDict[str, torch.Tensor]], strict: bool = True) NamedTuple
- local_metadata_dict() Dict[str, Any]
Get the information needed to reconstruct the model from shards offline.
- local_state_dict(*args: Any, **kwargs: Any) Any
Returns the local (sharded) state of the module. Parameters are sharded, so the resulting state_dict can only be loaded after the Module has been wrapped with FullyShardedDataParallel.
- property module: torch.nn.Module
- no_sync() Generator
A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass after exiting the context.
Note
This may result in higher memory usage because we will accumulate the full model gradients (instead of gradient shards) until the eventual sync.
- property params_with_grad: List[torch.nn.Parameter]
[p for p in self.parameters() if p.grad is not None]
- set_gradient_divide_factors(pre: float, post: float, recursive: bool) None
Allows the user to override the pre- and post-divide factors.
- Parameters
pre (float) – divide factor before the reduction.
post (float) – divide factor after the reduction.
recursive (bool) – recursively set it for all child FSDP instances or not.
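How a pre-divide and a post-divide factor combine can be shown with scalar gradients. The sketch below assumes that the two factors are chosen so that their product equals the world size, which turns the summed gradient into an average. That relationship, and the helper reduce_with_divide_factors, are assumptions for illustration rather than documented behavior.

```python
def reduce_with_divide_factors(grads_per_rank, pre, post):
    """Sum-reduce per-rank scalar gradients, dividing by pre before and post after."""
    pre_divided = [g / pre for g in grads_per_rank]  # applied locally on every rank
    reduced = sum(pre_divided)                       # stands in for the reduction
    return reduced / post                            # applied to the reduced value


grads = [8.0, 16.0, 24.0, 32.0]  # one scalar gradient per rank, world_size = 4
print(reduce_with_divide_factors(grads, pre=1.0, post=4.0))  # 20.0
print(reduce_with_divide_factors(grads, pre=2.0, post=2.0))  # 20.0
print(reduce_with_divide_factors(grads, pre=4.0, post=1.0))  # 20.0
```

All three splits give the same average. What changes is the magnitude of the intermediate sum, which can matter when the reduction runs in reduced precision.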
- state_dict(*args: Any, **kwargs: Any) Any
Returns the whole (unsharded) state of the module. Parameters are not sharded, so the resulting state_dict can be loaded directly by the wrapped Module without any sharding-specific logic. Returned tensors will be full precision (e.g., FP32).
Warning
This needs to be called on all ranks, since synchronization primitives will be used.
- summon_full_params(recurse: bool = True, volatile: bool = False) Generator
A context manager to expose full params for the current FSDP instance. Can be useful after forward/backward for a model to get the params for additional processing or checking. Parameters will be gathered in full precision (e.g., FP32).
Note
This can be used on inner FSDPs.
Note
This cannot be used within a forward or backward pass, nor can forward and backward be started from within this context.
Note
The full parameters will be freed after the context manager exits; it is up to the caller to clone them if needed.
Note
The full parameters can be modified, but only the portion corresponding to the local param shard will persist after the context manager exits (unless volatile=True, in which case there are no guarantees about persistence).
- Parameters
recurse (bool, Optional) – recursively summon all params for nested FSDP instances (default: True)
volatile (bool, Optional) – if True, modifications to params are not guaranteed to persist after the context manager exits; enabling this can be slightly more efficient (default: False)
Metric Accumulators
Tools to combine the metrics computed in multiple GPUs into a single metric
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- class hyperion.torch.utils.metric_acc.MetricAcc(device=None)[source]
Class to accumulate metrics during an epoch.
- update(metrics, num_samples=1)[source]
Updates the values of the metric
It uses a recursive formula, which may be more numerically stable:
m^(i) = m^(i-1) + n^(i) / N^(i) * (x^(i) - m^(i-1)), with N^(i) = n^(1) + ... + n^(i)
where i is the batch number, m^(i) is the accumulated average of the metric at batch i, x^(i) is the average of the metric at batch i, n^(i) is the batch_size at batch i, and N^(i) is the total number of samples seen up to and including batch i.
- Parameters
metrics – dictionary with metrics for current batch
num_samples – number of samples in current batch (batch_size)
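The recursive update used by update() can be sketched in a few lines of plain Python. The class RunningMetricAverage below is a hypothetical stand-in, not the MetricAcc implementation; it only shows that the recursion reproduces the sample-weighted mean, with the weight of each batch being its size divided by the running sample count.

```python
class RunningMetricAverage:
    """Sample-weighted running average of a dictionary of metrics."""

    def __init__(self):
        self.count = 0       # total number of samples seen so far
        self.averages = {}   # metric name -> accumulated average

    def update(self, metrics, num_samples=1):
        self.count += num_samples
        weight = num_samples / self.count
        for name, value in metrics.items():
            previous = self.averages.get(name, 0.0)
            # m_i = m_{i-1} + (n_i / total samples so far) * (x_i - m_{i-1})
            self.averages[name] = previous + weight * (value - previous)


acc = RunningMetricAverage()
acc.update({"loss": 2.0}, num_samples=10)
acc.update({"loss": 1.0}, num_samples=30)
print(acc.averages["loss"])  # 1.25, i.e. (2.0 * 10 + 1.0 * 30) / 40
```

The accumulated value stays on the scale of the metric itself instead of growing with the number of samples, which is the usual argument for this form over keeping a raw running sum.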
- property metrics
Returns metrics dictionary
Evaluation Utils
Functions that can be useful when evaluating neural networks, for example when a signal is too long to fit in memory and needs to be split into chunks.
Copyright 2019 Johns Hopkins University (Author: Jesus Villalba) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
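As a rough illustration of the chunking use case mentioned above, a long signal can be cut into consecutive pieces that are processed one at a time. The helper split_into_chunks is hypothetical and is not one of the functions in this module.

```python
def split_into_chunks(signal, chunk_length):
    """Split a sequence into consecutive chunks of at most chunk_length items."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    return [signal[i:i + chunk_length] for i in range(0, len(signal), chunk_length)]


signal = list(range(10))
chunks = split_into_chunks(signal, chunk_length=4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```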
Math Functions
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- hyperion.torch.utils.math.invert_trimat(A, lower=False, right_inv=False, return_logdet=False, return_inv=False)[source]
- Inversion of triangular matrices.
Returns lambda function f that multiplies the inverse of A times a vector.
- Parameters
A – Triangular matrix.
lower – if True A is lower triangular, else A is upper triangular.
right_inv – If False, f(v)=A^{-1}v; if True f(v)=v’ A^{-1}
return_logdet – If True, it also returns the log determinant of A.
return_inv – If True, it also returns A^{-1}
- Returns
Lambda function that multiplies A^{-1} times a vector. If return_logdet is True, also the log determinant of A. If return_inv is True, also A^{-1}.
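The idea of returning a function that applies A^{-1} without forming the inverse explicitly can be sketched with substitution on plain lists. This is not the hyperion implementation: invert_trimat_sketch and solve_triangular are hypothetical names, the right_inv and return_inv options are omitted, and the diagonal of A is assumed to be positive so that the log determinant is a sum of logs.

```python
import math


def solve_triangular(A, b, lower=False):
    """Solve A x = b for a triangular matrix A given as a list of rows."""
    n = len(A)
    x = [0.0] * n
    # Forward substitution for lower triangular, back substitution for upper.
    order = range(n) if lower else range(n - 1, -1, -1)
    for i in order:
        cols = range(i) if lower else range(i + 1, n)
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in cols)) / A[i][i]
    return x


def invert_trimat_sketch(A, lower=False, return_logdet=False):
    """Return f with f(v) = A^{-1} v, optionally with the log determinant of A."""
    f = lambda v: solve_triangular(A, v, lower=lower)
    if return_logdet:
        # The determinant of a triangular matrix is the product of its diagonal.
        logdet = sum(math.log(A[i][i]) for i in range(len(A)))
        return f, logdet
    return f


A = [[2.0, 1.0], [0.0, 4.0]]  # upper triangular
f, logdet = invert_trimat_sketch(A, return_logdet=True)
print(f([4.0, 8.0]))  # [1.0, 2.0]
```

Solving by substitution costs O(n^2) per vector and avoids the extra rounding error of forming A^{-1} explicitly.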
Miscellaneous Functions
Copyright 2020 Johns Hopkins University (Author: Jesus Villalba, Nanxin Chen) Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)