wav2vec2
mindnlp.transformers.models.wav2vec2.configuration_wav2vec2
Wav2Vec2 model configuration
mindnlp.transformers.models.wav2vec2.configuration_wav2vec2.Wav2Vec2Config
Bases: PretrainedConfig
This is the configuration class that stores the configuration of a [Wav2Vec2Model]. It is used to instantiate a
Wav2Vec2 model according to the specified arguments, which define the model architecture. Instantiating a configuration
with the defaults yields a configuration similar to that of the
facebook/wav2vec2-base-960h architecture.
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the
documentation from [PretrainedConfig] for more information.
| PARAMETER | DESCRIPTION |
|---|---|
| vocab_size | Vocabulary size of the Wav2Vec2 model. Defines the number of different tokens that can be represented by the |
| hidden_size | Dimensionality of the encoder layers and the pooler layer. |
| num_hidden_layers | Number of hidden layers in the Transformer encoder. |
| num_attention_heads | Number of attention heads for each attention layer in the Transformer encoder. |
| intermediate_size | Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. |
| hidden_act | The non-linear activation function (function or string) in the encoder and pooler. If string, |
| hidden_dropout | The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. |
| activation_dropout | The dropout ratio for activations inside the fully connected layer. |
| attention_dropout | The dropout ratio for the attention probabilities. |
| final_dropout | The dropout probability for the final projection layer of [ |
| layerdrop | The LayerDrop probability. See the LayerDrop paper for more details. |
| initializer_range | The standard deviation of the truncated_normal_initializer for initializing all weight matrices. |
| layer_norm_eps | The epsilon used by the layer normalization layers. |
| feat_extract_norm | The norm to be applied to 1D convolutional layers in the feature encoder. One of |
| feat_proj_dropout | The dropout probability for the output of the feature encoder. |
| feat_extract_activation | The non-linear activation function (function or string) in the 1D convolutional layers of the feature extractor. If string, |
| feat_quantizer_dropout | The dropout probability for quantized feature encoder states. |
| conv_dim | A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the feature encoder. The length of conv_dim defines the number of 1D convolutional layers. |
| conv_stride | A tuple of integers defining the stride of each 1D convolutional layer in the feature encoder. The length of conv_stride defines the number of convolutional layers and has to match the length of conv_dim. |
| conv_kernel | A tuple of integers defining the kernel size of each 1D convolutional layer in the feature encoder. The length of conv_kernel defines the number of convolutional layers and has to match the length of conv_dim. |
| conv_bias | Whether the 1D convolutional layers have a bias. |
| num_conv_pos_embeddings | Number of convolutional positional embeddings. Defines the kernel size of the 1D convolutional positional embeddings layer. |
| num_conv_pos_embedding_groups | Number of groups of the 1D convolutional positional embeddings layer. |
| do_stable_layer_norm | Whether to apply the stable layer norm architecture of the Transformer encoder. |
| apply_spec_augment | Whether to apply SpecAugment data augmentation to the outputs of the feature encoder. For reference see SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. |
| mask_time_prob | Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector span to be masked, mask_time_prob should be |
| mask_time_length | Length of vector span along the time axis. |
| mask_time_min_masks | The minimum number of masks of length |
| mask_feature_prob | Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector span to be masked, mask_feature_prob should be |
| mask_feature_length | Length of vector span along the feature axis. |
| mask_feature_min_masks | The minimum number of masks of length |
| num_codevectors_per_group | Number of entries in each quantization codebook (group). |
| num_codevector_groups | Number of codevector groups for product codevector quantization. |
| contrastive_logits_temperature | The temperature kappa in the contrastive loss. |
| feat_quantizer_dropout | The dropout probability for the output of the feature encoder that's used by the quantizer. |
| num_negatives | Number of negative samples for the contrastive loss. |
| codevector_dim | Dimensionality of the quantized feature vectors. |
| proj_codevector_dim | Dimensionality of the final projection of both the quantized and the transformer features. |
| diversity_loss_weight | The weight of the codebook diversity loss component. |
| ctc_loss_reduction | Specifies the reduction to apply to the output of |
| ctc_zero_infinity | Whether to zero infinite losses and the associated gradients of |
| use_weighted_layer_sum | Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an instance of [ |
| classifier_proj_size | Dimensionality of the projection before token mean-pooling for classification. |
| tdnn_dim | A tuple of integers defining the number of output channels of each 1D convolutional layer in the TDNN module of the XVector model. The length of tdnn_dim defines the number of TDNN layers. |
| tdnn_kernel | A tuple of integers defining the kernel size of each 1D convolutional layer in the TDNN module of the XVector model. The length of tdnn_kernel has to match the length of tdnn_dim. |
| tdnn_dilation | A tuple of integers defining the dilation factor of each 1D convolutional layer in the TDNN module of the XVector model. The length of tdnn_dilation has to match the length of tdnn_dim. |
| xvector_output_dim | Dimensionality of the XVector embedding vectors. |
| add_adapter | Whether a convolutional network should be stacked on top of the Wav2Vec2 Encoder. Can be very useful for warm-starting Wav2Vec2 for SpeechEncoderDecoder models. |
| adapter_kernel_size | Kernel size of the convolutional layers in the adapter network. Only relevant if |
| adapter_stride | Stride of the convolutional layers in the adapter network. Only relevant if |
| num_adapter_layers | Number of convolutional layers that should be used in the adapter network. Only relevant if |
| adapter_attn_dim | Dimension of the attention adapter weights to be used in each attention block. An example of a model using attention adapters is facebook/mms-1b-all. |
| output_hidden_size | Dimensionality of the encoder output layer. If not defined, this defaults to hidden_size. Only relevant if |
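To make the span-count formula in the mask_time_prob and mask_feature_prob descriptions concrete, here is a minimal sketch in plain Python. The function name num_masked_spans is hypothetical, and rounding down, the min_masks floor, and the upper clamp are simplifying assumptions of this sketch, not necessarily the library's exact procedure.

```python
def num_masked_spans(mask_prob, sequence_length, mask_length, min_masks=0):
    """Estimate the number of independent mask spans along one axis.

    Follows the documented formula mask_prob * len(axis) / mask_length.
    Rounding down and clamping are assumptions of this sketch.
    """
    if mask_length < 1:
        raise ValueError("mask_length must be at least 1")
    spans = int(mask_prob * sequence_length / mask_length)
    # Never go below the configured minimum number of masks.
    spans = max(spans, min_masks)
    # The spans cannot cover more positions than the axis has.
    max_spans = sequence_length // mask_length
    return min(spans, max_spans)


# Defaults from the signature: mask_time_prob=0.05, mask_time_length=10, mask_time_min_masks=2
print(num_masked_spans(0.05, 1000, 10, min_masks=2))  # 5 spans, about 50 masked frames
print(num_masked_spans(0.05, 100, 10, min_masks=2))   # the formula gives 0, min_masks raises it to 2
```

With the defaults, roughly 5% of the time steps of a long sequence end up inside a masked span, while short sequences are still guaranteed the minimum number of spans.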
Example
>>> from mindnlp.transformers import Wav2Vec2Config, Wav2Vec2Model
...
>>> # Initializing a Wav2Vec2 facebook/wav2vec2-base-960h style configuration
>>> configuration = Wav2Vec2Config()
...
>>> # Initializing a model (with random weights) from the facebook/wav2vec2-base-960h style configuration
>>> model = Wav2Vec2Model(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp/transformers/models/wav2vec2/configuration_wav2vec2.py
mindnlp.transformers.models.wav2vec2.configuration_wav2vec2.Wav2Vec2Config.inputs_to_logits_ratio
property
Calculates the ratio of inputs to logits for the Wav2Vec2Config class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2Config class. |

| RETURNS | DESCRIPTION |
|---|---|
| int | The product of all convolution stride values. |

The ratio of inputs to logits is calculated by multiplying the convolution stride values, which are read from the self.conv_stride attribute. The functools.reduce() function is used to multiply all the stride values together. If there are no stride values, the ratio is 1.
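The stride product described above can be sketched as follows. The standalone function name and the use of operator.mul are assumptions for illustration; the source only states that functools.reduce() multiplies the values of conv_stride and that an empty sequence gives 1.

```python
import functools
import operator


def inputs_to_logits_ratio(conv_stride):
    """Multiply all strides together; an empty sequence yields 1."""
    return functools.reduce(operator.mul, conv_stride, 1)


default_stride = (5, 2, 2, 2, 2, 2, 2)  # default conv_stride from the __init__ signature
print(inputs_to_logits_ratio(default_stride))  # 320
print(inputs_to_logits_ratio(()))              # 1
```

With the default strides and the feature extractor's default sampling_rate of 16000, each output frame therefore advances by 320 input samples, which is 20 ms of audio.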
mindnlp.transformers.models.wav2vec2.configuration_wav2vec2.Wav2Vec2Config.__init__(vocab_size=32, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout=0.1, activation_dropout=0.1, attention_dropout=0.1, feat_proj_dropout=0.0, feat_quantizer_dropout=0.0, final_dropout=0.1, layerdrop=0.1, initializer_range=0.02, layer_norm_eps=1e-05, feat_extract_norm='group', feat_extract_activation='gelu', conv_dim=(512, 512, 512, 512, 512, 512, 512), conv_stride=(5, 2, 2, 2, 2, 2, 2), conv_kernel=(10, 3, 3, 3, 3, 2, 2), conv_bias=False, num_conv_pos_embeddings=128, num_conv_pos_embedding_groups=16, do_stable_layer_norm=False, apply_spec_augment=True, mask_time_prob=0.05, mask_time_length=10, mask_time_min_masks=2, mask_feature_prob=0.0, mask_feature_length=10, mask_feature_min_masks=0, num_codevectors_per_group=320, num_codevector_groups=2, contrastive_logits_temperature=0.1, num_negatives=100, codevector_dim=256, proj_codevector_dim=256, diversity_loss_weight=0.1, ctc_loss_reduction='sum', ctc_zero_infinity=False, use_weighted_layer_sum=False, classifier_proj_size=256, tdnn_dim=(512, 512, 512, 512, 1500), tdnn_kernel=(5, 3, 3, 1, 1), tdnn_dilation=(1, 2, 3, 1, 1), xvector_output_dim=512, pad_token_id=0, bos_token_id=1, eos_token_id=2, add_adapter=False, adapter_kernel_size=3, adapter_stride=2, num_adapter_layers=3, output_hidden_size=None, adapter_attn_dim=None, **kwargs)
Initializes a new instance of the Wav2Vec2Config class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The class instance. |
| vocab_size | The size of the vocabulary. Defaults to 32. |
| hidden_size | The size of the hidden layers. Defaults to 768. |
| num_hidden_layers | The number of hidden layers. Defaults to 12. |
| num_attention_heads | The number of attention heads. Defaults to 12. |
| intermediate_size | The size of the intermediate layers. Defaults to 3072. |
| hidden_act | The activation function for the hidden layers. Defaults to 'gelu'. |
| hidden_dropout | The dropout rate for the hidden layers. Defaults to 0.1. |
| activation_dropout | The dropout rate for the activation function. Defaults to 0.1. |
| attention_dropout | The dropout rate for the attention mechanism. Defaults to 0.1. |
| feat_proj_dropout | The dropout rate for the feature projection. Defaults to 0.0. |
| feat_quantizer_dropout | The dropout rate for the feature quantizer. Defaults to 0.0. |
| final_dropout | The final dropout rate. Defaults to 0.1. |
| layerdrop | The layer dropout rate. Defaults to 0.1. |
| initializer_range | The range for weight initialization. Defaults to 0.02. |
| layer_norm_eps | The epsilon value for layer normalization. Defaults to 1e-05. |
| feat_extract_norm | The normalization method for feature extraction. Defaults to 'group'. |
| feat_extract_activation | The activation function for feature extraction. Defaults to 'gelu'. |
| conv_dim | The dimensions for convolutional layers. Defaults to (512, 512, 512, 512, 512, 512, 512). |
| conv_stride | The stride for convolutional layers. Defaults to (5, 2, 2, 2, 2, 2, 2). |
| conv_kernel | The kernel size for convolutional layers. Defaults to (10, 3, 3, 3, 3, 2, 2). |
| conv_bias | Whether to include bias in convolutional layers. Defaults to False. |
| num_conv_pos_embeddings | The number of positional embeddings for convolutional layers. Defaults to 128. |
| num_conv_pos_embedding_groups | The number of groups for positional embeddings. Defaults to 16. |
| do_stable_layer_norm | Whether to use stable layer normalization. Defaults to False. |
| apply_spec_augment | Whether to apply SpecAugment during training. Defaults to True. |
| mask_time_prob | The probability of masking time steps during SpecAugment. Defaults to 0.05. |
| mask_time_length | The maximum length of time masking during SpecAugment. Defaults to 10. |
| mask_time_min_masks | The minimum number of time masks during SpecAugment. Defaults to 2. |
| mask_feature_prob | The probability of masking features during SpecAugment. Defaults to 0.0. |
| mask_feature_length | The maximum length of feature masking during SpecAugment. Defaults to 10. |
| mask_feature_min_masks | The minimum number of feature masks during SpecAugment. Defaults to 0. |
| num_codevectors_per_group | The number of codevectors per group for quantization. Defaults to 320. |
| num_codevector_groups | The number of codevector groups for quantization. Defaults to 2. |
| contrastive_logits_temperature | The temperature for contrastive loss. Defaults to 0.1. |
| num_negatives | The number of negative samples for contrastive loss. Defaults to 100. |
| codevector_dim | The dimension of the codevectors. Defaults to 256. |
| proj_codevector_dim | The dimension of projected codevectors. Defaults to 256. |
| diversity_loss_weight | The weight for diversity loss. Defaults to 0.1. |
| ctc_loss_reduction | The reduction method for CTC loss. Defaults to 'sum'. |
| ctc_zero_infinity | Whether to zero out infinity in CTC loss. Defaults to False. |
| use_weighted_layer_sum | Whether to use weighted layer sum. Defaults to False. |
| classifier_proj_size | The size of the projection for the classifier. Defaults to 256. |
| tdnn_dim | The dimensions for time-delay neural network layers. Defaults to (512, 512, 512, 512, 1500). |
| tdnn_kernel | The kernel size for time-delay neural network layers. Defaults to (5, 3, 3, 1, 1). |
| tdnn_dilation | The dilation for time-delay neural network layers. Defaults to (1, 2, 3, 1, 1). |
| xvector_output_dim | The output dimension for x-vector representation. Defaults to 512. |
| pad_token_id | The token ID for padding. Defaults to 0. |
| bos_token_id | The token ID for the beginning of sentence. Defaults to 1. |
| eos_token_id | The token ID for the end of sentence. Defaults to 2. |
| add_adapter | Whether to add adapter layers. Defaults to False. |
| adapter_kernel_size | The kernel size for adapter layers. Defaults to 3. |
| adapter_stride | The stride for adapter layers. Defaults to 2. |
| num_adapter_layers | The number of adapter layers. Defaults to 3. |
| output_hidden_size | The size of the output hidden layers. Defaults to None. |
| adapter_attn_dim | The attention dimension for adapter layers. Defaults to None. |

| RETURNS | DESCRIPTION |
|---|---|
| None | The initializer does not return a value. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the configuration for convolutional layers is incorrect, i.e., if the dimensions, strides, or kernel sizes are not of the same length. |
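The length check behind the ValueError above, and what the default kernel and stride tuples imply for the number of output frames, can be sketched in plain Python. Both function names are hypothetical, and the frame-count formula assumes unpadded 1D convolutions; neither is taken from the library source.

```python
def check_conv_config(conv_dim, conv_stride, conv_kernel):
    """Raise ValueError unless the three tuples describe the same number of layers."""
    if not (len(conv_dim) == len(conv_stride) == len(conv_kernel)):
        raise ValueError(
            "conv_dim, conv_stride and conv_kernel must have the same length, got "
            f"{len(conv_dim)}, {len(conv_stride)} and {len(conv_kernel)}"
        )


def feature_encoder_output_length(num_samples, conv_kernel, conv_stride):
    """Frames left after each convolution, assuming no padding (sketch assumption)."""
    length = num_samples
    for kernel, stride in zip(conv_kernel, conv_stride):
        length = (length - kernel) // stride + 1
    return length


# Default values from the __init__ signature
conv_dim = (512, 512, 512, 512, 512, 512, 512)
conv_stride = (5, 2, 2, 2, 2, 2, 2)
conv_kernel = (10, 3, 3, 3, 3, 2, 2)

check_conv_config(conv_dim, conv_stride, conv_kernel)  # passes silently
print(feature_encoder_output_length(16000, conv_kernel, conv_stride))  # 49
```

Under these assumptions, one second of 16 kHz audio is reduced to 49 feature frames.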
Source code in mindnlp/transformers/models/wav2vec2/configuration_wav2vec2.py
mindnlp.transformers.models.wav2vec2.feature_extraction_wav2vec2
Feature extractor class for Wav2Vec2
mindnlp.transformers.models.wav2vec2.feature_extraction_wav2vec2.Wav2Vec2FeatureExtractor
Bases: SequenceFeatureExtractor
Constructs a Wav2Vec2 feature extractor.
This feature extractor inherits from [~feature_extraction_sequence_utils.SequenceFeatureExtractor] which contains
most of the main methods. Users should refer to this superclass for more information regarding those methods.
| PARAMETER | DESCRIPTION |
|---|---|
| feature_size | The feature dimension of the extracted features. |
| sampling_rate | The sampling rate at which the audio files should be digitalized expressed in hertz (Hz). |
| padding_value | The value that is used to fill the padding values. |
| do_normalize | Whether or not to zero-mean unit-variance normalize the input. Normalizing can help to significantly improve the performance for some models, e.g., wav2vec2-lv60. |
| return_attention_mask | Whether or not [ Wav2Vec2 models that have set For Wav2Vec2 models that have set |
Source code in mindnlp/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
mindnlp.transformers.models.wav2vec2.feature_extraction_wav2vec2.Wav2Vec2FeatureExtractor.__call__(raw_speech, padding=False, max_length=None, truncation=False, pad_to_multiple_of=None, return_attention_mask=None, return_tensors=None, sampling_rate=None, **kwargs)
Main method to featurize and prepare for the model one or several sequence(s).
| PARAMETER | DESCRIPTION |
|---|---|
| raw_speech | The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float values, a list of numpy arrays or a list of lists of float values. Must be mono channel audio, not stereo, i.e. a single float per timestep. |
| padding | Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among: |
| max_length | Maximum length of the returned list and optionally padding length (see above). |
| truncation | Activates truncation to cut input sequences longer than max_length to max_length. |
| pad_to_multiple_of | If set, will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability |
| return_attention_mask | Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor's default. Wav2Vec2 models that have set For Wav2Vec2 models that have set |
| return_tensors | If set, will return tensors instead of list of python integers. Acceptable values are: |
| sampling_rate | The sampling rate at which the |
| padding_value | |
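As a rough illustration of how padding, pad_to_multiple_of and the attention mask relate, the following plain-Python sketch pads a batch to its longest sequence. The function pad_batch is hypothetical, and right-side padding and the 1/0 mask convention are assumptions of this sketch, not the feature extractor's actual implementation.

```python
def pad_batch(sequences, padding_value=0.0, pad_to_multiple_of=None):
    """Pad every sequence on the right to a common length and build a mask."""
    target = max(len(seq) for seq in sequences)
    if pad_to_multiple_of:
        # Round the target length up to the next multiple.
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of
    input_values, attention_mask = [], []
    for seq in sequences:
        pad = target - len(seq)
        input_values.append(list(seq) + [padding_value] * pad)
        # 1 marks real samples, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_values, attention_mask


values, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]], pad_to_multiple_of=4)
print(values)  # [[0.1, 0.2, 0.3, 0.0], [0.5, 0.0, 0.0, 0.0]]
print(mask)    # [[1, 1, 1, 0], [1, 0, 0, 0]]
```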
Source code in mindnlp/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
mindnlp.transformers.models.wav2vec2.feature_extraction_wav2vec2.Wav2Vec2FeatureExtractor.__init__(feature_size=1, sampling_rate=16000, padding_value=0.0, return_attention_mask=False, do_normalize=True, **kwargs)
Initialize the Wav2Vec2FeatureExtractor class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the class. |
| feature_size | The size of the input features. Defaults to 1. |
| sampling_rate | The sampling rate of the audio data. Defaults to 16000. |
| padding_value | The value used for padding sequences. Defaults to 0.0. |
| return_attention_mask | Whether to return the attention mask. Defaults to False. |
| do_normalize | Whether to normalize the input features. Defaults to True. |
| **kwargs | Additional keyword arguments. |

| RETURNS | DESCRIPTION |
|---|---|
| None | The initializer does not return a value. |
Source code in mindnlp/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
mindnlp.transformers.models.wav2vec2.feature_extraction_wav2vec2.Wav2Vec2FeatureExtractor.zero_mean_unit_var_norm(input_values, attention_mask, padding_value=0.0)
staticmethod
Every array in the list is normalized to have zero mean and unit variance
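A minimal plain-Python sketch of zero-mean unit-variance normalization is given below. The helper names, the lengths argument used in place of an attention mask, and the small epsilon added for numerical stability are all assumptions of this sketch rather than the library's implementation.

```python
import math


def zero_mean_unit_var(values, eps=1e-7):
    """Normalize one sequence to zero mean and (almost exactly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division safe for constant sequences (assumed value).
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def normalize_batch(input_values, lengths, padding_value=0.0):
    """Normalize only the real samples; padded positions keep padding_value."""
    normed = []
    for values, length in zip(input_values, lengths):
        head = zero_mean_unit_var(values[:length])
        normed.append(head + [padding_value] * (len(values) - length))
    return normed


batch = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 0.0, 0.0]]  # second row has 2 padded values
print(normalize_batch(batch, lengths=[4, 2]))
```

Computing the statistics only over the unpadded part keeps the padding from shifting the mean and variance of short sequences.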
Source code in mindnlp/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2
Speech processor class for Wav2Vec2
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor
Bases: ProcessorMixin
Constructs a Wav2Vec2 processor which wraps a Wav2Vec2 feature extractor and a Wav2Vec2 CTC tokenizer into a single processor.
[Wav2Vec2Processor] offers all the functionalities of [Wav2Vec2FeatureExtractor] and [PreTrainedTokenizer].
See the docstring of [~Wav2Vec2Processor.__call__] and [~Wav2Vec2Processor.decode] for more information.
| PARAMETER | DESCRIPTION |
|---|---|
| feature_extractor | An instance of [ |
| tokenizer | An instance of [ |
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.__call__(*args, **kwargs)
When used in normal mode, this method forwards all its arguments to Wav2Vec2FeatureExtractor's
[~Wav2Vec2FeatureExtractor.__call__] and returns its output. If used in the context
[~Wav2Vec2Processor.as_target_processor] this method forwards all its arguments to PreTrainedTokenizer's
[~PreTrainedTokenizer.__call__]. Please refer to the docstring of the above two methods for more information.
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.__init__(feature_extractor, tokenizer)
Initializes a new instance of the Wav2Vec2Processor class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The current instance of the Wav2Vec2Processor class. |
| feature_extractor | The feature extractor used for processing input data. It should be an instance of a feature extraction class. |
| tokenizer | The tokenizer used for tokenizing input data. It should be an instance of a tokenizer class. |

| RETURNS | DESCRIPTION |
|---|---|
| None | The initializer does not return a value. |
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.as_target_processor()
Temporarily sets the tokenizer for processing the input. Useful for encoding the labels when fine-tuning Wav2Vec2.
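The switch between "normal mode" and the as_target_processor context can be illustrated with a toy class. ToyProcessor, its current_processor attribute, and the two lambda stand-ins are hypothetical; this sketches the dispatch pattern only and is not the Wav2Vec2Processor source.

```python
from contextlib import contextmanager


class ToyProcessor:
    """Hypothetical stand-in that mimics the dispatch described above."""

    def __init__(self, feature_extractor, tokenizer):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer
        # In normal mode, calls go to the feature extractor.
        self.current_processor = feature_extractor

    def __call__(self, *args, **kwargs):
        return self.current_processor(*args, **kwargs)

    @contextmanager
    def as_target_processor(self):
        # Temporarily route calls to the tokenizer, e.g. to encode labels.
        self.current_processor = self.tokenizer
        try:
            yield
        finally:
            self.current_processor = self.feature_extractor


processor = ToyProcessor(
    feature_extractor=lambda audio: {"input_values": audio},
    tokenizer=lambda text: {"input_ids": [ord(c) for c in text]},
)
inputs = processor([0.1, -0.2])
with processor.as_target_processor():
    labels = processor("HI")
print(inputs)  # {'input_values': [0.1, -0.2]}
print(labels)  # {'input_ids': [72, 73]}
```

Restoring the feature extractor in a finally clause means the processor returns to normal mode even if label encoding raises an exception.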
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.batch_decode(*args, **kwargs)
This method forwards all its arguments to PreTrainedTokenizer's [~PreTrainedTokenizer.batch_decode]. Please
refer to the docstring of this method for more information.
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.decode(*args, **kwargs)
This method forwards all its arguments to PreTrainedTokenizer's [~PreTrainedTokenizer.decode]. Please refer
to the docstring of this method for more information.
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.from_pretrained(pretrained_model_name_or_path, **kwargs)
classmethod
This method creates an instance of the Wav2Vec2Processor class from a pre-trained model.
| PARAMETER | DESCRIPTION |
|---|---|
| cls | The class itself. |
| pretrained_model_name_or_path | The name or path of the pre-trained model to load. |

| RETURNS | DESCRIPTION |
|---|---|
| Wav2Vec2Processor | The processor instance created from the pre-trained model. |

| RAISES | DESCRIPTION |
|---|---|
| OSError | If an OSError occurs during the loading process. |
| FutureWarning | If the tokenizer is being loaded from a config that does not include a |
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor.pad(*args, **kwargs)
¶
When used in normal mode, this method forwards all its arguments to Wav2Vec2FeatureExtractor's
[~Wav2Vec2FeatureExtractor.pad] and returns its output. If used in the context
[~Wav2Vec2Processor.as_target_processor] this method forwards all its arguments to PreTrainedTokenizer's
[~PreTrainedTokenizer.pad]. Please refer to the docstring of the above two methods for more information.
Source code in mindnlp/transformers/models/wav2vec2/processing_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2
Tokenization class for Wav2Vec2.
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer
Bases: PreTrainedTokenizer
Constructs a Wav2Vec2CTC tokenizer.
This tokenizer inherits from [PreTrainedTokenizer] which contains some of the main methods. Users should refer to
the superclass for more information regarding such methods.
| PARAMETER | DESCRIPTION |
|---|---|
| vocab_file | File containing the vocabulary. |
| bos_token | The beginning of sentence token. |
| eos_token | The end of sentence token. |
| unk_token | The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. |
| pad_token | The token used for padding, for example when batching sequences of different lengths. |
| word_delimiter_token | The token used for defining the end of a word. |
| do_lower_case | Whether or not to accept lowercase input and lowercase the output when decoding. |
| target_lang | A target language the tokenizer should set by default. |
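To show how pad_token and word_delimiter_token typically come into play when decoding CTC output, here is a minimal greedy-collapse sketch. The function ctc_collapse is hypothetical, treating the padding token as the CTC blank is an assumption of this sketch, and the default tokens are taken from the __init__ signature; the tokenizer's real decoding logic has more options.

```python
from itertools import groupby


def ctc_collapse(tokens, pad_token="<pad>", word_delimiter_token="|",
                 replace_word_delimiter_char=" "):
    """Collapse a frame-level token sequence into a string."""
    # 1. Merge runs of identical consecutive tokens.
    merged = [token for token, _ in groupby(tokens)]
    # 2. Drop the padding token, which acts as the CTC blank in this sketch.
    kept = [token for token in merged if token != pad_token]
    # 3. Turn word delimiters into the replacement character.
    chars = [replace_word_delimiter_char if token == word_delimiter_token else token
             for token in kept]
    return "".join(chars).strip()


frames = ["H", "H", "<pad>", "E", "L", "<pad>", "L", "O", "|", "|",
          "W", "O", "<pad>", "R", "L", "D"]
print(ctc_collapse(frames))  # HELLO WORLD
```

The blank between the two "L" frames is what allows a genuine double letter to survive the merge in step 1.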
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.vocab_size: int
property
¶
Returns the size of the vocabulary used by the Wav2Vec2CTCTokenizer.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2CTCTokenizer class.
|
| RETURNS | DESCRIPTION |
|---|---|
int
|
The size of the vocabulary, which represents the total number of unique tokens in the decoder.
TYPE:
|
Example
>>> tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/wav2vec2-base-960h")
>>> tokenizer.vocab_size  # a property, so it is read without call parentheses
32
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.word_delimiter_token: str
property
writable
¶
str: Word delimiter token. Log an error if used while not having been set.
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.word_delimiter_token_id: Optional[int]
property
writable
¶
Optional[int]: Id of the word_delimiter_token in the vocabulary. Returns None if the token has not been
set.
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.__init__(vocab_file, bos_token='<s>', eos_token='</s>', unk_token='<unk>', pad_token='<pad>', word_delimiter_token='|', replace_word_delimiter_char=' ', do_lower_case=False, target_lang=None, **kwargs)
¶
Initializes a new instance of the Wav2Vec2CTCTokenizer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2CTCTokenizer class.
TYPE:
|
vocab_file |
The path to the vocabulary file.
TYPE:
|
bos_token |
The beginning of sentence token. Default is '<s>'.
TYPE:
|
eos_token |
The end of sentence token. Default is '</s>'.
TYPE:
|
unk_token |
The unknown token. Default is '<unk>'.
TYPE:
|
pad_token |
The padding token. Default is '<pad>'.
TYPE:
|
word_delimiter_token |
The word delimiter token. Default is '|'.
TYPE:
|
replace_word_delimiter_char |
The character used to replace the word delimiter. Default is ' '.
TYPE:
|
do_lower_case |
Whether to convert all tokens to lowercase. Default is False.
TYPE:
|
target_lang |
The target language for encoding. Default is None.
TYPE:
|
**kwargs |
Additional keyword arguments.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.batch_decode(sequences, skip_special_tokens=False, clean_up_tokenization_spaces=None, output_char_offsets=False, output_word_offsets=False, **kwargs)
¶
Convert a list of lists of token ids into a list of strings by calling decode.
| PARAMETER | DESCRIPTION |
|---|---|
sequences |
List of tokenized input ids. Can be obtained using the
TYPE:
|
skip_special_tokens |
Whether or not to remove special tokens in the decoding.
TYPE:
|
clean_up_tokenization_spaces |
Whether or not to clean up the tokenization spaces.
TYPE:
|
output_char_offsets |
Whether or not to output character offsets. Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. See the Example under decode for how to make use of output_char_offsets.
TYPE:
|
output_word_offsets |
Whether or not to output word offsets. Word offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed words. See the Example under decode for how to make use of output_word_offsets.
TYPE:
|
kwargs |
Will be passed to the underlying model specific decode method.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
List[str]
|
|
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.convert_tokens_to_string(tokens, group_tokens=True, spaces_between_special_tokens=False, output_char_offsets=False, output_word_offsets=False)
¶
Converts connectionist temporal classification (CTC) output tokens into a single string.
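The conversion can be pictured with a small, dependency-free sketch. It is not the library's implementation: the function name `ctc_collapse` is hypothetical, and the defaults simply mirror the `pad_token`, `word_delimiter_token` and `replace_word_delimiter_char` defaults listed for `__init__`. It merges repeated frame-level tokens (the `group_tokens=True` behaviour), drops the padding/blank token, and turns the word delimiter into a space.

```python
from itertools import groupby


def ctc_collapse(tokens, pad_token="<pad>", word_delimiter_token="|", replace_char=" "):
    """Collapse a frame-level CTC token sequence into a string (illustrative only)."""
    # 1. Merge consecutive duplicates: CTC emits one token per audio frame.
    grouped = [token for token, _ in groupby(tokens)]
    # 2. Drop the padding/blank token, which separates genuine repeats such as "L", "L".
    filtered = [token for token in grouped if token != pad_token]
    # 3. Replace the word delimiter with the replacement character.
    chars = [replace_char if token == word_delimiter_token else token for token in filtered]
    return "".join(chars).strip()


frames = ["H", "H", "<pad>", "E", "L", "L", "<pad>", "L", "O", "|", "|",
          "W", "O", "<pad>", "R", "L", "D"]
text = ctc_collapse(frames)
print(text)
```

Note that the `<pad>` between the two runs of `L` is what keeps the double letter in "HELLO"; without it, grouping would merge them into one.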
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.decode(token_ids, skip_special_tokens=False, clean_up_tokenization_spaces=None, output_char_offsets=False, output_word_offsets=False, **kwargs)
¶
Converts a sequence of ids into a string, using the tokenizer and vocabulary, with options to remove special tokens and clean up tokenization spaces.
Similar to doing self.convert_tokens_to_string(self.convert_ids_to_tokens(token_ids)).
| PARAMETER | DESCRIPTION |
|---|---|
token_ids |
List of tokenized input ids. Can be obtained using the
TYPE:
|
skip_special_tokens |
Whether or not to remove special tokens in the decoding.
TYPE:
|
clean_up_tokenization_spaces |
Whether or not to clean up the tokenization spaces.
TYPE:
|
output_char_offsets |
Whether or not to output character offsets. Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. Please take a look at the example below to better understand how to make use of output_char_offsets.
TYPE:
|
output_word_offsets |
Whether or not to output word offsets. Word offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed words. Please take a look at the example below to better understand how to make use of output_word_offsets.
TYPE:
|
kwargs |
Will be passed to the underlying model specific decode method.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
|
Example
>>> # Let's see how to retrieve time stamps for a model
>>> from transformers import AutoTokenizer, AutoFeatureExtractor, AutoModelForCTC
>>> from datasets import load_dataset
>>> import datasets
>>> import torch
...
>>> # import model, feature extractor, tokenizer
>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/wav2vec2-base-960h")
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
...
>>> # load first sample of English common_voice
>>> dataset = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="train", streaming=True)
>>> dataset = dataset.cast_column("audio", datasets.Audio(sampling_rate=16_000))
>>> dataset_iter = iter(dataset)
>>> sample = next(dataset_iter)
...
>>> # forward sample through model to get greedily predicted transcription ids
>>> input_values = feature_extractor(sample["audio"]["array"], return_tensors="pt").input_values
>>> logits = model(input_values).logits[0]
>>> pred_ids = torch.argmax(logits, axis=-1)
...
>>> # retrieve word stamps (analogous commands for `output_char_offsets`)
>>> outputs = tokenizer.decode(pred_ids, output_word_offsets=True)
>>> # compute `time_offset` in seconds per logit frame: downsampling ratio divided by sampling_rate
>>> time_offset = model.config.inputs_to_logits_ratio / feature_extractor.sampling_rate
...
>>> word_offsets = [
... {
... "word": d["word"],
... "start_time": round(d["start_offset"] * time_offset, 2),
... "end_time": round(d["end_offset"] * time_offset, 2),
... }
... for d in outputs.word_offsets
... ]
>>> # compare word offsets with audio `en_train_0/common_voice_en_19121553.mp3` online on the dataset viewer:
>>> # https://hf-mirror.com/datasets/mozilla-foundation/common_voice_11_0/viewer/en
>>> word_offsets[:3]
[{'word': 'THE', 'start_time': 0.7, 'end_time': 0.78}, {'word': 'TRICK', 'start_time': 0.88, 'end_time': 1.08}, {'word': 'APPEARS', 'start_time': 1.2, 'end_time': 1.64}]
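The offset arithmetic in the example can be checked without downloading a model. The sketch below assumes a downsampling ratio of 320 input samples per logit frame and a 16 kHz sampling rate, which gives 0.02 s per frame. The frame offsets are hypothetical values chosen to reproduce the first two words printed above; they are not taken from an actual model run.

```python
# Assumed values: 320 input samples per logit frame, audio sampled at 16 kHz.
inputs_to_logits_ratio = 320
sampling_rate = 16_000

# Seconds covered by one logit frame: 320 / 16000 = 0.02
time_offset = inputs_to_logits_ratio / sampling_rate

# Hypothetical word offsets, expressed in logit frames.
raw_word_offsets = [
    {"word": "THE", "start_offset": 35, "end_offset": 39},
    {"word": "TRICK", "start_offset": 44, "end_offset": 54},
]

# Convert frame indices to seconds, rounded to two decimals as in the example.
word_times = [
    {
        "word": d["word"],
        "start_time": round(d["start_offset"] * time_offset, 2),
        "end_time": round(d["end_offset"] * time_offset, 2),
    }
    for d in raw_word_offsets
]
print(word_times)
```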
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.get_vocab()
¶
Returns the vocabulary used by the Wav2Vec2CTCTokenizer.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2CTCTokenizer class.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict
|
A dictionary representing the vocabulary used by the tokenizer. The keys are the tokens (strings), and the values are the corresponding integer token IDs.
TYPE:
|
This method retrieves the vocabulary used by the Wav2Vec2CTCTokenizer instance. The vocabulary is a dictionary that combines the encoder and added_tokens_encoder dictionaries. The encoder dictionary maps tokens to unique integer IDs, while the added_tokens_encoder dictionary contains additional tokens added by the user. The resulting vocabulary dictionary is returned.
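A minimal sketch of the merge described above, using plain dictionaries. The function name `get_vocab_sketch` and the sample tokens are hypothetical; the only point illustrated is that the result maps tokens to IDs and includes user-added tokens alongside the base encoder.

```python
def get_vocab_sketch(encoder, added_tokens_encoder):
    """Combine the base token->id mapping with user-added tokens (illustrative only)."""
    vocab = dict(encoder)               # copy so the base encoder is left untouched
    vocab.update(added_tokens_encoder)  # added tokens are layered on top
    return vocab


encoder = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "A": 5}
added_tokens_encoder = {"<extra>": 6}  # hypothetical user-added token

vocab = get_vocab_sketch(encoder, added_tokens_encoder)
print(vocab["A"], vocab["<extra>"], len(vocab))
```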
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.prepare_for_tokenization(text, is_split_into_words=False, **kwargs)
¶
Prepare the input text for tokenization.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2CTCTokenizer class.
TYPE:
|
text |
The input text to be prepared for tokenization.
TYPE:
|
is_split_into_words |
A flag indicating whether the input text is already split into words. If True, the input text is expected to be split into words; otherwise, the input text is treated as a continuous string. Defaults to False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing the prepared text and optional keyword arguments. |
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.save_vocabulary(save_directory, filename_prefix=None)
¶
Save the vocabulary to a specified directory.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2CTCTokenizer class.
|
save_directory |
The directory where the vocabulary will be saved.
TYPE:
|
filename_prefix |
An optional prefix to be added to the filename. Defaults to None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tuple[str]
|
Tuple[str]: A tuple containing the file path of the saved vocabulary. |
| RAISES | DESCRIPTION |
|---|---|
OSError
|
If the save_directory is not a valid directory. |
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer.set_target_lang(target_lang)
¶
Set the target language of a nested multi-lingual dictionary
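One way to picture a nested multi-lingual dictionary is a mapping from language code to a per-language token-to-ID table. The class below is a hypothetical stand-in, not the tokenizer itself, and the attribute names are assumptions made for the sketch: switching the target language selects one inner table as the active encoder and rebuilds the reverse mapping.

```python
class MultiLingualVocab:
    """Hypothetical holder of one vocabulary per language (illustrative only)."""

    def __init__(self, vocab, target_lang=None):
        self.vocab = vocab      # {language: {token: id}}
        self.encoder = {}       # active token -> id table
        self.decoder = {}       # active id -> token table
        self.target_lang = None
        if target_lang is not None:
            self.set_target_lang(target_lang)

    def set_target_lang(self, target_lang):
        # Reject languages that have no vocabulary of their own.
        if target_lang not in self.vocab:
            raise ValueError(f"{target_lang!r} is not one of {sorted(self.vocab)}")
        self.target_lang = target_lang
        self.encoder = self.vocab[target_lang]
        self.decoder = {idx: token for token, idx in self.encoder.items()}


nested = {
    "eng": {"<pad>": 0, "a": 1, "b": 2},
    "deu": {"<pad>": 0, "ä": 1, "ß": 2},
}
tok = MultiLingualVocab(nested, target_lang="eng")
tok.set_target_lang("deu")
print(tok.target_lang, tok.decoder[1])
```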
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizerOutput
dataclass
¶
Bases: ModelOutput
Output type of [Wav2Vec2CTCTokenizer], with transcription.
| PARAMETER | DESCRIPTION |
|---|---|
text |
Decoded logits in text form. Usually the speech transcription.
TYPE:
|
char_offsets |
Offsets of the decoded characters. In combination with sampling rate and model downsampling rate, char offsets can be used to compute time stamps for each character.
TYPE:
|
word_offsets |
Offsets of the decoded words. In combination with sampling rate and model downsampling rate word offsets can be used to compute time stamps for each word.
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer
¶
Bases: PreTrainedTokenizer
Constructs a Wav2Vec2 tokenizer.
This tokenizer inherits from [PreTrainedTokenizer] which contains some of the main methods. Users should refer to
the superclass for more information regarding such methods.
| PARAMETER | DESCRIPTION |
|---|---|
vocab_file |
File containing the vocabulary.
TYPE:
|
bos_token |
The beginning of sentence token.
TYPE:
|
eos_token |
The end of sentence token.
TYPE:
|
unk_token |
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
TYPE:
|
pad_token |
The token used for padding, for example when batching sequences of different lengths.
TYPE:
|
word_delimiter_token |
The token used for defining the end of a word.
TYPE:
|
do_lower_case |
Whether or not to lowercase the output when decoding.
TYPE:
|
do_normalize |
Whether or not to zero-mean unit-variance normalize the input. Normalizing can help to significantly improve the performance for some models, e.g., wav2vec2-lv60.
TYPE:
|
return_attention_mask |
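Whether or not __call__ should return attention_mask. Wav2Vec2 models that have set config.feat_extract_norm == "group", such as wav2vec2-base, have not been trained using attention_mask; for such models, input_values should simply be padded with 0 and no attention_mask should be passed. For Wav2Vec2 models that have set config.feat_extract_norm == "layer", such as wav2vec2-lv60, attention_mask should be passed for batched inference.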
TYPE:
|
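The zero-mean unit-variance normalization that do_normalize refers to can be sketched in a few lines of plain Python. The function name and the small epsilon added for numerical stability are assumptions for illustration, not values taken from the library.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Shift a waveform to mean 0 and scale it to variance ~1 (illustrative only)."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    # eps (an assumed value) guards against division by zero on silent input.
    return [(x - mean) / math.sqrt(variance + eps) for x in samples]


waveform = [0.1, 0.4, -0.2, 0.3]
normalized = zero_mean_unit_var(waveform)

new_mean = sum(normalized) / len(normalized)
new_var = sum((x - new_mean) ** 2 for x in normalized) / len(normalized)
print(round(new_mean, 6), round(new_var, 3))
```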
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.vocab_size: int
property
¶
Returns the vocabulary size of the Wav2Vec2Tokenizer instance.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2Tokenizer class. This parameter refers to the current instance of the Wav2Vec2Tokenizer class. It is used to access the decoder attribute to calculate the vocabulary size.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
int
|
An integer representing the size of the vocabulary. The return value corresponds to the number of elements in the decoder attribute of the instance.
TYPE:
|
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.word_delimiter_token: str
property
writable
¶
str: Word delimiter token. Log an error if used while not having been set.
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.word_delimiter_token_id: Optional[int]
property
writable
¶
Optional[int]: Id of the word_delimiter_token in the vocabulary. Returns None if the token has not been
set.
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.__call__(raw_speech, padding=False, max_length=None, pad_to_multiple_of=None, return_tensors=None, verbose=True, **kwargs)
¶
Main method to tokenize and prepare for the model one or several sequence(s) or one or several pair(s) of sequences.
| PARAMETER | DESCRIPTION |
|---|---|
raw_speech |
The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float values, a list of numpy arrays or a list of lists of float values. Must be mono channel audio, not stereo, i.e. a single float per timestep.
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.__init__(vocab_file, bos_token='<s>', eos_token='</s>', unk_token='<unk>', pad_token='<pad>', word_delimiter_token='|', do_lower_case=False, do_normalize=False, return_attention_mask=False, **kwargs)
¶
Initializes a new instance of the Wav2Vec2Tokenizer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the class.
|
vocab_file |
The path to the vocabulary file.
TYPE:
|
bos_token |
The beginning of sentence token. Default is '<s>'.
TYPE:
|
eos_token |
The end of sentence token. Default is '</s>'.
TYPE:
|
unk_token |
The unknown token. Default is '<unk>'.
TYPE:
|
pad_token |
The padding token. Default is '<pad>'.
TYPE:
|
word_delimiter_token |
The word delimiter token. Default is '|'.
TYPE:
|
do_lower_case |
Whether to convert tokens to lowercase. Default is False.
TYPE:
|
do_normalize |
Whether to zero-mean unit-variance normalize the input speech. Default is False.
TYPE:
|
return_attention_mask |
Whether to return the attention mask. Default is False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
| RAISES | DESCRIPTION |
|---|---|
FutureWarning
|
This class is deprecated and will be removed in version 5 of Transformers. Please use Wav2Vec2Processor or Wav2Vec2CTCTokenizer instead. |
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.convert_tokens_to_string(tokens)
¶
Converts connectionist temporal classification (CTC) output tokens into a single string.
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.get_vocab()
¶
This method returns a vocabulary dictionary containing the encoder and added tokens encoder.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2Tokenizer class.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict
|
A dictionary containing the combined encoder and added tokens encoder.
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2Tokenizer.save_vocabulary(save_directory, filename_prefix=None)
¶
Saves the vocabulary of the Wav2Vec2Tokenizer to a file.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2Tokenizer class.
TYPE:
|
save_directory |
The directory where the vocabulary file will be saved.
TYPE:
|
filename_prefix |
A prefix to be added to the filename. Defaults to None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tuple[str]
|
Tuple[str]: A tuple containing the path to the saved vocabulary file. |
| RAISES | DESCRIPTION |
|---|---|
FileNotFoundError
|
If the specified save_directory does not exist. |
IsADirectoryError
|
If save_directory is not a directory. |
Source code in mindnlp/transformers/models/wav2vec2/tokenization_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2
¶
MindSpore Wav2Vec2 model.
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.AMSoftmaxLoss
¶
Bases: Cell
The AMSoftmaxLoss class represents a neural network cell for computing the AM-Softmax loss. This class inherits from nn.Cell and provides methods for initializing the loss function and constructing the computation graph.
| ATTRIBUTE | DESCRIPTION |
|---|---|
scale |
The scale parameter for the AM-Softmax loss function.
TYPE:
|
margin |
The margin parameter for the AM-Softmax loss function.
TYPE:
|
num_labels |
The number of unique labels in the dataset.
TYPE:
|
weight |
The weight parameter for the neural network.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the AMSoftmaxLoss instance with input dimension, number of labels, scale, and margin. |
construct |
Constructs the computation graph for the AM-Softmax loss function using the given hidden states and labels. |
Note
The AMSoftmaxLoss class is designed for use in neural network training and optimization tasks.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.AMSoftmaxLoss.__init__(input_dim, num_labels, scale=30.0, margin=0.4)
¶
Initializes an instance of the AMSoftmaxLoss class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the class.
TYPE:
|
input_dim |
The dimension of the input features.
TYPE:
|
num_labels |
The number of unique labels for classification.
TYPE:
|
scale |
The scale factor applied to the cosine-similarity logits. Defaults to 30.0.
TYPE:
|
margin |
The additive margin subtracted from the target-class cosine similarity. Defaults to 0.4.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If input_dim or num_labels are not positive integers. |
TypeError
|
If scale or margin are not of type float. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.AMSoftmaxLoss.construct(hidden_states, labels)
¶
Computes the AM-Softmax loss for the given hidden states and labels.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the AMSoftmaxLoss class.
TYPE:
|
hidden_states |
A tensor representing the hidden states of the model.
TYPE:
|
labels |
A tensor containing the ground truth labels for the corresponding hidden states. It is expected that the labels are flattened for processing.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
The computed AM-Softmax loss. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the dimensions of the weight tensor and hidden_states tensor are not compatible for matrix multiplication. |
RuntimeError
|
If there is an issue with the normalization operation on the weight or hidden_states tensor. |
ValueError
|
If the labels tensor does not match the expected shape for one-hot encoding. |
RuntimeError
|
If there is a problem with the cross-entropy calculation. |
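For intuition, the additive-margin softmax computation can be sketched for a single example in plain Python. This is a simplified illustration, not the MindSpore code: the function names are hypothetical, class weights are stored as one vector per class, and the defaults copy the `scale=30.0` and `margin=0.4` defaults from the signature. Both the hidden state and the class weights are L2-normalized, the margin is subtracted from the target-class cosine only, the result is scaled, and cross-entropy is applied.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is left as is)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def am_softmax_loss(hidden, class_weights, label, scale=30.0, margin=0.4):
    """Additive-margin softmax loss for one example (illustrative only)."""
    h = l2_normalize(hidden)
    # Cosine similarity between the example and each normalized class vector.
    cosines = [sum(a * b for a, b in zip(h, l2_normalize(w))) for w in class_weights]
    # Subtract the margin from the target class only, then scale all logits.
    logits = [scale * (c - margin if i == label else c) for i, c in enumerate(cosines)]
    # Cross-entropy via a numerically stable log-sum-exp.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


hidden = [1.0, 0.5]
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # one hypothetical weight vector per class

loss_correct = am_softmax_loss(hidden, class_weights, label=0)
loss_wrong = am_softmax_loss(hidden, class_weights, label=1)
loss_no_margin = am_softmax_loss(hidden, class_weights, label=0, margin=0.0)
print(loss_correct, loss_wrong, loss_no_margin)
```

The margin makes the task harder for the target class, so the loss with `margin=0.4` is larger than with `margin=0.0` for the same inputs.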
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.TDNNLayer
¶
Bases: Cell
TDNNLayer represents a time-delay neural network (TDNN) layer for processing sequential data. It inherits from nn.Cell and is initialized with a Wav2Vec2Config and an optional layer_id.
| ATTRIBUTE | DESCRIPTION |
|---|---|
config |
The configuration for the Wav2Vec2 model.
TYPE:
|
layer_id |
The index of the TDNN layer.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
construct |
Applies the TDNN layer operations to the input hidden_states. |
The TDNNLayer class applies a convolutional layer with specified kernel size and dilation to the input data. It then applies a ReLU activation function to the output.
Note
This class is part of the Wav2Vec2 model architecture.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.TDNNLayer.__init__(config, layer_id=0)
¶
Initializes a TDNNLayer object.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the TDNNLayer class.
|
config |
An instance of Wav2Vec2Config that holds configuration parameters for the layer.
TYPE:
|
layer_id |
An integer representing the ID of the layer. Default is 0. Must be within the range of available layers in the configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the config parameter is not of type Wav2Vec2Config. |
ValueError
|
If the layer_id is outside the valid range of available layers in the configuration. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.TDNNLayer.construct(hidden_states)
¶
Constructs the TDNN layer with the input hidden_states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the TDNNLayer class.
TYPE:
|
hidden_states |
The input hidden states to be processed by the TDNN layer. It should be a tensor of shape (batch_size, in_channels, sequence_length).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
hidden_states
|
The processed hidden states after applying the TDNN layer operations. It will be a tensor of shape (batch_size, out_channels, new_length), where out_channels is the number of output channels and new_length is the length of the output sequence.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the input hidden_states is not a tensor. |
ValueError
|
If the input hidden_states does not have the expected shape or dimensions. |
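How kernel size and dilation determine new_length can be seen in a single-channel sketch. The real layer works on multi-channel tensors; the function below is a hypothetical, dependency-free simplification that applies a dilated 1D convolution without padding followed by ReLU, so the output length is `sequence_length - dilation * (kernel_size - 1)`.

```python
def tdnn_1d(signal, kernel, dilation=1):
    """Single-channel dilated convolution followed by ReLU (illustrative only)."""
    # Each output step looks at kernel taps spaced `dilation` frames apart.
    span = dilation * (len(kernel) - 1)
    out_len = len(signal) - span
    output = []
    for t in range(out_len):
        acc = sum(kernel[k] * signal[t + k * dilation] for k in range(len(kernel)))
        output.append(max(0.0, acc))  # ReLU
    return output


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [-1.0, 0.0, 1.0]  # hypothetical weights: difference across the context window

out = tdnn_1d(signal, kernel, dilation=2)
print(len(out), out)
```

With 8 input frames, a kernel of size 3 and dilation 2, the context window spans 5 frames, leaving 4 output frames.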
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Adapter
¶
Bases: Cell
Wav2Vec2Adapter is a class that represents an adapter layer for adapting the hidden states of a Wav2Vec2 model. This class inherits from nn.Cell and implements methods for initializing and constructing the adapter layer.
| ATTRIBUTE | DESCRIPTION |
|---|---|
proj |
A dense layer used for projecting hidden states if output_hidden_size is different from hidden_size.
TYPE:
|
proj_layer_norm |
A layer normalization module applied after projection if needed.
TYPE:
|
layers |
A list of Wav2Vec2AdapterLayer instances representing adapter layers.
TYPE:
|
layerdrop |
The probability of dropping a layer during training.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the Wav2Vec2Adapter object with the provided configuration. |
construct |
Applies the adapter layer transformations to the input hidden states. |
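The role of the layerdrop attribute can be sketched as follows. This is an assumption-labelled illustration rather than the class's code: `run_adapter_layers` is a hypothetical helper, layers are plain callables, and a layer is skipped only in training mode, with probability `layerdrop`.

```python
import random


def run_adapter_layers(hidden, layers, layerdrop, training, rng=random):
    """Apply layers in order, randomly skipping some during training (illustrative only)."""
    for layer in layers:
        # Outside training every layer runs; in training a layer runs only
        # when the random draw exceeds the drop probability.
        if not training or rng.random() > layerdrop:
            hidden = layer(hidden)
    return hidden


layers = [lambda x: x + 1] * 3  # three hypothetical layers that each add 1

eval_result = run_adapter_layers(0, layers, layerdrop=1.0, training=False)
train_all_dropped = run_adapter_layers(0, layers, layerdrop=1.0, training=True)
print(eval_result, train_all_dropped)
```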
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Adapter.__init__(config)
¶
Initializes a new instance of the Wav2Vec2Adapter class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The current instance of the class.
|
config |
An instance of Wav2Vec2Config containing configuration parameters for the adapter. This parameter is required for initializing the adapter and must be an instance of Wav2Vec2Config.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the config parameter is not of type Wav2Vec2Config. |
ValueError
|
If the output_hidden_size in the config parameter does not match the hidden_size. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Adapter.construct(hidden_states)
¶
This method constructs the hidden states by applying transformations and layers.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2Adapter class.
TYPE:
|
hidden_states |
The input hidden states to be processed. It is expected to be a 3D tensor with shape (batch_size, sequence_length, hidden_size).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
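Tensor: The processed hidden states. Because each adapter layer applies a strided convolution, the output sequence length is generally shorter than the input, and the last dimension is output_hidden_size. |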
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AdapterLayer
¶
Bases: Cell
Wav2Vec2AdapterLayer represents a single adapter layer of the Wav2Vec2 model. It inherits from nn.Cell.
The __init__ method sets up a 1D convolutional layer from the provided configuration, including its kernel size, stride, padding, and bias.
The construct method takes hidden_states as input, applies the convolutional layer followed by the gated linear unit (GLU) activation function, and returns the processed hidden states.
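The GLU step mentioned above can be sketched in plain Python. This is a minimal illustration for a single time step, assuming the convolution output is split into two equal halves along the channel axis; the helper name `glu` is hypothetical and this is not the mindnlp implementation.

```python
import math


def glu(channels):
    """Gated linear unit: first half of the channels gated by sigmoid(second half)."""
    if len(channels) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(channels) // 2
    values, gates = channels[:half], channels[half:]
    return [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(values, gates)]


# A pretend convolution output with 4 channels at one time step.
conv_output = [1.0, -2.0, 0.0, 0.0]
gated = glu(conv_output)  # 2 channels remain after gating
```

Because GLU halves the channel count, a convolution feeding it would need to produce twice as many channels as the layer is meant to output.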
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AdapterLayer.__init__(config)
¶
Initializes a new instance of the Wav2Vec2AdapterLayer class.

| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2AdapterLayer class. |
| config | An instance of the Wav2Vec2Config class containing the configuration parameters for the adapter layer. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AdapterLayer.construct(hidden_states)
¶
Applies the adapter layer to the input hidden states.

| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2AdapterLayer class. |
| hidden_states | The input hidden states to be processed. It should be a tensor. |

| RETURNS | DESCRIPTION |
|---|---|
| Tensor | The processed hidden states after applying the convolution and the gated linear unit (GLU) operation. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Attention
¶
Bases: Cell
Multi-headed attention from 'Attention Is All You Need' paper
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Attention.__init__(embed_dim, num_heads, dropout=0.0, is_decoder=False, bias=True, is_causal=False, config=None)
¶
Initializes an instance of the Wav2Vec2Attention class.
| PARAMETER | DESCRIPTION |
|---|---|
| embed_dim | The dimension of the input embeddings. |
| num_heads | The number of attention heads. |
| dropout | The dropout probability. Defaults to 0.0. |
| is_decoder | Whether the attention module is used as a decoder. Defaults to False. |
| bias | Whether to include bias in linear projections. Defaults to True. |
| is_causal | Whether the attention is causal. Defaults to False. |
| config | The configuration object. Defaults to None. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If embed_dim is not divisible by num_heads. |
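The divisibility requirement exists because each head works on an equal slice of the embedding. A minimal sketch of that check follows; the function name `head_dim_for` and the error message wording are hypothetical, not copied from the mindnlp source.

```python
def head_dim_for(embed_dim, num_heads):
    """Return the per-head dimension, or raise if the split is uneven."""
    head_dim = embed_dim // num_heads
    if head_dim * num_heads != embed_dim:
        raise ValueError(
            f"embed_dim must be divisible by num_heads "
            f"(got embed_dim={embed_dim}, num_heads={num_heads})"
        )
    return head_dim


# Example: 768-dimensional embeddings split across 12 heads.
per_head = head_dim_for(768, 12)
```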
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Attention.construct(hidden_states, key_value_states=None, past_key_value=None, attention_mask=None, layer_head_mask=None, output_attentions=False)
¶
Input shape: Batch x Time x Channel
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AttnAdapterLayer
¶
Bases: Cell
This class represents a single layer of an attention adapter module in the Wav2Vec2 model. The adapter module is designed to enhance the training throughput by directly implementing the adapter modules with 3D tensor weights as parameters, without using ModuleList.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| input_dim | The dimension of the input tensor to the adapter module. |
| hidden_dim | The hidden dimension of the adapter module. |
| norm | A layer normalization module to normalize the hidden states. |
| linear_1 | A linear transformation module that maps the hidden states to the input dimension. |
| act_fn | An activation function module that applies the ReLU activation to the hidden states. |
| linear_2 | A linear transformation module that maps the hidden states back to the hidden dimension. |

| METHOD | DESCRIPTION |
|---|---|
| construct | Applies the attention adapter layer operations to the input hidden states tensor. |
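The attributes listed above suggest a bottleneck pipeline: normalize, project to the smaller adapter dimension, apply ReLU, and project back. The pure-Python sketch below illustrates that flow for one feature vector. All helper names are hypothetical, weights are plain nested lists, and the exact order of operations in mindnlp is an assumption.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight, bias):
    """Dense layer; `weight` has one row per output feature."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def attn_adapter(x, w1, b1, w2, b2):
    h = layer_norm(x)
    h = linear(h, w1, b1)          # hidden_dim -> input_dim (the bottleneck)
    h = [max(0.0, v) for v in h]   # ReLU
    return linear(h, w2, b2)       # input_dim -> hidden_dim


# Toy sizes: hidden_dim = 4, input_dim = 2.
out = attn_adapter(
    [1.0, 2.0, 3.0, 4.0],
    [[0.0] * 4, [0.0] * 4], [1.0, -1.0],
    [[1.0, 0.0]] * 4, [0.0] * 4,
)
```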
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AttnAdapterLayer.__init__(config)
¶
Implements adapter modules directly with 3D tensor weight as parameters and without using ModuleList to speed up training throughput.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2AttnAdapterLayer.construct(hidden_states)
¶
Applies the Wav2Vec2AttnAdapterLayer to the input hidden states.

| PARAMETER | DESCRIPTION |
|---|---|
| self | (Wav2Vec2AttnAdapterLayer) The instance of the Wav2Vec2AttnAdapterLayer class. |
| hidden_states | (Tensor) The input hidden states to be processed by the adaptation layer. |

| RETURNS | DESCRIPTION |
|---|---|
| Tensor | The hidden states after layer normalization, the first linear layer, the activation function, and the second linear layer. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the input hidden_states tensor is empty or invalid. |
| TypeError | If the input hidden_states is not of type Tensor. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Encoder
¶
Bases: Cell
A class representing the Wav2Vec2Encoder in the Wav2Vec2 model architecture.
The Wav2Vec2Encoder is responsible for encoding the input hidden states with positional embeddings and applying a series of Wav2Vec2EncoderLayer for feature extraction.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| config | The configuration for the Wav2Vec2 model. |
| pos_conv_embed | The positional convolutional embedding layer. |
| layer_norm | The layer normalization layer. |
| dropout | The dropout layer. |
| layers | The list of Wav2Vec2EncoderLayer instances. |

| METHOD | DESCRIPTION |
|---|---|
| construct | Applies the Wav2Vec2Encoder layer-wise to the hidden states. |
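The configuration exposes a layerdrop probability, so the layer-wise application is presumably combined with LayerDrop during training. The sketch below shows the general technique only: the function `run_encoder` and its arguments are hypothetical, and whether mindnlp skips layers in exactly this way is an assumption.

```python
import random


def run_encoder(x, layers, layerdrop, training, rng):
    """Apply layers in order, randomly skipping each one during training."""
    for layer in layers:
        # LayerDrop: skip a layer with probability `layerdrop`, but only in training.
        skip = training and rng.random() < layerdrop
        if not skip:
            x = layer(x)
    return x


# Three toy "layers" that each add 1 to a scalar state.
toy_layers = [lambda v: v + 1] * 3
eval_result = run_encoder(0, toy_layers, layerdrop=0.5, training=False, rng=random.Random(0))
```

In evaluation mode every layer runs, so the toy state passes through all three layers regardless of the layerdrop value.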
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Encoder.__init__(config)
¶
Initializes the Wav2Vec2Encoder class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the class. |
| config | An instance of the Wav2Vec2Config class containing the configuration parameters for the encoder. It specifies the configuration for the Wav2Vec2 model, such as hidden size, layer normalization epsilon, hidden dropout probability, and the number of hidden layers. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

| RAISES | DESCRIPTION |
|---|---|
| None | This method does not raise any exceptions explicitly. However, exceptions may be raised during the initialization of the Wav2Vec2PositionalConvEmbedding, nn.LayerNorm, nn.Dropout, and nn.CellList objects. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Encoder.construct(hidden_states, attention_mask=None, output_attentions=False, output_hidden_states=False, return_dict=True)
¶
Constructs the Wav2Vec2Encoder.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2Encoder class. |
| hidden_states | The input hidden states. A tensor of shape (batch_size, sequence_length, hidden_size). |
| attention_mask | An optional tensor specifying the attention mask. Defaults to None. |
| output_attentions | Whether to output attentions. Defaults to False. |
| output_hidden_states | Whether to output hidden states. Defaults to False. |
| return_dict | Whether to return a dictionary. Defaults to True. |

| RETURNS | DESCRIPTION |
|---|---|
| BaseModelOutput or tuple | The final hidden states, together with the per-layer hidden states when output_hidden_states is True and the attention weights when output_attentions is True. A plain tuple is returned when return_dict is False. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the hidden_states tensor has invalid shape or type. |
| ValueError | If the attention_mask tensor has invalid shape or type. |
| TypeError | If the output_attentions or output_hidden_states parameters are not of type bool. |
| TypeError | If the return_dict parameter is not of type bool. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayer
¶
Bases: Cell
A class representing an encoder layer of the Wav2Vec2 model.
The Wav2Vec2EncoderLayer class inherits from the nn.Cell class and implements the functionality of a single encoder layer in the Wav2Vec2 model architecture. It consists of multiple sub-modules, including an attention mechanism, dropout layers, layer normalization, and a feed-forward neural network.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| attention | The attention mechanism used in the layer. |
| dropout | The dropout layer applied to the hidden states. |
| layer_norm | The layer normalization applied to the hidden states. |
| feed_forward | The feed-forward neural network used in the layer. |
| final_layer_norm | The final layer normalization applied to the hidden states. |

| METHOD | DESCRIPTION |
|---|---|
| construct | Applies the forward pass of the encoder layer. |

Note
The Wav2Vec2EncoderLayer class is designed to be used within the Wav2Vec2Encoder class, which stacks multiple encoder layers to form the Transformer encoder of the Wav2Vec2 model.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayer.__init__(config)
¶
Initializes a Wav2Vec2EncoderLayer instance.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2EncoderLayer class. |
| config | An instance of Wav2Vec2Config containing configuration parameters for the encoder layer. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayer.construct(hidden_states, attention_mask=None, output_attentions=False)
¶
Constructs the Wav2Vec2EncoderLayer.
This method applies the Wav2Vec2EncoderLayer to the input hidden_states. It performs attention, residual connections, layer normalization, feed-forward, and final layer normalization.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2EncoderLayer class. |
| hidden_states | The input hidden states of shape (batch_size, sequence_length, hidden_size). |
| attention_mask | The attention mask of shape (batch_size, sequence_length). Defaults to None. |
| output_attentions | Whether to output the attention weights. Defaults to False. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing the hidden states of shape (batch_size, sequence_length, hidden_size). If output_attentions is True, the tuple also contains the attention weights of shape (batch_size, num_heads, sequence_length, sequence_length). |
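The order of operations is what distinguishes this layer from a "stable layer norm" variant. The sketch below uses traced identity functions to contrast a post-norm ordering (normalize after each residual sum) with a pre-norm ordering (normalize before each sub-module). It is a schematic under assumptions: dropout and the optional adapter are omitted, the function names are hypothetical, and the mapping of each ordering to a specific mindnlp class should be checked against the source.

```python
def traced(name, log, fn):
    """Wrap fn so that each call appends its name to log."""
    def wrapped(x):
        log.append(name)
        return fn(x)
    return wrapped


def post_norm_layer(x, attention, layer_norm, feed_forward, final_layer_norm):
    x = x + attention(x)             # attention + residual
    x = layer_norm(x)                # normalize after the residual sum
    x = x + feed_forward(x)          # feed-forward + residual
    return final_layer_norm(x)


def pre_norm_layer(x, attention, layer_norm, feed_forward, final_layer_norm):
    x = x + attention(layer_norm(x))               # normalize before attention
    return x + feed_forward(final_layer_norm(x))   # normalize before feed-forward


def call_order(layer_fn):
    """Run layer_fn with identity sub-modules and report the order they were called in."""
    log = []
    names = ("attention", "layer_norm", "feed_forward", "final_layer_norm")
    subs = [traced(n, log, lambda v: v) for n in names]
    layer_fn(1.0, *subs)
    return log


post_order = call_order(post_norm_layer)
pre_order = call_order(pre_norm_layer)
```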
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayerStableLayerNorm
¶
Bases: Cell
This class represents an encoder layer in the Wav2Vec2 model with stable layer normalization. It inherits from the nn.Cell class.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| attention | An instance of the Wav2Vec2Attention class for the attention mechanism. |
| dropout | An instance of the nn.Dropout class for dropout regularization. |
| layer_norm | An instance of the nn.LayerNorm class for stable layer normalization. |
| feed_forward | An instance of the Wav2Vec2FeedForward class for the feed-forward layer. |
| final_layer_norm | An instance of the nn.LayerNorm class for stable layer normalization of the final output. |
| adapter_layer | An instance of the Wav2Vec2AttnAdapterLayer class for the adapter layer, if provided. None otherwise. |

| METHOD | DESCRIPTION |
|---|---|
| construct | Applies the encoder layer operations on the input hidden states. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayerStableLayerNorm.__init__(config)
¶
Initializes a new instance of the Wav2Vec2EncoderLayerStableLayerNorm class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the class. |
| config | The configuration object containing the settings for the encoder layer. It should be an instance of the Wav2Vec2Config class. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderLayerStableLayerNorm.construct(hidden_states, attention_mask=None, output_attentions=False)
¶
Constructs the Wav2Vec2EncoderLayerStableLayerNorm.
| PARAMETER | DESCRIPTION |
|---|---|
| self | Instance of the Wav2Vec2EncoderLayerStableLayerNorm class. |
| hidden_states | The input hidden states to be processed by the encoder layer. |
| attention_mask | Optional tensor representing the attention mask. Defaults to None. If provided, masks certain elements in the attention computation. |
| output_attentions | Flag indicating whether to output attention weights during computation. Defaults to False. |

| RETURNS | DESCRIPTION |
|---|---|
| Tuple | A tuple containing the processed hidden states and optionally the attention weights. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderStableLayerNorm
¶
Bases: Cell
Wav2Vec2EncoderStableLayerNorm is a Python class that represents an encoder with stable layer normalization for the Wav2Vec2 model. This class inherits from the nn.Cell module.
This class initializes with a Wav2Vec2Config object and constructs a series of encoder layers with stable layer normalization. The encoder layers operate on the input hidden states and optionally apply attention masks, producing hidden states with added positional embeddings and layer normalization.
The construct method applies the encoder layers to the input hidden states, handling attention masks, outputting hidden states, and attentions based on the specified configurations.
This class provides functionalities for building and using a stable layer normalization encoder for the Wav2Vec2 model, supporting various output options and configurations.
For detailed information on the class methods and usage, please refer to the specific method docstrings within the source code.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderStableLayerNorm.__init__(config)
¶
Initializes an instance of the Wav2Vec2EncoderStableLayerNorm class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The object instance. |
| config | The configuration object for the Wav2Vec2 model. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2EncoderStableLayerNorm.construct(hidden_states, attention_mask=None, output_attentions=False, output_hidden_states=False, return_dict=True)
¶
Constructs the Wav2Vec2EncoderStableLayerNorm.
| PARAMETER | DESCRIPTION |
|---|---|
| hidden_states | The input hidden states of shape (batch_size, sequence_length, hidden_size). |
| attention_mask | Optional attention mask of shape (batch_size, sequence_length). It is used to mask the attention scores. |
| output_attentions | Boolean flag indicating whether to output attention weights. Defaults to False. |
| output_hidden_states | Boolean flag indicating whether to output hidden states of all layers. Defaults to False. |
| return_dict | Boolean flag indicating whether to return a dictionary as output. Defaults to True. |

| RETURNS | DESCRIPTION |
|---|---|
| BaseModelOutput or tuple | The final hidden states, together with the per-layer hidden states when output_hidden_states is True and the attention weights when output_attentions is True. A plain tuple is returned when return_dict is False. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureEncoder
¶
Bases: Cell
Construct the features from raw audio waveform
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureEncoder.__init__(config)
¶
Initializes a new instance of the Wav2Vec2FeatureEncoder class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The object itself. |
| config | The configuration object for the feature encoder. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureEncoder.construct(input_values)
¶
Computes the hidden states from the input values by passing them through the convolutional layers.

| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the class. |
| input_values | The input values for constructing hidden states. It is expected to be a 2D tensor. |

| RETURNS | DESCRIPTION |
|---|---|
| Tensor | The constructed hidden states after passing through the convolutional layers. |
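To get a feel for how much the convolutional stack shortens the raw waveform, the sketch below chains the unpadded 1D convolution length formula over a list of kernel sizes and strides. The kernel and stride tuples are assumptions modeled on the commonly used wav2vec2-base settings, and `feature_frames` is a hypothetical helper, not a mindnlp function.

```python
# Assumed (not read from this page): typical wav2vec2-base convolution settings.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def feature_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Number of output frames for a waveform of `num_samples` samples."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Conv1d output length with no padding and dilation 1.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio.
frames_per_second = feature_frames(16000)
```

Under these assumed settings one second of 16 kHz audio maps to 49 frames, or roughly one frame every 20 ms.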
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureExtractor
¶
Bases: Wav2Vec2FeatureEncoder
Wav2Vec2FeatureExtractor is a class that represents a feature extractor for Wav2Vec2 models. It is designed to extract features from audio data for use in Wav2Vec2 models.
This class inherits from Wav2Vec2FeatureEncoder, and it is recommended to use Wav2Vec2FeatureEncoder instead of this class, as Wav2Vec2FeatureExtractor has been deprecated and will be removed in Transformers v5.
Please refer to the documentation for Wav2Vec2FeatureEncoder for feature extraction and encoding in Wav2Vec2 models.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureExtractor.__init__(config)
¶
This method initializes an instance of the Wav2Vec2FeatureExtractor class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the class. |
| config | An instance of the Wav2Vec2Config class containing the configuration parameters for the feature extractor. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

| RAISES | DESCRIPTION |
|---|---|
| FutureWarning | If the class Wav2Vec2FeatureExtractor is used, a FutureWarning is issued indicating that the class has been deprecated and will be removed in Transformers v5. It is recommended to use the base class instead. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureProjection
¶
Bases: Cell
Wav2Vec2FeatureProjection is a Python class that represents a feature projection module for Wav2Vec2. This class inherits from nn.Cell and contains methods for initializing the feature projection and constructing the hidden states.
The __init__ method initializes the feature projection module by setting up layer normalization, dense projection, and dropout.
The construct method applies layer normalization to the hidden states, projects the normalized states using dense projection, and applies dropout to the projected states before returning the hidden states and the normalized hidden states.
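The two-value return described above can be sketched for a single feature vector as follows. This is a simplified illustration with hypothetical helper names: dropout is omitted (as it would be at inference time), and the layer norm has no learned scale or shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def feature_projection(features, weight, bias):
    """Return (projected, normalized), mirroring the tuple described above."""
    normed = layer_norm(features)
    projected = [
        sum(w * v for w, v in zip(row, normed)) + b
        for row, b in zip(weight, bias)
    ]
    return projected, normed


# Project 3 extracted features down to 2 dimensions.
projected, normed = feature_projection(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    [0.0, 0.0],
)
```

Returning the normalized features alongside the projection lets a caller reuse the pre-projection representation without recomputing it.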
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureProjection.__init__(config)
¶
Initializes the Wav2Vec2FeatureProjection class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2FeatureProjection class. |
| config | An instance of the Wav2Vec2Config class containing the configuration parameters for the Wav2Vec2 feature projection. It specifies the configuration for the layer normalization, projection, and dropout layers. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

| RAISES | DESCRIPTION |
|---|---|
| TypeError | If the config parameter is not of type Wav2Vec2Config. |
| ValueError | If the config.conv_dim[-1] is not valid or if the config.hidden_size is not valid. |
| RuntimeError | If an error occurs during the initialization of layer normalization, projection, or dropout layers. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeatureProjection.construct(hidden_states)
¶
This method constructs the hidden states by applying layer normalization, projection, and dropout.
| PARAMETER | DESCRIPTION |
|---|---|
| self | The instance of the Wav2Vec2FeatureProjection class. |
| hidden_states | The input hidden states to be processed. It should be a tensor of shape (batch_size, sequence_length, feature_dim). |

| RETURNS | DESCRIPTION |
|---|---|
| Tuple[Tensor, Tensor] | A tuple containing two tensors: the projected hidden states (after dropout) and the layer-normalized hidden states. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeedForward
¶
Bases: Cell
Wav2Vec2FeedForward is a class representing the feedforward network for the Wav2Vec2 model. This class inherits from nn.Cell and contains methods for initializing the network and constructing the feedforward layers.
The __init__ method initializes the feedforward network with the provided configuration. It sets up the intermediate dropout, intermediate dense, intermediate activation function, output dense, and output dropout layers based on the configuration parameters.
The construct method takes hidden states as input and processes them through the intermediate dense layer, intermediate activation function, intermediate dropout layer, output dense layer, and output dropout layer. It then returns the processed hidden states.
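The expand-then-contract flow described above can be sketched for one hidden vector in plain Python. The GELU activation is an assumption (the actual function is whatever hidden_act selects), dropout is omitted, and the helper names are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dense(x, weight, bias):
    """Dense layer; `weight` has one row per output feature."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feed_forward(x, w_in, b_in, w_out, b_out):
    h = dense(x, w_in, b_in)       # hidden_size -> intermediate_size
    h = [gelu(v) for v in h]       # intermediate activation
    return dense(h, w_out, b_out)  # intermediate_size -> hidden_size


# Toy sizes: hidden_size = 2, intermediate_size = 4.
output = feed_forward(
    [1.0, 2.0],
    [[0.0, 0.0]] * 4, [0.0] * 4,
    [[0.0] * 4] * 2, [0.5, -0.5],
)
```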
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeedForward.__init__(config)
¶
Initialize the Wav2Vec2FeedForward class.
| PARAMETER | DESCRIPTION |
|---|---|
| self | Instance of the class. |
| config | Configuration object containing parameters for initialization. The config parameter is of type Wav2Vec2Config and holds the configuration settings required for initializing the feed-forward module. |

| RETURNS | DESCRIPTION |
|---|---|
| None | No value is returned. |

Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2FeedForward.construct(hidden_states)
¶
Constructs the feed-forward network for the Wav2Vec2 model.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2FeedForward class.
TYPE:
|
hidden_states |
The input hidden states to be passed through the feed-forward network.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
torch.Tensor: The output hidden states after passing through the feed-forward network. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the input hidden_states is not of type torch.Tensor. |
ValueError
|
If the input hidden_states does not have a rank of 2. |
This method takes the input hidden states and passes them through a feed-forward network consisting of several layers. The feed-forward network is constructed using intermediate dense layers, activation functions, and dropout layers. The hidden_states are first passed through the intermediate dense layer, followed by the intermediate activation function and dropout layer. The resulting hidden_states are then passed through the output dense layer and another dropout layer. The final output hidden_states are returned. Note that the input hidden_states must be a tensor of rank 2, representing a batch of hidden states.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForAudioFrameClassification
¶
Bases: Wav2Vec2PreTrainedModel
This class represents a Wav2Vec2 model for audio frame classification. It inherits from the Wav2Vec2PreTrainedModel and includes methods for initializing the model, freezing the feature encoder and base model, as well as constructing the model for inference and training.
| ATTRIBUTE | DESCRIPTION |
|---|---|
wav2vec2 |
The Wav2Vec2Model used for audio frame classification.
TYPE:
|
classifier |
The classification head for the model.
TYPE:
|
num_labels |
The number of labels for classification.
TYPE:
|
layer_weights |
The weights for weighted layer sum if configured.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the Wav2Vec2ForAudioFrameClassification model with the provided configuration. |
freeze_feature_encoder |
Disables the gradient computation for the feature encoder, preventing its parameters from being updated during training. |
freeze_base_model |
Disables the gradient computation for the base model, preventing its parameters from being updated during training while allowing the classification head to be updated. |
construct |
Constructs the model for inference and training, handling input values, attention masks, labels, and other optional parameters. Returns TokenClassifierOutput containing loss, logits, hidden states, and attentions. |
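The `layer_weights` attribute suggests a weighted sum over the hidden states of all layers. A minimal sketch of that idea follows, assuming the weights are softmax-normalized before mixing; the function names and toy values are hypothetical and not taken from the library.

```python
import math


def softmax(values):
    # Numerically stable softmax over a flat list.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_outputs, layer_weights):
    # layer_outputs: one [frames][hidden] matrix per layer.
    norm = softmax(layer_weights)
    frames, hidden = len(layer_outputs[0]), len(layer_outputs[0][0])
    return [
        [sum(w * layer[t][h] for w, layer in zip(norm, layer_outputs)) for h in range(hidden)]
        for t in range(frames)
    ]


# Two layers, one frame, hidden size 2; equal weights give a plain average.
layers = [[[1.0, 2.0]], [[3.0, 4.0]]]
mixed = weighted_layer_sum(layers, [0.0, 0.0])
```

With equal raw weights the result is the per-element mean of the layers; training the weights lets the classifier favor whichever layers carry the most useful frame-level information.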
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForAudioFrameClassification.__init__(config)
¶
Initializes a new instance of the Wav2Vec2ForAudioFrameClassification class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the class.
|
config |
The configuration object for the Wav2Vec2 model. It specifies the parameters and settings for the model initialization. Must be an instance of Wav2Vec2Config.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the 'config' object has the attribute 'add_adapter' set to True, which is not supported for audio frame classification with Wav2Vec2. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForAudioFrameClassification.construct(input_values, attention_mask=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
labels |
Labels for computing the sequence classification/regression loss. Indices should be in
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForAudioFrameClassification.freeze_base_model()
¶
Calling this function will disable the gradient computation for the base model so that its parameters will not be updated during training. Only the classification head will be updated.
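The effect of freezing can be illustrated with a toy model. The sketch below is not the mindnlp implementation: `Param`, `ToyFrameClassifier`, and the parameter names are hypothetical stand-ins, and freezing is modeled as clearing a `requires_grad` flag on every base-model parameter.

```python
class Param:
    """Hypothetical stand-in for a framework parameter."""

    def __init__(self, name):
        self.name = name
        self.requires_grad = True


class ToyFrameClassifier:
    def __init__(self):
        # Base model parameters and classification head parameters (toy names).
        self.base_params = [
            Param("wav2vec2.feature_extractor.weight"),
            Param("wav2vec2.encoder.weight"),
        ]
        self.head_params = [Param("classifier.weight")]

    def freeze_base_model(self):
        # Disable gradient computation for every base-model parameter.
        for param in self.base_params:
            param.requires_grad = False

    def trainable_names(self):
        return [p.name for p in self.base_params + self.head_params if p.requires_grad]


model = ToyFrameClassifier()
model.freeze_base_model()
print(model.trainable_names())  # ['classifier.weight']
```

After the call, only the head remains trainable, which matches the behavior the method describes.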
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForAudioFrameClassification.freeze_feature_encoder()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC
¶
Bases: Wav2Vec2PreTrainedModel
This class represents a Wav2Vec2 model for Connectionist Temporal Classification (CTC) tasks. It inherits from Wav2Vec2PreTrainedModel and provides methods for initializing the model, tying weights, freezing the feature extractor, feature encoder, and base model, and constructing the model for inference and training.
On top of the base Wav2Vec2 model, the class adds CTC-specific functionality: handling CTC labels, computing the CTC loss, and processing input values for CTC tasks.
Weight tying keeps the model compatible with adapter weights, and the freezing methods control which parameters are updated during fine-tuning.
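For readers new to CTC, the sketch below shows the standard greedy collapse rule that turns per-frame predictions into a label sequence: merge consecutive repeats, then drop blanks. It is a generic illustration, not a method of this class; the function name and the choice of `blank_id=0` are assumptions.

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse per-frame token ids into a label sequence (CTC rule)."""
    decoded = []
    previous = None
    for token in frame_ids:
        # Keep a token only if it starts a new run and is not the blank.
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# A blank between two identical tokens keeps both of them.
labels = ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0])
print(labels)  # [3, 3, 5]
```

The blank token is what allows a genuine repeated label (here the two 3s) to survive the merge step.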
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.__init__(config, target_lang=None)
¶
Initializes a new instance of the Wav2Vec2ForCTC class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The object itself.
|
config |
The configuration for the Wav2Vec2Model.
TYPE:
|
target_lang |
The target language. Defaults to None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the configuration does not define the vocabulary size of the language model head. |
Note
The vocabulary size of the language model head must be defined either by instantiating the model
with Wav2Vec2ForCTC.from_pretrained(..., vocab_size=vocab_size) or by explicitly defining the
vocab_size in the model's configuration.
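The check described in the note can be pictured as follows. This is a hedged sketch only: `ToyConfig` and `check_ctc_config` are hypothetical names, and the error message is illustrative rather than the library's wording.

```python
class ToyConfig:
    """Hypothetical stand-in for a model configuration."""

    def __init__(self, vocab_size=None):
        self.vocab_size = vocab_size


def check_ctc_config(config):
    # The language model head needs to know how many output classes to produce.
    if config.vocab_size is None:
        raise ValueError(
            "vocab_size is not defined; set it in the configuration "
            "or pass vocab_size=... when loading the model."
        )
    return config.vocab_size


head_size = check_ctc_config(ToyConfig(vocab_size=32))
```

Failing early at construction time avoids building a head with an undefined output dimension.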
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.construct(input_values, attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
labels |
Labels for connectionist temporal classification. Note that
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.freeze_base_model()
¶
Calling this function will disable the gradient computation for the base model so that its parameters will not be updated during training. Only the classification head will be updated.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.freeze_feature_encoder()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.freeze_feature_extractor()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.tie_weights()
¶
This method overrides [PreTrainedModel.tie_weights] so that adapter weights can be correctly loaded when
passing target_lang=... to from_pretrained(...).
This method is not meant to be called by the user and is subject to change.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForMaskedLM
¶
Bases: Wav2Vec2PreTrainedModel
This class represents a Wav2Vec2 model for Masked Language Modeling (MLM).
It is deprecated and should be replaced with Wav2Vec2ForCTC.
The Wav2Vec2ForMaskedLM class inherits from the Wav2Vec2PreTrainedModel class.
| ATTRIBUTE | DESCRIPTION |
|---|---|
`wav2vec2` |
The underlying Wav2Vec2Model.
|
`dropout` |
A dropout layer for regularization.
|
`lm_head` |
A dense layer for language modeling prediction.
|
| METHOD | DESCRIPTION |
|---|---|
`__init__` |
Initializes a new instance of the |
`construct` |
Constructs the model for masked language modeling. |
Note
This class is deprecated and should be replaced with Wav2Vec2ForCTC.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForMaskedLM.__init__(config)
¶
Initializes an instance of the 'Wav2Vec2ForMaskedLM' class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The object instance.
|
config |
The configuration object containing various hyperparameters for the model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
| RAISES | DESCRIPTION |
|---|---|
FutureWarning
|
Raised if the class |
Description
This method initializes an instance of the 'Wav2Vec2ForMaskedLM' class. It sets up the model architecture and initializes the necessary components. The initialization process includes the following steps:
- Calls the parent class '__init__' method using 'super()' to initialize the base class.
- Raises a 'FutureWarning' to notify users that the class Wav2Vec2ForMaskedLM is deprecated and recommends using Wav2Vec2ForCTC instead.
- Initializes the 'wav2vec2' attribute as an instance of 'Wav2Vec2Model' using the provided 'config'.
- Initializes the 'dropout' attribute as an instance of 'nn.Dropout' with the dropout probability specified in 'config'.
- Initializes the 'lm_head' attribute as an instance of 'nn.Dense' with the hidden size and vocabulary size specified in 'config'.
- Calls the 'post_init' method to perform any additional post-initialization steps.
Note
The 'Wav2Vec2ForMaskedLM' class is deprecated and may not be supported in future versions. It is recommended to use the 'Wav2Vec2ForCTC' class instead.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForMaskedLM.construct(input_values, attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2ForMaskedLM class.
TYPE:
|
input_values |
The input tensor holding the raw audio waveform values. Its shape is (batch_size, sequence_length).
TYPE:
|
attention_mask |
Optional tensor representing the attention mask for the input. If provided, should have the shape (batch_size, sequence_length).
TYPE:
|
output_attentions |
Optional flag to indicate whether to return attentions in the output. Defaults to None.
TYPE:
|
output_hidden_states |
Optional flag to indicate whether to return hidden states in the output. Defaults to None.
TYPE:
|
return_dict |
Optional flag to indicate whether to return the output as a dictionary. If not provided, it defaults to the value specified in the configuration.
TYPE:
|
labels |
Optional tensor representing the labels for the masked language modeling task. Its shape is (batch_size, sequence_length).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[Tuple, MaskedLMOutput]
|
Union[Tuple, MaskedLMOutput]: The return value can be either a tuple or a MaskedLMOutput object.
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining
¶
Bases: Wav2Vec2PreTrainedModel
Wav2Vec2ForPreTraining
This class represents a pre-training model for Wav2Vec2, which is used for pre-training the Wav2Vec2 model. It includes methods for setting Gumbel softmax temperature, freezing the feature encoder, computing contrastive logits, and constructing the model for pre-training.
| METHOD | DESCRIPTION |
|---|---|
set_gumbel_temperature |
Set the Gumbel softmax temperature to a given value. Only necessary for training. |
freeze_feature_extractor |
Disable gradient computation for the feature encoder to prevent parameter updates during training. |
freeze_feature_encoder |
Disable gradient computation for the feature encoder to prevent parameter updates during training. |
compute_contrastive_logits |
Compute logits for contrastive loss based on cosine similarity between features and apply temperature. |
construct |
Construct the model for pre-training, including masking features for contrastive loss. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
wav2vec2 |
Wav2Vec2Model instance for the Wav2Vec2 model.
|
dropout_features |
Dropout layer for feature vectors.
|
quantizer |
Wav2Vec2GumbelVectorQuantizer instance for quantization.
|
project_hid |
Dense layer for projecting hidden states.
|
project_q |
Dense layer for projecting quantized features.
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.__init__(config)
¶
Initializes a new instance of the Wav2Vec2ForPreTraining class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2ForPreTraining class.
|
config |
The configuration object for the Wav2Vec2 model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.compute_contrastive_logits(target_features, negative_features, predicted_features, temperature=0.1)
staticmethod
¶
Compute logits for contrastive loss using cosine similarity as the distance measure between
[target_features, negative_features] and [predicted_features]. Additionally, temperature can be applied.
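The computation can be sketched for a single time step in plain Python. This is an illustration under assumptions, not the library code: the real method works on batched tensors, and the helper names and toy vectors here are hypothetical. The positive target is placed at index 0 of the candidate list.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    # Cosine of the angle between two vectors, guarded against zero norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def contrastive_logits(target, negatives, predicted, temperature=0.1):
    # Candidate 0 is the positive target; the rest are sampled negatives.
    candidates = [target] + negatives
    return [cosine_similarity(predicted, c) / temperature for c in candidates]


logits = contrastive_logits(
    target=[1.0, 0.0],
    negatives=[[0.0, 1.0], [-1.0, 0.0]],
    predicted=[1.0, 0.0],
)
```

A lower temperature stretches the gap between the positive and negative logits, which sharpens the softmax used by the contrastive loss.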
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.construct(input_values, attention_mask=None, mask_time_indices=None, sampled_negative_indices=None, output_attentions=None, output_hidden_states=None, return_dict=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
mask_time_indices |
Indices to mask extracted features for contrastive loss. When in training mode, model learns to predict masked extracted features in config.proj_codevector_dim space.
TYPE:
|
sampled_negative_indices |
Indices indicating which quantized target vectors are used as negative sampled vectors in contrastive loss. Required input for pre-training.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[Tuple, Wav2Vec2ForPreTrainingOutput]
|
Union[Tuple, Wav2Vec2ForPreTrainingOutput] |
Example
>>> import mindspore
>>> from mindspore import Tensor, ops
>>> from mindnlp.transformers import AutoFeatureExtractor, Wav2Vec2ForPreTraining
>>> from mindnlp.transformers.models.wav2vec2.modeling_wav2vec2 import _compute_mask_indices, _sample_negative_indices
>>> from datasets import load_dataset
...
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
>>> model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
...
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> # return_tensors="ms" (MindSpore tensors) is assumed here in place of the PyTorch "pt"
>>> input_values = feature_extractor(ds[0]["audio"]["array"], return_tensors="ms").input_values  # Batch size 1
...
>>> # compute masked indices
>>> batch_size, raw_sequence_length = input_values.shape
>>> sequence_length = model._get_feat_extract_output_lengths(raw_sequence_length).item()
>>> mask_time_indices = _compute_mask_indices(
...     shape=(batch_size, sequence_length), mask_prob=0.2, mask_length=2
... )
>>> sampled_negative_indices = _sample_negative_indices(
...     features_shape=(batch_size, sequence_length),
...     num_negatives=model.config.num_negatives,
...     mask_time_indices=mask_time_indices,
... )
>>> mask_time_indices = Tensor(mask_time_indices, dtype=mindspore.int64)
>>> sampled_negative_indices = Tensor(sampled_negative_indices, dtype=mindspore.int64)
...
>>> # inference: MindSpore computes gradients only on request, so no no-grad context is used
>>> model = model.set_train(False)
>>> outputs = model(input_values, mask_time_indices=mask_time_indices)
...
>>> # compute cosine similarity between predicted (=projected_states) and target (=projected_quantized_states)
>>> cosine_sim = ops.cosine_similarity(outputs.projected_states, outputs.projected_quantized_states, dim=-1)
...
>>> # check that cosine similarity is much higher than random
>>> is_high = cosine_sim[mask_time_indices.astype(mindspore.bool_)].mean() > 0.5
>>> # for contrastive loss training model should be put into train mode
>>> model = model.set_train()
>>> loss = model(
...     input_values, mask_time_indices=mask_time_indices, sampled_negative_indices=sampled_negative_indices
... ).loss
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.freeze_feature_encoder()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.freeze_feature_extractor()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTraining.set_gumbel_temperature(temperature)
¶
Set the Gumbel softmax temperature to a given value. Only necessary for training.
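To see why the temperature matters, the sketch below applies a temperature-scaled softmax to fixed logits and pairs it with a simple decay schedule. It is illustrative only: the Gumbel noise term is omitted to keep the output deterministic, and the function names and the schedule values (`start`, `floor`, `decay`) are assumptions rather than values taken from this module.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, start=2.0, floor=0.5, decay=0.999995):
    # Hypothetical schedule: exponential decay, clipped at a minimum value.
    return max(start * decay ** step, floor)


soft = softmax_with_temperature([2.0, 1.0, 0.0], temperature=2.0)
sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.5)
```

In a training loop, a value such as `annealed_temperature(step)` would be passed to `set_gumbel_temperature` at each update so that codebook selection becomes more decisive over time.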
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForPreTrainingOutput
dataclass
¶
Bases: ModelOutput
Output type of [Wav2Vec2ForPreTraining], with potential hidden states and attentions.
| PARAMETER | DESCRIPTION |
|---|---|
loss |
Total loss as the sum of the contrastive loss (L_m) and the diversity loss (L_d), as stated in the official paper.
TYPE:
|
projected_states |
Hidden-states of the model projected to config.proj_codevector_dim that can be used to predict the masked projected quantized states.
TYPE:
|
projected_quantized_states |
Quantized extracted feature vectors projected to config.proj_codevector_dim representing the positive target vectors for contrastive loss.
TYPE:
|
contrastive_loss |
The contrastive loss (L_m) as stated in the official paper.
TYPE:
|
diversity_loss |
The diversity loss (L_d) as stated in the official paper.
TYPE:
|
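For reference, the wav2vec 2.0 paper combines the two terms as a weighted sum. In the expression below, the weight α is a tuned hyperparameter; which configuration field supplies it in this implementation is not stated in this section, so treat that mapping as something to confirm in the source.

```latex
\mathcal{L} = \mathcal{L}_m + \alpha \, \mathcal{L}_d
```

Here L_m corresponds to contrastive_loss and L_d to diversity_loss in the output fields above.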
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification
¶
Bases: Wav2Vec2PreTrainedModel
The Wav2Vec2ForSequenceClassification class represents a Wav2Vec2 model for sequence classification tasks.
It inherits from the Wav2Vec2PreTrainedModel class and provides methods for initializing the model,
freezing specific components (the feature encoder or the whole base model), and computing the sequence
classification output. The construct method accepts input values, attention masks, and labels, can optionally
return attentions and hidden states, and computes the sequence classification/regression loss.
The deprecated freeze_feature_extractor method is kept alongside its replacement, freeze_feature_encoder;
freeze_base_model freezes the entire base model so that only the classification head is trained.
For detailed information about the class and its methods, refer to the individual method docstrings and the base
class Wav2Vec2PreTrainedModel.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification.__init__(config)
¶
Initializes a new instance of the Wav2Vec2ForSequenceClassification class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The object itself.
|
config |
An instance of Wav2Vec2Config containing the configuration settings for the model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
Raised if the 'add_adapter' attribute is set to True in the config, as sequence classification does not support the use of Wav2Vec2 adapters. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification.construct(input_values, attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None)
¶
| PARAMETER | DESCRIPTION |
|---|---|
labels |
Labels for computing the sequence classification/regression loss. Indices should be in
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification.freeze_base_model()
¶
Calling this function will disable the gradient computation for the base model so that its parameters will not be updated during training. Only the classification head will be updated.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification.freeze_feature_encoder()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification.freeze_feature_extractor()
¶
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector
¶
Bases: Wav2Vec2PreTrainedModel
This class represents a Wav2Vec2 model for extracting x-vector embeddings from audio data. It inherits from the Wav2Vec2PreTrainedModel class.
The class provides methods for freezing the feature extractor, the feature encoder, and the base model, which disable gradient computation for those components, and a method for computing the output length of the TDNN layers.
The construct method takes input audio data and optional parameters such as an attention mask and labels, and returns x-vector embeddings along with an optional loss. Hidden states and attentions are returned when requested or enabled in the configuration.
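The TDNN output length mentioned above follows the usual 1D convolution length formula. The sketch below assumes stride 1, no padding, and no dilation; the function names and the kernel sizes are hypothetical example values, not values read from this module.

```python
def conv_output_length(input_length, kernel_size, stride=1):
    # Standard 1D convolution length formula without padding or dilation.
    return (input_length - kernel_size) // stride + 1


def tdnn_output_length(input_length, kernel_sizes):
    # Apply the formula once per TDNN layer, feeding each result forward.
    for kernel_size in kernel_sizes:
        input_length = conv_output_length(input_length, kernel_size)
    return input_length


# 100 frames through kernels 5, 3, 3, 1, 1: 100 -> 96 -> 94 -> 92 -> 92 -> 92
frames_out = tdnn_output_length(100, (5, 3, 3, 1, 1))
print(frames_out)  # 92
```

Knowing the shortened length is what lets the statistics pooling step ignore padded frames when sequences in a batch differ in length.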
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector.__init__(config)
¶
Initializes an instance of the Wav2Vec2ForXVector class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2ForXVector class.
|
config |
An object of type Wav2Vec2Config containing configuration settings for the model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector.construct(input_values, attention_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None, labels=None)
| PARAMETER | DESCRIPTION |
|---|---|
labels |
Labels for computing the sequence classification/regression loss. Indices should be in
TYPE:
|
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector.freeze_base_model()
Calling this function will disable the gradient computation for the base model so that its parameters will not be updated during training. Only the classification head will be updated.
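The effect of freezing can be illustrated with a toy model in which every parameter carries a requires_grad flag. This is only a sketch: ToyParameter, ToyXVectorModel, and the parameter names are hypothetical stand-ins, not MindNLP classes.

```python
class ToyParameter:
    """Hypothetical stand-in for a framework parameter with a requires_grad flag."""

    def __init__(self, name):
        self.name = name
        self.requires_grad = True


class ToyXVectorModel:
    """Toy model with a 'base' parameter group and a 'head' parameter group."""

    def __init__(self):
        self.base_params = [
            ToyParameter("base.encoder.weight"),
            ToyParameter("base.encoder.bias"),
        ]
        self.head_params = [ToyParameter("head.classifier.weight")]

    def freeze_base_model(self):
        # Turn off gradient tracking for every base parameter; the head is untouched.
        for param in self.base_params:
            param.requires_grad = False

    def trainable_names(self):
        all_params = self.base_params + self.head_params
        return [p.name for p in all_params if p.requires_grad]


model = ToyXVectorModel()
model.freeze_base_model()
print(model.trainable_names())
```

After the call, an optimizer that only collects parameters with requires_grad set would see the head parameters alone.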
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector.freeze_feature_encoder()
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForXVector.freeze_feature_extractor()
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GroupNormConvLayer
Bases: Cell
This class represents a convolutional layer with group normalization used in the Wav2Vec2 model. It applies a 1D convolution to the input hidden states, followed by group normalization and an activation function.
| PARAMETER | DESCRIPTION |
|---|---|
config |
The configuration object containing the settings for the Wav2Vec2 model.
TYPE:
|
layer_id |
The index of the convolutional layer in the model. Defaults to 0.
TYPE:
|
| ATTRIBUTE | DESCRIPTION |
|---|---|
in_conv_dim |
The input dimension of the convolutional layer.
TYPE:
|
out_conv_dim |
The output dimension of the convolutional layer.
TYPE:
|
conv |
The 1D convolutional layer used to process the hidden states.
TYPE:
|
activation |
The activation function applied to the processed hidden states.
TYPE:
|
layer_norm |
The group normalization layer applied to the hidden states.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
construct |
Applies the convolutional layer, normalization, activation, and returns the processed hidden states. |
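The normalization step can be sketched in plain Python. This sketch assumes one group per channel, so each channel is normalized across its own time steps; that group count is an assumption made for illustration, and the learnable scale and shift are omitted. The function name is hypothetical.

```python
import math


def per_channel_norm(channels, eps=1e-5):
    """Normalize each channel across its time steps (one group per channel)."""
    normalized = []
    for series in channels:
        mean = sum(series) / len(series)
        var = sum((x - mean) ** 2 for x in series) / len(series)
        normalized.append([(x - mean) / math.sqrt(var + eps) for x in series])
    return normalized


# Two channels, four time steps each, on very different scales.
features = [[1.0, 2.0, 3.0, 4.0], [100.0, 200.0, 300.0, 400.0]]
out = per_channel_norm(features)
print(out[0])
print(out[1])
```

Because each channel is rescaled by its own statistics, the two channels end up with nearly identical values even though the inputs differ by a factor of 100.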
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GroupNormConvLayer.__init__(config, layer_id=0)
Initializes an instance of the Wav2Vec2GroupNormConvLayer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The current instance of the class.
|
config |
An instance of the Wav2Vec2Config class containing configuration settings.
TYPE:
|
layer_id |
The index of the convolutional layer within the configuration. Defaults to 0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the layer_id is less than 0. |
KeyError
|
If the specified activation function in config is not found in the ACT2FN dictionary. |
ValueError
|
If the specified pad_mode in the nn.Conv1d function is not 'valid'. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GroupNormConvLayer.construct(hidden_states)
Applies the convolution, group normalization, and activation to the input hidden states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2GroupNormConvLayer class. |
hidden_states |
The input tensor representing the hidden states to be processed by the group normalization convolutional layer.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
mindspore.Tensor: The processed tensor representing the hidden states after passing through the group normalization convolutional layer. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GumbelVectorQuantizer
Bases: Cell
Vector quantization using Gumbel-softmax. See "Categorical Reparameterization with Gumbel-Softmax" for more information.
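The Gumbel-softmax trick itself can be sketched with the standard library alone. This is a minimal sketch of the general technique, not the quantizer's code: the function name, the tau and hard arguments, and the example logits are assumptions, and the hard mode here only returns the one-hot choice without any gradient handling.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=random):
    """Draw a relaxed (or one-hot) sample from the categorical defined by logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)); keep U strictly inside (0, 1).
        u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # Hard mode: one-hot vector at the largest entry.
    winner = soft.index(max(soft))
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]


rng = random.Random(0)
soft_sample = gumbel_softmax([2.0, 0.5, -1.0], tau=2.0, rng=rng)
hard_sample = gumbel_softmax([2.0, 0.5, -1.0], tau=2.0, hard=True, rng=rng)
print(soft_sample)
print(hard_sample)
```

A lower tau pushes the soft sample closer to one-hot, while a higher tau flattens it toward uniform.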
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GumbelVectorQuantizer.__init__(config)
Initializes a new instance of the Wav2Vec2GumbelVectorQuantizer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2GumbelVectorQuantizer class.
|
config |
An instance of the Wav2Vec2Config class containing configuration parameters for the vector quantizer.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2GumbelVectorQuantizer.construct(hidden_states, mask_time_indices=None)
Constructs codevectors and computes perplexity for Wav2Vec2GumbelVectorQuantizer.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2GumbelVectorQuantizer class.
|
hidden_states |
The input hidden states with shape (batch_size, sequence_length, hidden_size).
TYPE:
|
mask_time_indices |
A binary mask tensor of shape (batch_size, sequence_length). Positions set to 1 are included when computing the perplexity; positions set to 0 are excluded. Default is None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing the codevectors and the perplexity.
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the input hidden_states tensor has an invalid shape. |
RuntimeError
|
If the function encounters a runtime error during computation. |
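The perplexity returned alongside the codevectors measures how evenly the codebook entries are used. The sketch below shows one common way to compute it for a single codebook group, as the exponential of the entropy of the averaged codeword distribution, with an optional frame mask. The function names, the eps value, and the toy probabilities are assumptions for illustration.

```python
import math


def codebook_perplexity(marginal_probs, eps=1e-7):
    """exp(entropy) of the average codeword distribution for one group."""
    entropy = -sum(p * math.log(p + eps) for p in marginal_probs)
    return math.exp(entropy)


def masked_marginal(probs_per_frame, mask):
    """Average per-frame codeword probabilities over frames where mask is 1."""
    kept = [probs for probs, keep in zip(probs_per_frame, mask) if keep]
    num_entries = len(probs_per_frame[0])
    return [sum(frame[i] for frame in kept) / len(kept) for i in range(num_entries)]


frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
uniform = masked_marginal(frames, mask=[0, 0, 1])
collapsed = masked_marginal(frames, mask=[1, 0, 0])
print(codebook_perplexity(uniform), codebook_perplexity(collapsed))
```

A perplexity near the number of entries means the codebook is used evenly; a value near 1 means usage has collapsed onto a single entry.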
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2LayerNormConvLayer
Bases: Cell
This class represents a convolutional layer with layer normalization in the Wav2Vec2 model. It inherits from the nn.Cell class.
| ATTRIBUTE | DESCRIPTION |
|---|---|
config |
The configuration object for the Wav2Vec2 model.
TYPE:
|
layer_id |
The ID of the current layer.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the Wav2Vec2LayerNormConvLayer with the given configuration and layer ID. |
construct |
Applies the convolutional layer with layer normalization to the input hidden states. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2LayerNormConvLayer.__init__(config, layer_id=0)
Initialize the Wav2Vec2LayerNormConvLayer.
| PARAMETER | DESCRIPTION |
|---|---|
config |
The configuration object containing the parameters for the layer.
TYPE:
|
layer_id |
The ID of the layer. Defaults to 0.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2LayerNormConvLayer.construct(hidden_states)
Applies the convolution, layer normalization, and activation to the input hidden states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2LayerNormConvLayer class. |
hidden_states |
The input hidden states to be processed. As input to a 1D convolution, it should have the shape (batch_size, in_conv_dim, sequence_length).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
mindspore.Tensor: The hidden states after the convolution, layer normalization, and activation. |
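Layer normalization in this setting works across the channels of each frame rather than across time. The sketch below shows that idea on nested lists, assuming a [channel][time] layout that is transposed before and after normalizing; the layout, the function names, and the omission of the learnable scale and shift are all assumptions for illustration.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize one feature vector to zero mean and roughly unit variance."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]


def layer_norm_over_channels(features):
    """features is laid out as [channel][time]; normalize across channels per step."""
    num_steps = len(features[0])
    # Transpose to [time][channel] so each row is one frame's feature vector.
    frames = [[channel[t] for channel in features] for t in range(num_steps)]
    normed = [layer_norm(frame) for frame in frames]
    # Transpose back to [channel][time].
    return [[frame[c] for frame in normed] for c in range(len(features))]


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 channels, 2 time steps
out = layer_norm_over_channels(features)
print(out)
```

Each time step is normalized independently, so the statistics of one frame never influence another.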
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Model
Bases: Wav2Vec2PreTrainedModel
Wav2Vec2Model is the bare Wav2Vec2 model, which maps raw audio to hidden states without a task-specific head.
It is a subclass of the Wav2Vec2PreTrainedModel class.
Wav2Vec2Model exposes the following attributes and methods:
config: An instance of the Wav2Vec2Config class, containing the configuration parameters for the model.
feature_extractor: An instance of the Wav2Vec2FeatureEncoder class, responsible for extracting features from the input waveform.
feature_projection: An instance of the Wav2Vec2FeatureProjection class, responsible for projecting the extracted features.
encoder: An instance of the Wav2Vec2Encoder or Wav2Vec2EncoderStableLayerNorm class, responsible for encoding the hidden states.
adapter: An instance of the Wav2Vec2Adapter class, used to adapt the hidden states (optional).
post_init(): A method called after the initialization of the model.
The Wav2Vec2Model class also defines the following methods:
freeze_feature_extractor: Deprecated equivalent of freeze_feature_encoder.
freeze_feature_encoder: Disables the gradient computation for the feature encoder, preventing its parameters from being updated during training.
_mask_hidden_states: Masks extracted features along the time axis and/or the feature axis according to SpecAugment.
construct: Runs the forward pass on the input values and returns the model outputs.
Please note that the freeze_feature_extractor() method is deprecated and will be removed in Transformers v5.
The equivalent freeze_feature_encoder() method should be used instead.
For more information about the masking applied by _mask_hidden_states, please refer to the SpecAugment paper (https://arxiv.org/abs/1904.08779).
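Time-axis masking in the SpecAugment style can be sketched as follows: pick a few span starts at random, mark each span as masked, and replace the masked frames with a shared embedding. This is a simplified sketch, not the model's sampling procedure. The function names and the mask_prob, mask_length, and mask_embedding arguments are hypothetical.

```python
import random


def sample_time_mask(sequence_length, mask_prob, mask_length, rng):
    """Return a list of booleans marking masked frames for one utterance."""
    # Number of spans chosen so that roughly mask_prob of the frames are covered.
    num_spans = max(1, int(mask_prob * sequence_length / mask_length))
    max_start = sequence_length - mask_length
    starts = rng.sample(range(max_start + 1), num_spans)
    mask = [False] * sequence_length
    for start in starts:
        for offset in range(mask_length):
            mask[start + offset] = True  # spans may overlap
    return mask


def apply_time_mask(frames, mask, mask_embedding):
    """Replace every masked frame with the shared mask embedding."""
    return [mask_embedding if masked else frame for frame, masked in zip(frames, mask)]


rng = random.Random(0)
frames = [[float(t), float(t)] for t in range(20)]
mask = sample_time_mask(len(frames), mask_prob=0.3, mask_length=3, rng=rng)
masked_frames = apply_time_mask(frames, mask, mask_embedding=[-1.0, -1.0])
print(mask)
```

Because spans may overlap, the fraction of masked frames can end up below mask_prob.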
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Model.__init__(config)
Initializes a new instance of the Wav2Vec2Model class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2Model class.
|
config |
An instance of the Wav2Vec2Config class containing the configuration parameters for the model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the config parameter is not of type Wav2Vec2Config. |
ValueError
|
If the config parameters mask_time_prob or mask_feature_prob are less than 0.0. |
ValueError
|
If the config parameter do_stable_layer_norm is not a boolean value. |
ValueError
|
If the config parameter hidden_size is not defined. |
ValueError
|
If an error occurs during the initialization process. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Model.construct(input_values, attention_mask=None, mask_time_indices=None, output_attentions=None, output_hidden_states=None, return_dict=None)
Constructs the Wav2Vec2 model for processing input audio data.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2Model class.
TYPE:
|
input_values |
The input audio data values with shape (batch_size, audio_length).
TYPE:
|
attention_mask |
The attention mask for the input audio data with shape (batch_size, audio_length).
TYPE:
|
mask_time_indices |
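The mask over the extracted feature frames, with shape (batch_size, sequence_length), where sequence_length is the number of frames produced by the feature encoder.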
TYPE:
|
output_attentions |
Whether to output attentions. Defaults to None.
TYPE:
|
output_hidden_states |
Whether to output hidden states. Defaults to None.
TYPE:
|
return_dict |
Whether to return a dictionary of output. Defaults to None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Union[Tuple, Wav2Vec2BaseModelOutput]
|
A Wav2Vec2BaseModelOutput object when return_dict is true, otherwise a plain tuple of the same outputs. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the input_values and attention_mask have mismatched shapes. |
TypeError
|
If the input_values or attention_mask is not a Tensor. |
RuntimeError
|
If the encoder fails to process the input audio data. |
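Because the feature encoder downsamples the waveform, the sample-level attention mask has to be reduced to a frame-level one. The sketch below shows the arithmetic for a stack of unpadded convolutions. The kernel sizes and strides are assumed values chosen for illustration, and both function names are hypothetical.

```python
def num_feature_frames(num_samples, conv_kernel, conv_stride):
    """Frames produced by a stack of unpadded 1D convolutions."""
    length = num_samples
    for kernel_size, stride in zip(conv_kernel, conv_stride):
        length = (length - kernel_size) // stride + 1
    return length


def frame_attention_mask(sample_lengths, conv_kernel, conv_stride):
    """Build a frame-level 0/1 mask from the unpadded length of each waveform."""
    frame_lengths = [
        num_feature_frames(n, conv_kernel, conv_stride) for n in sample_lengths
    ]
    max_frames = max(frame_lengths)
    return [[1 if t < n else 0 for t in range(max_frames)] for n in frame_lengths]


# Assumed seven-layer stack, chosen for illustration only.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

# One second and half a second of 16 kHz audio in the same batch.
mask = frame_attention_mask([16000, 8000], KERNELS, STRIDES)
print(len(mask[0]), sum(mask[0]), sum(mask[1]))
```

With these assumed settings the total stride is 320 samples, so one second of 16 kHz audio yields a few dozen frames rather than 16000.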
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Model.freeze_feature_encoder()
Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2Model.freeze_feature_extractor()
Deprecated; use freeze_feature_encoder() instead. Calling this function will disable the gradient computation for the feature encoder so that its parameters will not be updated during training.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2NoLayerNormConvLayer
Bases: Cell
Wav2Vec2NoLayerNormConvLayer represents a convolutional layer without layer normalization for the Wav2Vec2 model. It inherits from nn.Cell and is used for processing audio features.
| ATTRIBUTE | DESCRIPTION |
|---|---|
config |
The configuration object for the Wav2Vec2 model.
TYPE:
|
layer_id |
The index of the convolutional layer.
TYPE:
|
in_conv_dim |
The input dimension of the convolutional layer.
TYPE:
|
out_conv_dim |
The output dimension of the convolutional layer.
TYPE:
|
conv |
The 1D convolutional operation applied to the input.
TYPE:
|
activation |
The activation function used to process the convolutional output.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the Wav2Vec2NoLayerNormConvLayer with the provided configuration and layer index. |
construct |
Applies the convolutional and activation operations to the input hidden_states. |
Note
This class is part of the Wav2Vec2 model and is specifically designed for processing audio features without layer normalization.
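A layer of this kind reduces to a convolution followed by an activation. The sketch below shows that on a single channel with plain lists. The use of GELU, the kernel values, and the function names are assumptions for illustration; the real layer works on multi-channel tensors and takes its activation from the configuration.

```python
import math


def conv1d_valid(signal, kernel, stride=1):
    """Single-channel 1D convolution (cross-correlation) without padding."""
    out_len = (len(signal) - len(kernel)) // stride + 1
    return [
        sum(signal[i * stride + j] * kernel[j] for j in range(len(kernel)))
        for i in range(out_len)
    ]


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def conv_then_activation(signal, kernel, stride=1):
    """Convolution followed directly by the activation, with no normalization."""
    return [gelu(v) for v in conv1d_valid(signal, kernel, stride)]


out = conv_then_activation([1.0, 2.0, 3.0, 4.0, 5.0], kernel=[1.0, -1.0], stride=2)
print(out)
```

With a stride of 2 and a kernel of size 2, five input samples produce two output values.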
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2NoLayerNormConvLayer.__init__(config, layer_id=0)
__init__(self, config: Wav2Vec2Config, layer_id=0)
Initializes a new instance of the Wav2Vec2NoLayerNormConvLayer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the class.
|
config |
An instance of the Wav2Vec2Config class containing the configuration parameters for the Wav2Vec2 model.
TYPE:
|
layer_id |
The index of the layer. Defaults to 0. Specifies the layer for which the convolutional layer is initialized.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the layer_id is less than 0. |
AttributeError
|
If the layer_id exceeds the maximum index available in the configuration parameters. |
TypeError
|
If the provided config parameter is not an instance of the Wav2Vec2Config class. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2NoLayerNormConvLayer.construct(hidden_states)
Applies the convolutional layer and the activation function to the hidden states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2NoLayerNormConvLayer class. |
hidden_states |
The input hidden states tensor.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
mindspore.Tensor: The hidden states after applying convolution and activation. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If the input hidden_states is not a mindspore.Tensor. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PositionalConvEmbedding
Bases: Cell
This class represents a positional convolutional embedding layer in the Wav2Vec2 model architecture. It inherits from nn.Cell and is designed to process hidden states through convolutional and activation operations.
| ATTRIBUTE | DESCRIPTION |
|---|---|
config |
An instance of Wav2Vec2Config containing configuration parameters for the layer.
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes the Wav2Vec2PositionalConvEmbedding with the provided configuration. |
construct |
Applies positional convolutional embedding operations on the input hidden_states and returns the transformed output. |
Usage
Instantiate this class by providing a Wav2Vec2Config object as configuration, then call the construct method with hidden states to process them.
Note
This class utilizes a convolutional layer, padding layer, and activation function to process hidden states efficiently.
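The reason a padding layer sits next to the convolution can be seen from the output-length arithmetic. The sketch below assumes symmetric padding of kernel_size // 2 on each side and a stride of 1; those settings and the function name are assumptions for illustration.

```python
def padded_conv_length(sequence_length, kernel_size):
    """Output length of a stride-1 convolution padded by kernel_size // 2 per side."""
    padding = kernel_size // 2
    return sequence_length + 2 * padding - kernel_size + 1


for kernel_size in (127, 128):
    print(kernel_size, padded_conv_length(50, kernel_size))
```

An odd kernel keeps the sequence length unchanged, while an even kernel produces one extra frame that has to be trimmed afterwards.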
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PositionalConvEmbedding.__init__(config)
Initializes a new instance of the Wav2Vec2PositionalConvEmbedding class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2PositionalConvEmbedding class.
|
config |
The configuration object containing various settings for the Wav2Vec2 model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PositionalConvEmbedding.construct(hidden_states)
Applies the positional convolutional embedding to the input hidden states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The instance of the Wav2Vec2PositionalConvEmbedding class. |
hidden_states |
The input hidden states with shape (batch_size, sequence_length, hidden_size).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
mindspore.Tensor
|
The transformed hidden states after the convolution, padding removal, and activation, with shape (batch_size, sequence_length, hidden_size). |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If the input hidden_states is not in the expected format or shape. |
RuntimeError
|
If an error occurs during the convolution or activation process. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PreTrainedModel
Bases: PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PreTrainedModel.init_adapter_layers()
(Re-)initialize attention adapter layers and lm head for adapter-only fine-tuning.
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2PreTrainedModel.load_adapter(target_lang, force_load=True, **kwargs)
Load a language adapter model from a pre-trained adapter model.
| PARAMETER | DESCRIPTION |
|---|---|
target_lang |
Has to be a language id of an existing adapter weight. Adapter weights are stored in the format
adapter.
TYPE:
|
force_load |
Whether the weights shall be loaded even if
TYPE:
|
cache_dir |
Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
TYPE:
|
force_download |
Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
TYPE:
|
resume_download |
Whether or not to delete incompletely received files. Will attempt to resume the download if such a file exists.
TYPE:
|
proxies |
A dictionary of proxy servers to use by protocol or endpoint, e.g.,
TYPE:
|
local_files_only |
Whether or not to only look at local files (i.e., do not try to download the model).
TYPE:
|
token |
The token to use as HTTP bearer authorization for remote files. If
TYPE:
|
revision |
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on hf-mirror.com, so To test a pull request you made on the Hub, you can pass `revision="refs/pr/
TYPE:
|
mirror |
Mirror source to accelerate downloads in China. If you are from China and have an accessibility problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. Please refer to the mirror site for more information.
TYPE:
|
Activate the special "offline-mode" to use this method in a firewalled environment.
Example
>>> from mindnlp.transformers import Wav2Vec2ForCTC, AutoProcessor
...
>>> ckpt = "facebook/mms-1b-all"
>>> processor = AutoProcessor.from_pretrained(ckpt)
>>> model = Wav2Vec2ForCTC.from_pretrained(ckpt, target_lang="eng")
>>> # set specific language
>>> processor.tokenizer.set_target_lang("spa")
>>> model.load_adapter("spa")
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2SamePadLayer
Bases: Cell
This class represents a layer in the Wav2Vec2 model that removes surplus padding.
Wav2Vec2SamePadLayer is a subclass of nn.Cell. It trims the trailing padding that the convolutional positional embedding leaves on the hidden states, so that the sequence length matches that of the input.
| ATTRIBUTE | DESCRIPTION |
|---|---|
num_pad_remove |
The number of padding elements to remove from the hidden states.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
__init__ |
Initializes a new instance of the Wav2Vec2SamePadLayer class. |
construct |
Removes padding elements from the hidden states. |
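The trimming behaviour can be sketched with nested lists. This is a toy stand-in, not the layer itself: ToySamePad is a hypothetical class, the [batch][channel][time] layout is an assumption, and the rule that one frame is removed for an even kernel size follows from symmetric padding of half the kernel size.

```python
class ToySamePad:
    """Hypothetical stand-in for the padding-removal layer."""

    def __init__(self, num_conv_pos_embeddings):
        # An even kernel with symmetric padding yields one extra frame.
        self.num_pad_remove = 1 if num_conv_pos_embeddings % 2 == 0 else 0

    def construct(self, hidden_states):
        # hidden_states is a nested list laid out as [batch][channel][time] (assumed).
        if self.num_pad_remove == 0:
            return hidden_states
        return [
            [channel[: -self.num_pad_remove] for channel in example]
            for example in hidden_states
        ]


batch = [[[1, 2, 3], [4, 5, 6]]]  # 1 example, 2 channels, 3 time steps
print(ToySamePad(128).construct(batch))
print(ToySamePad(127).construct(batch))
```

The explicit check for zero matters because slicing with `[:-0]` would return an empty list rather than the full sequence.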
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2SamePadLayer.__init__(num_conv_pos_embeddings)
Initializes an instance of the Wav2Vec2SamePadLayer class.
| PARAMETER | DESCRIPTION |
|---|---|
self |
The current instance of the Wav2Vec2SamePadLayer class.
TYPE:
|
num_conv_pos_embeddings |
The number of convolutional positional embeddings. It determines the value of the num_pad_remove attribute, which is 1 when this value is even and 0 when it is odd. The value must be a non-negative integer.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
None. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py
mindnlp.transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2SamePadLayer.construct(hidden_states)
Removes the trailing padding from the hidden states.
| PARAMETER | DESCRIPTION |
|---|---|
self |
An instance of the Wav2Vec2SamePadLayer class.
TYPE:
|
hidden_states |
The hidden states to be processed.
Expected shape is (batch_size, sequence_length, hidden_size).
The hidden states are processed based on the
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
mindspore.Tensor: The hidden states with the trailing padding elements removed. |
Source code in mindnlp/transformers/models/wav2vec2/modeling_wav2vec2.py