The models module¶
This module contains the models that can be used as the four main components
that will comprise a Wide and Deep model (wide, deeptabular,
deeptext, deepimage), as well as the WideDeep "constructor"
class. Note that each of the four components can be used independently. It
also contains all the documentation for the models that can be used for
self-supervised pre-training with tabular data.
Wide ¶
Bases: Module
Defines a Wide (linear) model where the non-linearities are
captured via the so-called crossed-columns. This can be used as the
wide component of a Wide & Deep model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_dim` | `int` | size of the Linear layer (implemented via an Embedding layer) | _required_ |
| `pred_dim` | `int` | size of the output tensor containing the predictions. Note that unlike all the other models, the wide model is connected directly to the output neuron(s) when used to build a Wide and Deep model. Therefore, it requires the … | `1` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `wide_linear` | `Module` | the linear layer that comprises the wide branch of the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import Wide
>>> X = torch.empty(4, 4).random_(20)
>>> wide = Wide(input_dim=int(X.max().item()), pred_dim=1)
>>> out = wide(X)
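The "crossed columns" the Wide model relies on are built outside the model (typically by the library's wide preprocessor). As an illustrative sketch only (none of the names below are library code), a crossed column simply combines two categorical values into a single new categorical id that the linear layer can then learn a weight for:

```python
# Illustrative sketch, NOT pytorch_widedeep code: a "crossed column"
# fuses two categorical values into one new categorical feature id,
# which the Wide (linear) model treats as one more Embedding index.
def cross(val_a: str, val_b: str, n_buckets: int) -> int:
    """Hash the pair of values into one of n_buckets crossed-feature ids."""
    return hash((val_a, val_b)) % n_buckets

# e.g. education x occupation becomes a single crossed feature
bucket = cross("doctorate", "exec-managerial", n_buckets=1000)
assert 0 <= bucket < 1000
```

The linear model can then capture the interaction between the two original features through the weight of the crossed id, which is the "non-linearity via crossed columns" referred to above.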
Source code in pytorch_widedeep/models/tabular/linear/wide.py
forward ¶
forward(X)
Forward pass. Simply connecting the Embedding layer with the output neuron(s)
Source code in pytorch_widedeep/models/tabular/linear/wide.py
TabMlp ¶
Bases: BaseTabularModelWithoutAttention
Defines a TabMlp model that can be used as the deeptabular
component of a Wide & Deep model or independently by itself.
This class combines embedding representations of the categorical features with numerical (aka continuous) features, embedded or not. These are then passed through a series of dense layers (i.e. an MLP).
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column_idx` | `Dict[str, int]` | Dict containing the index of the columns that will be passed through the … | _required_ |
| `cat_embed_input` | `Optional[List[Tuple[str, int, int]]]` | List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...] | `None` |
| `cat_embed_dropout` | `Optional[float]` | Categorical embeddings dropout. If … | `None` |
| `use_cat_bias` | `Optional[bool]` | Boolean indicating if bias will be used for the categorical embeddings. If … | `None` |
| `cat_embed_activation` | `Optional[str]` | Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | `None` |
| `continuous_cols` | `Optional[List[str]]` | List with the names of the numeric (aka continuous) columns | `None` |
| `cont_norm_layer` | `Optional[Literal[batchnorm, layernorm]]` | Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If … | `None` |
| `embed_continuous` | `Optional[bool]` | Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If … | `None` |
| `embed_continuous_method` | `Optional[Literal[standard, piecewise, periodic]]` | Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. | `None` |
| `cont_embed_dim` | `Optional[int]` | Size of the continuous embeddings. If the continuous columns are embedded, … | `None` |
| `cont_embed_dropout` | `Optional[float]` | Dropout for the continuous embeddings. If … | `None` |
| `cont_embed_activation` | `Optional[str]` | Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If … | `None` |
| `quantization_setup` | `Optional[Dict[str, List[float]]]` | Used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. | `None` |
| `n_frequencies` | `Optional[int]` | This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning: the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `sigma` | `Optional[float]` | The sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `share_last_layer` | `Optional[bool]` | This parameter is not present in the aforementioned paper but is implemented in the official repo. If … | `None` |
| `full_embed_dropout` | `Optional[bool]` | If … | `None` |
| `mlp_hidden_dims` | `List[int]` | List with the number of neurons per dense layer in the MLP. | `[200, 100]` |
| `mlp_activation` | `str` | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | `'relu'` |
| `mlp_dropout` | `Union[float, List[float]]` | float or List of floats with the dropout between the dense layers. e.g: [0.5, 0.5] | `0.1` |
| `mlp_batchnorm` | `bool` | Boolean indicating whether or not batch normalization will be applied to the dense layers | `False` |
| `mlp_batchnorm_last` | `bool` | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers | `False` |
| `mlp_linear_first` | `bool` | Boolean indicating the order of the operations in the dense layer. If … | `True` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `encoder` | `Module` | MLP model that will receive the concatenation of the embeddings and the continuous columns |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabMlp
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ["a", "b", "c", "d", "e"]
>>> cat_embed_input = [(u, i, j) for u, i, j in zip(colnames[:4], [4] * 4, [8] * 4)]
>>> column_idx = {k: v for v, k in enumerate(colnames)}
>>> model = TabMlp(mlp_hidden_dims=[8, 4], column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols=["e"])
>>> out = model(X_tab)
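For the 'piecewise' continuous-embedding method, `quantization_setup` maps each continuous column to a list of bucket boundaries. A minimal pure-Python sketch (an assumption about the quantization logic, not the library's actual implementation) of how such boundaries assign a raw value to a bucket, which then receives its own embedding:

```python
# Sketch only: how 'piecewise' boundaries could bucketize a continuous value.
# Each resulting bucket index would then look up its own embedding vector.
from bisect import bisect_right

# hypothetical setup: boundaries for a single continuous column "age"
quantization_setup = {"age": [18.0, 30.0, 45.0, 60.0]}

def bucket_index(col: str, value: float) -> int:
    # number of boundaries at or below the value = bucket id
    return bisect_right(quantization_setup[col], value)

assert bucket_index("age", 25.0) == 1  # between 18 and 30
assert bucket_index("age", 70.0) == 4  # above the last boundary
```

With k boundaries a column yields k + 1 buckets, which is why the boundary lists fully determine the size of the piecewise embedding lookup.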
Source code in pytorch_widedeep/models/tabular/mlp/tab_mlp.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
TabMlpDecoder ¶
Bases: Module
Companion decoder model for the TabMlp model (which can be considered
an encoder itself).
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). The TabMlpDecoder will receive the output from the MLP
and 'reconstruct' the embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embed_dim` | `int` | Size of the embeddings tensor that needs to be reconstructed. | _required_ |
| `mlp_hidden_dims` | `List[int]` | List with the number of neurons per dense layer in the MLP. | `[100, 200]` |
| `mlp_activation` | `str` | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | `'relu'` |
| `mlp_dropout` | `Union[float, List[float]]` | float or List of floats with the dropout between the dense layers. e.g: [0.5, 0.5] | `0.1` |
| `mlp_batchnorm` | `bool` | Boolean indicating whether or not batch normalization will be applied to the dense layers | `False` |
| `mlp_batchnorm_last` | `bool` | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers | `False` |
| `mlp_linear_first` | `bool` | Boolean indicating the order of the operations in the dense layer. If … | `True` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `decoder` | `Module` | MLP model that will receive the output of the encoder |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabMlpDecoder
>>> x_inp = torch.rand(3, 8)
>>> decoder = TabMlpDecoder(embed_dim=32, mlp_hidden_dims=[8,16])
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
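Note the relation between the defaults: the encoder's default `mlp_hidden_dims` is `[200, 100]` while the decoder's is `[100, 200]`. A decoder typically mirrors the encoder's hidden dims so that reconstruction reverses the compression path (a convention sketch, not a library requirement):

```python
# Sketch: mirror the encoder's hidden dims to build the decoder's,
# so the widths on the way "back up" reverse those on the way down.
encoder_hidden_dims = [200, 100]
decoder_hidden_dims = encoder_hidden_dims[::-1]
assert decoder_hidden_dims == [100, 200]  # matches the defaults above
```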
Source code in pytorch_widedeep/models/tabular/mlp/tab_mlp.py
TabResnet ¶
Bases: BaseTabularModelWithoutAttention
Defines a TabResnet model that can be used as the deeptabular
component of a Wide & Deep model or independently by itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features, embedded or not. These are then
passed through a series of Resnet blocks. See
pytorch_widedeep.models.tab_resnet._layers for details on the
structure of each block.
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column_idx` | `Dict[str, int]` | Dict containing the index of the columns that will be passed through the … | _required_ |
| `cat_embed_input` | `Optional[List[Tuple[str, int, int]]]` | List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...] | `None` |
| `cat_embed_dropout` | `Optional[float]` | Categorical embeddings dropout. If … | `None` |
| `use_cat_bias` | `Optional[bool]` | Boolean indicating if bias will be used for the categorical embeddings. If … | `None` |
| `cat_embed_activation` | `Optional[str]` | Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | `None` |
| `continuous_cols` | `Optional[List[str]]` | List with the names of the numeric (aka continuous) columns | `None` |
| `cont_norm_layer` | `Optional[Literal[batchnorm, layernorm]]` | Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If … | `None` |
| `embed_continuous` | `Optional[bool]` | Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If … | `None` |
| `embed_continuous_method` | `Optional[Literal[standard, piecewise, periodic]]` | Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. | `None` |
| `cont_embed_dim` | `Optional[int]` | Size of the continuous embeddings. If the continuous columns are embedded, … | `None` |
| `cont_embed_dropout` | `Optional[float]` | Dropout for the continuous embeddings. If … | `None` |
| `cont_embed_activation` | `Optional[str]` | Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If … | `None` |
| `quantization_setup` | `Optional[Dict[str, List[float]]]` | Used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. | `None` |
| `n_frequencies` | `Optional[int]` | This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning: the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `sigma` | `Optional[float]` | The sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `share_last_layer` | `Optional[bool]` | This parameter is not present in the aforementioned paper but is implemented in the official repo. If … | `None` |
| `full_embed_dropout` | `Optional[bool]` | If … | `None` |
| `blocks_dims` | `List[int]` | List of integers that define the input and output units of each block. For example: [200, 100, 100] will generate 2 blocks. The first will receive a tensor of size 200 and output a tensor of size 100, and the second will receive a tensor of size 100 and output a tensor of size 100. See … | `[200, 100, 100]` |
| `blocks_dropout` | `float` | Block's internal dropout. | `0.1` |
| `simplify_blocks` | `bool` | Boolean indicating if the simplest possible residual blocks (… | `False` |
| `mlp_hidden_dims` | `Optional[List[int]]` | List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If … | `None` |
| `mlp_activation` | `Optional[str]` | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_dropout` | `Optional[float]` | float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_batchnorm` | `Optional[bool]` | Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_batchnorm_last` | `Optional[bool]` | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_linear_first` | `Optional[bool]` | Boolean indicating the order of the operations in the dense layer. If … | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `encoder` | `Module` | deep dense Resnet model that will receive the concatenation of the embeddings and the continuous columns |
| `mlp` | `Module` | if … |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabResnet
>>> X_deep = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabResnet(blocks_dims=[16,4], column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols = ['e'])
>>> out = model(X_deep)
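The way `blocks_dims` defines the residual blocks can be made concrete with a short sketch: consecutive pairs of the list give each block's (input_units, output_units), so a list of length n produces n - 1 blocks.

```python
# Sketch: consecutive pairs of blocks_dims define each residual block's
# input and output units; len(blocks_dims) - 1 blocks are created.
blocks_dims = [200, 100, 100]
blocks = list(zip(blocks_dims[:-1], blocks_dims[1:]))
assert blocks == [(200, 100), (100, 100)]  # 2 blocks, as described above
```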
Source code in pytorch_widedeep/models/tabular/resnet/tab_resnet.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
TabResnetDecoder ¶
Bases: Module
Companion decoder model for the TabResnet model (which can be
considered an encoder itself)
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the ResNet blocks or the
MLP (if present) and 'reconstruct' the embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embed_dim` | `int` | Size of the embeddings tensor to be reconstructed. | _required_ |
| `blocks_dims` | `List[int]` | List of integers that define the input and output units of each block. For example: [200, 100, 100] will generate 2 blocks. The first will receive a tensor of size 200 and output a tensor of size 100, and the second will receive a tensor of size 100 and output a tensor of size 100. See … | `[100, 100, 200]` |
| `blocks_dropout` | `float` | Block's internal dropout. | `0.1` |
| `simplify_blocks` | `bool` | Boolean indicating if the simplest possible residual blocks (… | `False` |
| `mlp_hidden_dims` | `Optional[List[int]]` | List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If … | `None` |
| `mlp_activation` | `Optional[str]` | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_dropout` | `Optional[float]` | float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_batchnorm` | `Optional[bool]` | Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_batchnorm_last` | `Optional[bool]` | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not … | `None` |
| `mlp_linear_first` | `Optional[bool]` | Boolean indicating the order of the operations in the dense layer. If … | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `decoder` | `Module` | deep dense Resnet model that will receive the output of the encoder if … |
| `mlp` | `Module` | if … |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabResnetDecoder
>>> x_inp = torch.rand(3, 8)
>>> decoder = TabResnetDecoder(embed_dim=32, blocks_dims=[8, 16, 16])
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
Source code in pytorch_widedeep/models/tabular/resnet/tab_resnet.py
TabNet ¶
Bases: BaseTabularModelWithoutAttention
Defines a TabNet model that
can be used as the deeptabular component of a Wide & Deep model or
independently by itself.
The implementation in this library is fully based on the one by the
dreamquark-ai team, simply adapted so that it can work within the
WideDeep frame. Therefore, ALL CREDIT TO THE DREAMQUARK-AI TEAM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column_idx` | `Dict[str, int]` | Dict containing the index of the columns that will be passed through the … | _required_ |
| `cat_embed_input` | `Optional[List[Tuple[str, int, int]]]` | List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...] | `None` |
| `cat_embed_dropout` | `Optional[float]` | Categorical embeddings dropout. If … | `None` |
| `use_cat_bias` | `Optional[bool]` | Boolean indicating if bias will be used for the categorical embeddings. If … | `None` |
| `cat_embed_activation` | `Optional[str]` | Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | `None` |
| `continuous_cols` | `Optional[List[str]]` | List with the names of the numeric (aka continuous) columns | `None` |
| `cont_norm_layer` | `Optional[Literal[batchnorm, layernorm]]` | Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If … | `None` |
| `embed_continuous` | `Optional[bool]` | Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If … | `None` |
| `embed_continuous_method` | `Optional[Literal[standard, piecewise, periodic]]` | Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. | `None` |
| `cont_embed_dim` | `Optional[int]` | Size of the continuous embeddings. If the continuous columns are embedded, … | `None` |
| `cont_embed_dropout` | `Optional[float]` | Dropout for the continuous embeddings. If … | `None` |
| `cont_embed_activation` | `Optional[str]` | Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If … | `None` |
| `quantization_setup` | `Optional[Dict[str, List[float]]]` | Used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. | `None` |
| `n_frequencies` | `Optional[int]` | This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning: the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `sigma` | `Optional[float]` | The sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | `None` |
| `share_last_layer` | `Optional[bool]` | This parameter is not present in the aforementioned paper but is implemented in the official repo. If … | `None` |
| `full_embed_dropout` | `Optional[bool]` | If … | `None` |
| `n_steps` | `int` | number of decision steps. For a better understanding of the function of … | `3` |
| `step_dim` | `int` | Step's output dimension. This is the output dimension that … | `8` |
| `attn_dim` | `int` | Attention dimension | `8` |
| `dropout` | `float` | GLU block's internal dropout | `0.0` |
| `n_glu_step_dependent` | `int` | number of GLU Blocks (… | `2` |
| `n_glu_shared` | `int` | number of GLU Blocks (… | `2` |
| `ghost_bn` | `bool` | Boolean indicating if Ghost Batch Normalization will be used. | `True` |
| `virtual_batch_size` | `int` | Batch size when using Ghost Batch Normalization | `128` |
| `momentum` | `float` | Ghost Batch Normalization's momentum. The dreamquark-ai team advises very low values; however, high values are used in the original publication. During our tests higher values led to better results | `0.02` |
| `gamma` | `float` | Relaxation parameter in the paper. When gamma = 1, a feature is enforced to be used only at one decision step. As gamma increases, more flexibility is provided to use a feature at multiple decision steps | `1.3` |
| `epsilon` | `float` | Float to avoid log(0). Always keep low | `1e-15` |
| `mask_type` | `str` | Mask function to use. Either 'sparsemax' or 'entmax' | `'sparsemax'` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `encoder` | `Module` | the TabNet encoder. For details see the original publication. |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabNet
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ["a", "b", "c", "d", "e"]
>>> cat_embed_input = [(u, i, j) for u, i, j in zip(colnames[:4], [4] * 4, [8] * 4)]
>>> column_idx = {k: v for v, k in enumerate(colnames)}
>>> model = TabNet(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=["e"])
>>> out = model(X_tab)
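The effect of `gamma` can be sketched in isolation. In the TabNet paper, a feature's prior scale is multiplied by (gamma - mask) after each decision step, so with gamma = 1 a fully used feature is exhausted, while larger gamma leaves budget for reuse. A minimal pure-Python sketch of that update rule (an illustration of the paper's relaxation, not the library's code):

```python
# Sketch of TabNet's gamma relaxation (per feature, per decision step):
# the prior scale shrinks by (gamma - mask) each time a feature is used.
def update_prior(prior: float, mask: float, gamma: float) -> float:
    return prior * (gamma - mask)

# gamma = 1: a fully used feature (mask = 1) can never be selected again
assert update_prior(1.0, 1.0, gamma=1.0) == 0.0
# gamma = 1.3 (the default above): some budget remains for later steps
assert abs(update_prior(1.0, 1.0, gamma=1.3) - 0.3) < 1e-9
```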
Source code in pytorch_widedeep/models/tabular/tabnet/tab_net.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
TabNetDecoder ¶
Bases: Module
Companion decoder model for the TabNet model (which can be
considered an encoder itself)
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the TabNet encoder
(i.e. the output from the so called 'steps') and 'reconstruct' the
embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embed_dim` | `int` | Size of the embeddings tensor to be reconstructed. | _required_ |
| `n_steps` | `int` | number of decision steps. For a better understanding of the function of … | `3` |
| `step_dim` | `int` | Step's output dimension. This is the output dimension that … | `8` |
| `dropout` | `float` | GLU block's internal dropout | `0.0` |
| `n_glu_step_dependent` | `int` | number of GLU Blocks (… | `2` |
| `n_glu_shared` | `int` | number of GLU Blocks (… | `2` |
| `ghost_bn` | `bool` | Boolean indicating if Ghost Batch Normalization will be used. | `True` |
| `virtual_batch_size` | `int` | Batch size when using Ghost Batch Normalization | `128` |
| `momentum` | `float` | Ghost Batch Normalization's momentum. The dreamquark-ai team advises very low values; however, high values are used in the original publication. During our tests higher values led to better results | `0.02` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `decoder` | `Module` | decoder that will receive the output from the encoder's steps and will reconstruct the embeddings |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabNetDecoder
>>> x_inp = [torch.rand(3, 8), torch.rand(3, 8), torch.rand(3, 8)]
>>> decoder = TabNetDecoder(embed_dim=32, ghost_bn=False)
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
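The `virtual_batch_size` parameter used by Ghost Batch Normalization can be illustrated with a small sketch: the incoming batch is chunked into "virtual" batches and each chunk is normalized with its own statistics (a conceptual sketch of the chunking, not the library's implementation):

```python
# Sketch: Ghost Batch Norm splits a batch of n_samples into virtual
# batches of virtual_batch_size; each chunk is normalized independently.
def virtual_batches(n_samples: int, virtual_batch_size: int) -> list:
    """Return (start, end) index pairs, the last chunk possibly smaller."""
    return [(i, min(i + virtual_batch_size, n_samples))
            for i in range(0, n_samples, virtual_batch_size)]

assert virtual_batches(300, 128) == [(0, 128), (128, 256), (256, 300)]
```

Setting `ghost_bn=False` (as in the example above) falls back to normalizing over the whole batch.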
Source code in pytorch_widedeep/models/tabular/tabnet/tab_net.py
ContextAttentionMLP ¶
Bases: BaseTabularModelWithAttention
Defines a ContextAttentionMLP model that can be used as the
deeptabular component of a Wide & Deep model or independently by
itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features that are also embedded. These
are then passed through a series of attention blocks. Each attention
block consists of a ContextAttentionEncoder. Such encoder is in
part inspired by the attention mechanism described in
Hierarchical Attention Networks for Document
Classification.
See pytorch_widedeep.models.tabular.mlp._attention_layers for details.
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
column_idx
|
Dict[str, int]
|
Dict containing the index of the columns that will be passed through
the |
required |
cat_embed_input
|
Optional[List[Tuple[str, int]]]
|
List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...] |
None
|
cat_embed_dropout
|
Optional[float]
|
Categorical embeddings dropout. If |
None
|
use_cat_bias
|
Optional[bool]
|
Boolean indicating if bias will be used for the categorical embeddings.
If |
None
|
cat_embed_activation
|
Optional[str]
|
Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported |
None
|
shared_embed
|
Optional[bool]
|
Boolean indicating if the embeddings will be "shared". The idea behind |
None
|
add_shared_embed
|
Optional[bool]
|
The two embedding sharing strategies are: 1) add the shared embeddings
to the column embeddings or 2) to replace the first
|
None
|
frac_shared_embed
|
Optional[float]
|
The fraction of embeddings that will be shared (if |
None
|
continuous_cols
|
Optional[List[str]]
|
List with the name of the numeric (aka continuous) columns |
None
|
cont_norm_layer
|
Optional[Literal[batchnorm, layernorm]]
|
Type of normalization layer applied to the continuous features.
Options are: 'layernorm' and 'batchnorm'. if |
None
|
embed_continuous_method
|
Optional[Literal[standard, piecewise, periodic]]
|
Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and_'piecewise'_ methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please, read the papers for details. |
'standard'
|
cont_embed_dropout
|
Optional[float]
|
Dropout for the continuous embeddings. If |
None
|
cont_embed_activation
|
Optional[str]
|
Activation function for the continuous embeddings if any. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If |
None
|
quantization_setup
|
Optional[Dict[str, List[float]]]
|
This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. |
None
|
n_frequencies
|
Optional[int]
|
This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
sigma
|
Optional[float]
|
This is the sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
share_last_layer
|
Optional[bool]
|
This parameter is not present in the aforementioned paper but is implemented in
the official repo.
If |
None
|
full_embed_dropout
|
Optional[bool]
|
If |
None
|
input_dim
|
int
|
The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns |
32
|
attn_dropout
|
float
|
Dropout for each attention block |
0.2
|
with_addnorm
|
bool
|
Boolean indicating if residual connections will be used in the attention blocks |
False
|
attn_activation
|
str
|
String indicating the activation function to be applied to the dense layer in each attention encoder. 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. |
'leaky_relu'
|
n_blocks
|
int
|
Number of attention blocks |
3
|
Attributes:
| Name | Type | Description |
|---|---|---|
encoder |
Module
|
Sequence of attention encoders. |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import ContextAttentionMLP
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = ContextAttentionMLP(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols = ['e'])
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/mlp/context_attention_mlp.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, F)\), where \(N\) is the batch size and \(F\) is the number of features/columns in the dataset
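The 'piecewise' continuous-embedding method referenced by the quantization_setup parameter above assigns each raw continuous value to a bucket defined by user-supplied boundaries, and that bucket index is what gets embedded. Below is a minimal, hedged sketch of the bucket lookup only (not the library's internal code); the column name 'age' and the boundary values are made-up illustrations.

```python
from bisect import bisect_right

# Hypothetical quantization_setup dict: keys are continuous column names,
# values are the bin boundaries for that column (illustrative numbers).
quantization_setup = {"age": [18.0, 30.0, 50.0, 65.0]}

def bucket_index(col: str, value: float) -> int:
    """Return the index of the quantization bin that `value` falls into.

    With k boundaries there are k + 1 bins: values below the first
    boundary map to 0, values above the last boundary map to k.
    """
    boundaries = quantization_setup[col]
    return bisect_right(boundaries, value)

print(bucket_index("age", 25.0))  # 25 falls between 18 and 30 -> bucket 1
print(bucket_index("age", 70.0))  # above the last boundary -> bucket 4
```

In the library itself the bucket index would then be looked up in an embedding table; this sketch only shows the quantization step implied by the parameter's description.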
SelfAttentionMLP ¶
Bases: BaseTabularModelWithAttention
Defines a SelfAttentionMLP model that can be used as the
deeptabular component of a Wide & Deep model or independently by
itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features that are also embedded. These
are then passed through a series of attention blocks. Each attention
block is comprised of what we would refer to as a simplified
SelfAttentionEncoder. See
pytorch_widedeep.models.tabular.mlp._attention_layers for details. The
reason to use a simplified version of self attention is because we
observed that the 'standard' attention mechanism used in the
TabTransformer has a notable tendency to overfit.
In more detail, this model only uses Q and K (and not V). Thinking of it in terms of text (and intuitively), Softmax(QK^T) is the attention mechanism that tells us how much, at each position in the input sentence, each word is represented or 'expressed'. We refer to these as the 'attention weights'. These attention weights are normally multiplied by a Value matrix to further strengthen the focus on the words that each word should be attending to (again, intuitively).
In this implementation we skip that last multiplication and instead multiply the attention weights directly by the input tensor. We expect this simplification to help avoid overfitting on tabular data.
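The simplified mechanism described above — Softmax(QK^T) applied directly to the input, with no separate Value projection — can be sketched in plain Python for a single sample. This is an illustrative sketch, not the library's internal code; the matrix sizes and weight values are made up.

```python
import math

def matmul(A, B):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

# One sample with F=3 features, each embedded in d=2 dimensions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_q = [[0.5, 0.1], [0.2, 0.3]]  # illustrative projection weights
W_k = [[0.3, 0.4], [0.1, 0.2]]

Q = matmul(X, W_q)
K = matmul(X, W_k)
K_T = [list(col) for col in zip(*K)]
d = len(X[0])
scores = matmul(Q, K_T)  # Q @ K^T, shape (F, F)
attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]

# The simplification: multiply the attention weights by X itself,
# instead of by a separate value projection V.
out = matmul(attn, X)  # shape (F, d)
```

Each row of `attn` sums to 1, so each output feature is a convex combination of the input feature embeddings, weighted by how much the features 'attend' to each other.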
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
column_idx
|
Dict[str, int]
|
Dict containing the index of the columns that will be passed through
the |
required |
cat_embed_input
|
Optional[List[Tuple[str, int]]]
|
List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...] |
None
|
cat_embed_dropout
|
Optional[float]
|
Categorical embeddings dropout. If |
None
|
use_cat_bias
|
Optional[bool]
|
Boolean indicating if bias will be used for the categorical embeddings.
If |
None
|
cat_embed_activation
|
Optional[str]
|
Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported |
None
|
shared_embed
|
Optional[bool]
|
Boolean indicating if the embeddings will be "shared". The idea behind |
None
|
add_shared_embed
|
Optional[bool]
|
The two embedding sharing strategies are: 1) add the shared embeddings
to the column embeddings or 2) to replace the first
|
None
|
frac_shared_embed
|
Optional[float]
|
The fraction of embeddings that will be shared (if |
None
|
continuous_cols
|
Optional[List[str]]
|
List with the name of the numeric (aka continuous) columns |
None
|
cont_norm_layer
|
Optional[Literal[batchnorm, layernorm]]
|
Type of normalization layer applied to the continuous features.
Options are: 'layernorm' and 'batchnorm'. if |
None
|
embed_continuous_method
|
Optional[Literal[standard, piecewise, periodic]]
|
Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. |
'standard'
|
cont_embed_dropout
|
Optional[float]
|
Dropout for the continuous embeddings. If |
None
|
cont_embed_activation
|
Optional[str]
|
Activation function for the continuous embeddings if any. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If |
None
|
quantization_setup
|
Optional[Dict[str, List[float]]]
|
This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. |
None
|
n_frequencies
|
Optional[int]
|
This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
sigma
|
Optional[float]
|
This is the sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
share_last_layer
|
Optional[bool]
|
This parameter is not present in the aforementioned paper but is implemented in
the official repo.
If |
None
|
full_embed_dropout
|
Optional[bool]
|
If |
None
|
input_dim
|
int
|
The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns |
32
|
attn_dropout
|
float
|
Dropout for each attention block |
0.2
|
n_heads
|
int
|
Number of attention heads per attention block. |
8
|
use_bias
|
bool
|
Boolean indicating whether or not to use bias in the Q, K projection layers. |
False
|
with_addnorm
|
bool
|
Boolean indicating if residual connections will be used in the attention blocks |
False
|
attn_activation
|
str
|
String indicating the activation function to be applied to the dense layer in each attention encoder. 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. |
'leaky_relu'
|
n_blocks
|
int
|
Number of attention blocks |
3
|
Attributes:
| Name | Type | Description |
|---|---|---|
cat_and_cont_embed |
Module
|
This is the module that processes the categorical and continuous columns |
encoder |
Module
|
Sequence of attention encoders. |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import SelfAttentionMLP
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = SelfAttentionMLP(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols = ['e'])
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/mlp/self_attention_mlp.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property necessary to build the WideDeep class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, H, F, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
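The 'periodic' continuous-embedding method governed by the n_frequencies and sigma parameters above represents a scalar value v as concatenated sines and cosines of learned 'frequencies'. A hedged, pure-Python sketch of that encoding (Eq 2 of On Embeddings for Numerical Features in Tabular Deep Learning) follows; in the library the frequencies would be learnable parameters, here they are fixed random draws, and all the specific numbers are illustrative.

```python
import math
import random

def periodic_embedding(v, frequencies):
    """Sketch of Eq 2: embed scalar v as [sin(2*pi*c_i*v), cos(2*pi*c_i*v)]."""
    sin_part = [math.sin(2 * math.pi * c * v) for c in frequencies]
    cos_part = [math.cos(2 * math.pi * c * v) for c in frequencies]
    return sin_part + cos_part

# n_frequencies is 'k' in the paper; sigma is the std of the
# Normal(0, sigma) initialisation of the frequency weights.
n_frequencies, sigma = 4, 0.1
rng = random.Random(0)
frequencies = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]

emb = periodic_embedding(0.7, frequencies)  # length 2 * n_frequencies
```

The resulting 2k-dimensional vector is what subsequent layers consume; the library additionally applies a linear layer (and optionally shares its last layer across columns via share_last_layer), which this sketch omits.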
TabTransformer ¶
Bases: BaseTabularModelWithAttention
Defines our adaptation of the
TabTransformer model
that can be used as the deeptabular component of a
Wide & Deep model or independently by itself.
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE:
This is an enhanced adaptation of the model described in the paper. It can
be considered the flagship of our transformer family of models for
tabular data, and offers multiple additional features relative to the
original publication (and to some other models in the library).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
column_idx
|
Dict[str, int]
|
Dict containing the index of the columns that will be passed through
the |
required |
cat_embed_input
|
Optional[List[Tuple[str, int]]]
|
List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...] |
None
|
cat_embed_dropout
|
Optional[float]
|
Categorical embeddings dropout. If |
None
|
use_cat_bias
|
Optional[bool]
|
Boolean indicating if bias will be used for the categorical embeddings.
If |
None
|
cat_embed_activation
|
Optional[str]
|
Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported |
None
|
shared_embed
|
Optional[bool]
|
Boolean indicating if the embeddings will be "shared". The idea behind |
None
|
add_shared_embed
|
Optional[bool]
|
The two embedding sharing strategies are: 1) add the shared embeddings
to the column embeddings or 2) to replace the first
|
None
|
frac_shared_embed
|
Optional[float]
|
The fraction of embeddings that will be shared (if |
None
|
continuous_cols
|
Optional[List[str]]
|
List with the name of the numeric (aka continuous) columns |
None
|
cont_norm_layer
|
Optional[Literal[batchnorm, layernorm]]
|
Type of normalization layer applied to the continuous features.
Options are: 'layernorm' and 'batchnorm'. if |
None
|
embed_continuous_method
|
Optional[Literal[standard, piecewise, periodic]]
|
Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. |
None
|
cont_embed_dropout
|
Optional[float]
|
Dropout for the continuous embeddings. If |
None
|
cont_embed_activation
|
Optional[str]
|
Activation function for the continuous embeddings if any. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If |
None
|
quantization_setup
|
Optional[Dict[str, List[float]]]
|
This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. |
None
|
n_frequencies
|
Optional[int]
|
This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
sigma
|
Optional[float]
|
This is the sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
share_last_layer
|
Optional[bool]
|
This parameter is not present in the aforementioned paper but is implemented in
the official repo.
If |
None
|
full_embed_dropout
|
Optional[bool]
|
If |
None
|
input_dim
|
int
|
The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns |
32
|
n_heads
|
int
|
Number of attention heads per Transformer block |
8
|
use_qkv_bias
|
bool
|
Boolean indicating whether or not to use bias in the Q, K, and V projection layers. |
False
|
n_blocks
|
int
|
Number of Transformer blocks |
4
|
attn_dropout
|
float
|
Dropout that will be applied to the Multi-Head Attention layers |
0.2
|
ff_dropout
|
float
|
Dropout that will be applied to the FeedForward network |
0.1
|
ff_factor
|
int
|
Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4. |
4
|
transformer_activation
|
str
|
Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported |
'gelu'
|
use_linear_attention
|
bool
|
Boolean indicating if Linear Attention (from Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention) will be used. The inclusion of this attention mechanism is inspired by this post, where the Uber team finds that it leads to the best results for their tabular data. |
False
|
use_flash_attention
|
bool
|
Boolean indicating if
Flash Attention
will be used. |
False
|
mlp_hidden_dims
|
Optional[List[int]]
|
List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If not provided no MLP on top of the final Transformer block will be used. |
None
|
mlp_activation
|
str
|
Activation function for the dense layers of the MLP. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If 'mlp_hidden_dims' is not |
'relu'
|
mlp_dropout
|
float
|
float with the dropout between the dense layers of the MLP.
If 'mlp_hidden_dims' is not |
0.1
|
mlp_batchnorm
|
bool
|
Boolean indicating whether or not batch normalization will be applied
to the dense layers
If 'mlp_hidden_dims' is not |
False
|
mlp_batchnorm_last
|
bool
|
Boolean indicating whether or not batch normalization will be applied
to the last of the dense layers
If 'mlp_hidden_dims' is not |
False
|
mlp_linear_first
|
bool
|
Boolean indicating the order of the operations in the dense
layer. If |
True
|
Attributes:
| Name | Type | Description |
|---|---|---|
encoder |
Module
|
Sequence of Transformer blocks |
mlp |
Module
|
MLP component in the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabTransformer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabTransformer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/tab_transformer.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, H, F, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
NOTE:
If flash attention or linear attention
are used, no attention weights are saved during the training process
and calling this property will throw a ValueError
SAINT ¶
Bases: BaseTabularModelWithAttention
Defines a SAINT model that
can be used as the deeptabular component of a Wide & Deep model or
independently by itself.
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE: This is a slightly modified and enhanced
version of the model described in the paper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
column_idx
|
Dict[str, int]
|
Dict containing the index of the columns that will be passed through
the |
required |
cat_embed_input
|
Optional[List[Tuple[str, int]]]
|
List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...] |
None
|
cat_embed_dropout
|
Optional[float]
|
Categorical embeddings dropout. If |
None
|
use_cat_bias
|
Optional[bool]
|
Boolean indicating if bias will be used for the categorical embeddings.
If |
None
|
cat_embed_activation
|
Optional[str]
|
Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported |
None
|
shared_embed
|
Optional[bool]
|
Boolean indicating if the embeddings will be "shared". The idea behind |
None
|
add_shared_embed
|
Optional[bool]
|
The two embedding sharing strategies are: 1) add the shared embeddings
to the column embeddings or 2) to replace the first
|
None
|
frac_shared_embed
|
Optional[float]
|
The fraction of embeddings that will be shared (if |
None
|
continuous_cols
|
Optional[List[str]]
|
List with the name of the numeric (aka continuous) columns |
None
|
cont_norm_layer
|
Optional[Literal[batchnorm, layernorm]]
|
Type of normalization layer applied to the continuous features.
Options are: 'layernorm' and 'batchnorm'. if |
None
|
embed_continuous_method
|
Optional[Literal[standard, piecewise, periodic]]
|
Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. |
'standard'
|
cont_embed_dropout
|
Optional[float]
|
Dropout for the continuous embeddings. If |
None
|
cont_embed_activation
|
Optional[str]
|
Activation function for the continuous embeddings if any. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If |
None
|
quantization_setup
|
Optional[Dict[str, List[float]]]
|
This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. |
None
|
n_frequencies
|
Optional[int]
|
This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
sigma
|
Optional[float]
|
This is the sigma parameter in the paper mentioned above, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
share_last_layer
|
Optional[bool]
|
This parameter is not present in the aforementioned paper but is implemented in
the official repo.
If |
None
|
full_embed_dropout
|
Optional[bool]
|
If |
None
|
input_dim
|
int
|
The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns |
32
|
n_heads
|
int
|
Number of attention heads per Transformer block |
8
|
use_qkv_bias
|
bool
|
Boolean indicating whether or not to use bias in the Q, K, and V projection layers |
False
|
n_blocks
|
int
|
Number of SAINT-Transformer blocks. |
2
|
attn_dropout
|
float
|
Dropout that will be applied to the Multi-Head Attention column and row layers |
0.1
|
ff_dropout
|
float
|
Dropout that will be applied to the FeedForward network |
0.2
|
ff_factor
|
int
|
Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4. |
4
|
transformer_activation
|
str
|
Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported |
'gelu'
|
mlp_hidden_dims
|
Optional[List[int]]
|
List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If not provided no MLP on top of the final Transformer block will be used. |
None
|
mlp_activation
|
Optional[str]
|
Activation function for the dense layers of the MLP. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If 'mlp_hidden_dims' is not |
None
|
mlp_dropout
|
Optional[float]
|
float with the dropout between the dense layers of the MLP.
If 'mlp_hidden_dims' is not |
None
|
mlp_batchnorm
|
Optional[bool]
|
Boolean indicating whether or not batch normalization will be applied
to the dense layers
If 'mlp_hidden_dims' is not |
None
|
mlp_batchnorm_last
|
Optional[bool]
|
Boolean indicating whether or not batch normalization will be applied
to the last of the dense layers
If 'mlp_hidden_dims' is not |
None
|
mlp_linear_first
|
Optional[bool]
|
Boolean indicating the order of the operations in the dense
layer. If |
None
|
Attributes:
| Name | Type | Description |
|---|---|---|
encoder |
Module
|
Sequence of SAINT-Transformer blocks |
mlp |
Module
|
MLP component in the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import SAINT
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = SAINT(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/saint.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class
attention_weights
property
¶
attention_weights
List with the attention weights. Each element of the list is a tuple where the first and the second elements are the column and row attention weights respectively
The shape of the attention weights is:
-
column attention: \((N, H, F, F)\)
-
row attention: \((1, H, N, N)\)
where \(N\) is the batch size, \(H\) is the number of heads and \(F\) is the number of features/columns in the dataset
FTTransformer ¶
Bases: BaseTabularModelWithAttention
Defines a FTTransformer model that
can be used as the deeptabular component of a Wide & Deep model or
independently by itself.
Most of the parameters for this class are Optional since the use of
categorical or continuous is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
column_idx
|
Dict[str, int]
|
Dict containing the index of the columns that will be passed through
the |
required |
cat_embed_input
|
Optional[List[Tuple[str, int]]]
|
List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...] |
None
|
cat_embed_dropout
|
Optional[float]
|
Categorical embeddings dropout. If |
None
|
use_cat_bias
|
Optional[bool]
|
Boolean indicating if bias will be used for the categorical embeddings.
If |
None
|
cat_embed_activation
|
Optional[str]
|
Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported |
None
|
shared_embed
|
Optional[bool]
|
Boolean indicating if the embeddings will be "shared". The idea behind |
None
|
add_shared_embed
|
Optional[bool]
|
The two embedding sharing strategies are: 1) add the shared embeddings
to the column embeddings or 2) to replace the first
|
None
|
frac_shared_embed
|
Optional[float]
|
The fraction of embeddings that will be shared (if |
None
|
continuous_cols
|
Optional[List[str]]
|
List with the name of the numeric (aka continuous) columns |
None
|
cont_norm_layer
|
Optional[Literal[batchnorm, layernorm]]
|
Type of normalization layer applied to the continuous features.
Options are: 'layernorm' and 'batchnorm'. if |
None
|
embed_continuous_method
|
Optional[Literal[standard, piecewise, periodic]]
|
Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. |
'standard'
|
cont_embed_dropout
|
Optional[float]
|
Dropout for the continuous embeddings. If |
None
|
cont_embed_activation
|
Optional[str]
|
Activation function for the continuous embeddings if any. Currently
'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
If |
None
|
quantization_setup
|
Optional[Dict[str, List[float]]]
|
This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. |
None
|
n_frequencies
|
Optional[int]
|
This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
sigma
|
Optional[float]
|
This is the sigma parameter in the paper mentioned when describing the previous parameters and it is used to initialise the 'frequency weights'. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. |
None
|
share_last_layer
|
Optional[bool]
|
This parameter is not present in the before mentioned paper but it is implemented in
the official repo.
If |
None
|
full_embed_dropout
|
Optional[bool]
|
If |
None
|
input_dim
|
int
|
The so-called dimension of the model. Is the number of embeddings used to encode the categorical and/or continuous columns. |
64
|
kv_compression_factor
|
float
|
By default, the FTTransformer uses Linear Attention (See Linformer: Self-Attention with Linear Complexity ). The compression factor that will be used to reduce the input sequence length. If we denote the resulting sequence length as \(k = int(kv_{compression \space factor} \times s)\) where \(s\) is the input sequence length. |
0.5
|
kv_sharing
|
bool
|
Boolean indicating if the \(E\) and \(F\) projection matrices will share weights. See Linformer: Self-Attention with Linear Complexity for details |
False
|
n_heads
|
int
|
Number of attention heads per FTTransformer block |
8
|
use_qkv_bias
|
bool
|
Boolean indicating whether or not to use bias in the Q, K, and V projection layers |
False
|
n_blocks
|
int
|
Number of FTTransformer blocks |
4
|
attn_dropout
|
float
|
Dropout that will be applied to the Linear-Attention layers |
0.2
|
ff_dropout
|
float
|
Dropout that will be applied to the FeedForward network |
0.1
|
ff_factor
|
float
|
Multiplicative factor applied to the first layer of the FF network in each Transformer block, This is normally set to 4, but they use 4/3 in the paper. |
1.33
|
transformer_activation
|
str
|
Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported |
'reglu'
|
mlp_hidden_dims
|
Optional[List[int]]
|
List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If not provided no MLP on top of the final FTTransformer block will be used. |
None
|
mlp_activation
|
Optional[str]
|
Activation function for the dense layers of the MLP. Currently
'tanh', 'relu', 'leaky'_relu' and _'gelu' are supported.
If 'mlp_hidden_dims' is not |
None
|
mlp_dropout
|
Optional[float]
|
float with the dropout between the dense layers of the MLP.
If 'mlp_hidden_dims' is not |
None
|
mlp_batchnorm
|
Optional[bool]
|
Boolean indicating whether or not batch normalization will be applied
to the dense layers
If 'mlp_hidden_dims' is not |
None
|
mlp_batchnorm_last
|
Optional[bool]
|
Boolean indicating whether or not batch normalization will be applied
to the last of the dense layers
If 'mlp_hidden_dims' is not |
None
|
mlp_linear_first
|
Optional[bool]
|
Boolean indicating the order of the operations in the dense
layer. If |
None
|
Attributes:
| Name | Type | Description |
|---|---|---|
| encoder | Module | Sequence of FTTransformer blocks |
| mlp | Module | MLP component in the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import FTTransformer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = FTTransformer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/ft_transformer.py
output_dim property ¶
The output dimension of the model. This is a required property necessary to build the WideDeep class.
attention_weights property ¶
List with the attention weights per block.
The shape of the attention weights is \((N, H, F, k)\), where \(N\) is the batch size, \(H\) is the number of attention heads, \(F\) is the number of features/columns and \(k\) is the reduced sequence length, i.e. \(k = int(kv_{compression \space factor} \times s)\)
TabPerceiver ¶
Bases: BaseTabularModelWithAttention
Defines an adaptation of a Perceiver
that can be used as the deeptabular component of a Wide & Deep model
or independently by itself.
Most of the parameters for this class are Optional since the use of categorical or continuous features is in fact optional (i.e. one can use categorical features only, continuous features only, or both).
NOTE: while there are scientific publications for
the TabTransformer, SAINT and FTTransformer, the TabPerceiver
and the TabFastFormer are our own adaptations of the
Perceiver and the
FastFormer for tabular data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| column_idx | Dict[str, int] | Dict containing the index of the columns that will be passed through the | required |
| cat_embed_input | Optional[List[Tuple[str, int]]] | List of Tuples with the column name and number of unique values, e.g. [(education, 11), ...] | None |
| cat_embed_dropout | Optional[float] | Categorical embeddings dropout. If | None |
| use_cat_bias | Optional[bool] | Boolean indicating if bias will be used for the categorical embeddings. If | None |
| cat_embed_activation | Optional[str] | Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | None |
| shared_embed | Optional[bool] | Boolean indicating if the embeddings will be "shared". The idea behind | None |
| add_shared_embed | Optional[bool] | The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first | None |
| frac_shared_embed | Optional[float] | The fraction of embeddings that will be shared (if | None |
| continuous_cols | Optional[List[str]] | List with the names of the numeric (aka continuous) columns | None |
| cont_norm_layer | Optional[Literal[batchnorm, layernorm]] | Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If | None |
| embed_continuous_method | Optional[Literal[standard, piecewise, periodic]] | Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. | 'standard' |
| cont_embed_dropout | Optional[float] | Dropout for the continuous embeddings. If | None |
| cont_embed_activation | Optional[str] | Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If | None |
| quantization_setup | Optional[Dict[str, List[float]]] | Used when the 'piecewise' method is chosen to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. | None |
| n_frequencies | Optional[int] | This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | None |
| sigma | Optional[float] | This is the sigma parameter in the paper mentioned for the previous parameter, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | None |
| share_last_layer | Optional[bool] | This parameter is not present in the aforementioned paper but is implemented in the official repo. If | None |
| full_embed_dropout | Optional[bool] | If | None |
| input_dim | int | The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns. | 32 |
| n_cross_attns | int | Number of times each perceiver block will cross attend to the input data (i.e. number of cross attention components per perceiver block). This should normally be 1. However, the paper describes some architectures (normally computer vision-related problems) where the Perceiver attends multiple times to the input array, so multiple cross attention to the input array may also be useful in some cases for tabular data | 1 |
| n_cross_attn_heads | int | Number of attention heads for the cross attention component | 4 |
| n_latents | int | Number of latents. This is the \(N\) parameter in the paper. As indicated in the paper, this number should be significantly lower than \(M\) (the number of columns in the dataset). Setting \(N\) closer to \(M\) defies the main purpose of the Perceiver, which is to overcome the transformer quadratic bottleneck | 16 |
| latent_dim | int | Latent dimension | 128 |
| n_latent_heads | int | Number of attention heads per Latent Transformer | 4 |
| n_latent_blocks | int | Number of transformer encoder blocks (normalised MHA + normalised FF) per Latent Transformer | 4 |
| n_perceiver_blocks | int | Number of Perceiver blocks defined as [Cross Attention + Latent Transformer] | 4 |
| share_weights | bool | Boolean indicating if the weights will be shared between Perceiver blocks | False |
| attn_dropout | float | Dropout that will be applied to the Multi-Head Attention layers | 0.1 |
| ff_dropout | float | Dropout that will be applied to the FeedForward network | 0.1 |
| ff_factor | int | Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4. | 4 |
| transformer_activation | str | Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported | 'geglu' |
| mlp_hidden_dims | Optional[List[int]] | List with the number of neurons per dense layer in the MLP, e.g. [64, 32]. If not provided, no MLP will be used on top of the final Transformer block. | None |
| mlp_activation | Optional[str] | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not | None |
| mlp_dropout | Optional[float] | float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not | None |
| mlp_batchnorm | Optional[bool] | Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not | None |
| mlp_batchnorm_last | Optional[bool] | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not | None |
| mlp_linear_first | Optional[bool] | Boolean indicating the order of the operations in the dense layer. If | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| encoder | ModuleDict | ModuleDict with the Perceiver blocks |
| latents | Parameter | Latents that will be used for prediction |
| mlp | Module | MLP component in the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabPerceiver
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabPerceiver(column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols=continuous_cols, n_latents=2, latent_dim=16,
... n_perceiver_blocks=2)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/tab_perceiver.py
output_dim property ¶
The output dimension of the model. This is a required property necessary to build the WideDeep class.
attention_weights property ¶
List with the attention weights. If the weights are not shared between perceiver blocks, each element of the list will itself be a list containing the Cross Attention and Latent Transformer attention weights, respectively.
The shapes of the attention weights are:
- Cross Attention: \((N, C, L, F)\)
- Latent Attention: \((N, T, L, L)\)
where \(N\) is the batch size, \(C\) is the number of Cross Attention heads, \(L\) is the number of Latents, \(F\) is the number of features/columns in the dataset and \(T\) is the number of Latent Attention heads
TabFastFormer ¶
Bases: BaseTabularModelWithAttention
Defines an adaptation of a FastFormer
that can be used as the deeptabular component of a Wide & Deep model
or independently by itself.
Most of the parameters for this class are Optional since the use of categorical or continuous features is in fact optional (i.e. one can use categorical features only, continuous features only, or both).
NOTE: while there are scientific publications for
the TabTransformer, SAINT and FTTransformer, the TabPerceiver
and the TabFastFormer are our own adaptations of the
Perceiver and the
FastFormer for tabular data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| column_idx | Dict[str, int] | Dict containing the index of the columns that will be passed through the | required |
| cat_embed_input | Optional[List[Tuple[str, int]]] | List of Tuples with the column name and number of unique values, e.g. [(education, 11), ...] | None |
| cat_embed_dropout | Optional[float] | Categorical embeddings dropout. If | None |
| use_cat_bias | Optional[bool] | Boolean indicating if bias will be used for the categorical embeddings. If | None |
| cat_embed_activation | Optional[str] | Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | None |
| shared_embed | Optional[bool] | Boolean indicating if the embeddings will be "shared". The idea behind | None |
| add_shared_embed | Optional[bool] | The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first | None |
| frac_shared_embed | Optional[float] | The fraction of embeddings that will be shared (if | None |
| continuous_cols | Optional[List[str]] | List with the names of the numeric (aka continuous) columns | None |
| cont_norm_layer | Optional[Literal[batchnorm, layernorm]] | Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If | None |
| embed_continuous_method | Optional[Literal[standard, piecewise, periodic]] | Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details. | 'standard' |
| cont_embed_dropout | Optional[float] | Dropout for the continuous embeddings. If | None |
| cont_embed_activation | Optional[str] | Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If | None |
| quantization_setup | Optional[Dict[str, List[float]]] | Used when the 'piecewise' method is chosen to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required. | None |
| n_frequencies | Optional[int] | This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | None |
| sigma | Optional[float] | This is the sigma parameter in the paper mentioned for the previous parameter, used to initialise the 'frequency weights'. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required. | None |
| share_last_layer | Optional[bool] | This parameter is not present in the aforementioned paper but is implemented in the official repo. If | None |
| full_embed_dropout | Optional[bool] | If | None |
| input_dim | int | The so-called dimension of the model. It is the number of embeddings used to encode the categorical and/or continuous columns | 32 |
| n_heads | int | Number of attention heads per FastFormer block | 8 |
| use_bias | bool | Boolean indicating whether or not to use bias in the Q, K, and V projection layers | False |
| n_blocks | int | Number of FastFormer blocks | 4 |
| attn_dropout | float | Dropout that will be applied to the Additive Attention layers | 0.1 |
| ff_dropout | float | Dropout that will be applied to the FeedForward network | 0.2 |
| ff_factor | int | Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4. | 4 |
| share_qv_weights | bool | Following the paper, this is a boolean indicating if the Value (\(V\)) and the Query (\(Q\)) transformation parameters will be shared. | False |
| share_weights | bool | In addition to sharing the \(V\) and \(Q\) transformation parameters, the parameters across different Fastformer layers can also be shared. Please see | False |
| transformer_activation | str | Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported | 'relu' |
| mlp_hidden_dims | Optional[List[int]] | List with the number of neurons per dense layer in the MLP, e.g. [64, 32]. If not provided, no MLP will be used on top of the final Transformer block. | None |
| mlp_activation | Optional[str] | Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not | None |
| mlp_dropout | Optional[float] | float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not | None |
| mlp_batchnorm | Optional[bool] | Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not | None |
| mlp_batchnorm_last | Optional[bool] | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not | None |
| mlp_linear_first | Optional[bool] | Boolean indicating the order of the operations in the dense layer. If | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| encoder | Module | Sequence of FastFormer blocks |
| mlp | Module | MLP component in the model |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabFastFormer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabFastFormer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/tab_fastformer.py
output_dim property ¶
The output dimension of the model. This is a required property necessary to build the WideDeep class.
attention_weights property ¶
List with the attention weights. Each element of the list is a tuple where the first and second elements are the \(\alpha\) and \(\beta\) attention weights in the paper.
The shape of the attention weights is \((N, H, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
BasicRNN ¶
Bases: BaseWDModelComponent
Standard text classifier/regressor comprised of a stack of RNNs
(LSTMs or GRUs) that can be used as the deeptext component of a Wide &
Deep model or independently by itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of the stack of RNNs
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab_size | int | Number of words in the vocabulary | required |
| embed_dim | Optional[int] | Dimension of the word embeddings if non-pretrained word vectors are used | None |
| embed_matrix | Optional[ndarray] | Pretrained word embeddings | None |
| embed_trainable | bool | Boolean indicating if the pretrained embeddings are trainable | True |
| rnn_type | Literal[lstm, gru] | String indicating the type of RNN to use. One of 'lstm' or 'gru' | 'lstm' |
| hidden_dim | int | Hidden dim of the RNN | 64 |
| n_layers | int | Number of recurrent layers | 3 |
| rnn_dropout | float | Dropout for each RNN layer except the last layer | 0.0 |
| bidirectional | bool | Boolean indicating whether the stacked RNNs are bidirectional | False |
| use_hidden_state | bool | Boolean indicating whether to use the final hidden state or the RNN's output as predicting features. Typically the former is used. | True |
| padding_idx | int | index of the padding token in the padded-tokenised sequences. The | 1 |
| head_hidden_dims | Optional[List[int]] | List with the sizes of the dense layers in the head, e.g. [128, 64] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | 'relu' |
| head_dropout | Optional[float] | Dropout of the dense layers in the head | None |
| head_batchnorm | bool | Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp' | False |
| head_batchnorm_last | bool | Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| word_embed | Module | word embedding matrix |
| rnn | Module | Stack of RNNs |
| rnn_mlp | Module | Stack of dense layers on top of the RNN. This will only exist if |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import BasicRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = BasicRNN(vocab_size=4, hidden_dim=4, n_layers=2, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/rnns/basic_rnn.py
output_dim property ¶
The output dimension of the model. This is a required property necessary to build the WideDeep class.
AttentiveRNN ¶
Bases: BasicRNN
Text classifier/regressor comprised of a stack of RNNs
(LSTMs or GRUs) plus an attention layer. This model can be used as the
deeptext component of a Wide & Deep model or independently by
itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of the attention layer
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab_size | int | Number of words in the vocabulary | required |
| embed_dim | Optional[int] | Dimension of the word embeddings if non-pretrained word vectors are used | None |
| embed_matrix | Optional[ndarray] | Pretrained word embeddings | None |
| embed_trainable | bool | Boolean indicating if the pretrained embeddings are trainable | True |
| rnn_type | Literal[lstm, gru] | String indicating the type of RNN to use. One of 'lstm' or 'gru' | 'lstm' |
| hidden_dim | int | Hidden dim of the RNN | 64 |
| n_layers | int | Number of recurrent layers | 3 |
| rnn_dropout | float | Dropout for each RNN layer except the last layer | 0.1 |
| bidirectional | bool | Boolean indicating whether the stacked RNNs are bidirectional | False |
| use_hidden_state | bool | Boolean indicating whether to use the final hidden state or the RNN's output as predicting features. Typically the former is used. | True |
| padding_idx | int | index of the padding token in the padded-tokenised sequences. The | 1 |
| attn_concatenate | bool | Boolean indicating if the input to the attention mechanism will be the output of the RNN or the output of the RNN concatenated with the last hidden state. | True |
| attn_dropout | float | Internal dropout for the attention mechanism | 0.1 |
| head_hidden_dims | Optional[List[int]] | List with the sizes of the dense layers in the head, e.g. [128, 64] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | 'relu' |
| head_dropout | Optional[float] | Dropout of the dense layers in the head | None |
| head_batchnorm | bool | Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp' | False |
| head_batchnorm_last | bool | Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| word_embed | Module | word embedding matrix |
| rnn | Module | Stack of RNNs |
| rnn_mlp | Module | Stack of dense layers on top of the RNN. This will only exist if |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import AttentiveRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = AttentiveRNN(vocab_size=4, hidden_dim=4, n_layers=2, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/rnns/attentive_rnn.py
attention_weights property ¶
List with the attention weights
The shape of the attention weights is \((N, S)\), where \(N\) is the batch size and \(S\) is the length of the sequence
StackedAttentiveRNN ¶
Bases: BaseWDModelComponent
Text classifier/regressor composed of a stack of blocks:
[RNN + Attention]. This can be used as the deeptext component of a
Wide & Deep model or independently by itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of the attention blocks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab_size | int | Number of words in the vocabulary | required |
| embed_dim | Optional[int] | Dimension of the word embeddings if non-pretrained word vectors are used | None |
| embed_matrix | Optional[ndarray] | Pretrained word embeddings | None |
| embed_trainable | bool | Boolean indicating if the pretrained embeddings are trainable | True |
| rnn_type | Literal[lstm, gru] | String indicating the type of RNN to use. One of 'lstm' or 'gru' | 'lstm' |
| hidden_dim | int | Hidden dim of the RNN | 64 |
| bidirectional | bool | Boolean indicating whether the stacked RNNs are bidirectional | False |
| padding_idx | int | index of the padding token in the padded-tokenised sequences. The | 1 |
| n_blocks | int | Number of attention blocks. Each block is comprised of an RNN and a Context Attention Encoder | 3 |
| attn_concatenate | bool | Boolean indicating if the input to the attention mechanism will be the output of the RNN, or the output of the RNN concatenated with the last hidden state | False |
| attn_dropout | float | Internal dropout for the attention mechanism | 0.1 |
| with_addnorm | bool | Boolean indicating if the output of each block will be added to the input and normalised | False |
| head_hidden_dims | Optional[List[int]] | List with the sizes of the dense layers in the head e.g. [128, 64] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | 'relu' |
| head_dropout | Optional[float] | Dropout of the dense layers in the head | None |
| head_batchnorm | bool | Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp' | False |
| head_batchnorm_last | bool | Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | False |
|
Attributes:
| Name | Type | Description |
|---|---|---|
| word_embed | Module | word embedding matrix |
| rnn | Module | Stack of RNNs |
| rnn_mlp | Module | Stack of dense layers on top of the RNN. This will only exist if |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import StackedAttentiveRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = StackedAttentiveRNN(vocab_size=4, hidden_dim=4, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/rnns/stacked_attentive_rnn.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class.
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, S)\), where \(N\) is the batch size and \(S\) is the length of the sequence.
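For intuition on where a per-block \((N, S)\) weight matrix comes from, here is a minimal sketch of dot-product context attention in plain PyTorch. The context vector and all dimensions are hypothetical, and this is not the library's exact implementation:

```python
import torch
import torch.nn.functional as F

N, S, H = 2, 5, 8                        # batch size, sequence length, hidden dim
rnn_out = torch.randn(N, S, H)           # hidden states for every token in the sequence
context = torch.randn(H)                 # learnable context vector (hypothetical)

scores = rnn_out @ context               # unnormalised scores, shape (N, S)
attn_weights = F.softmax(scores, dim=1)  # attention weights, shape (N, S), rows sum to 1
```

Each row of `attn_weights` is a distribution over the \(S\) tokens of one sequence.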
HFModel ¶
Bases: BaseWDModelComponent
This class is a wrapper around the Hugging Face transformers library. It can be used as the text component of a Wide & Deep model or independently by itself.
At the moment only models from the families BERT, RoBERTa, DistilBERT, ALBERT and ELECTRA are supported. This is because this library is designed to address classification and regression tasks, and these are the most 'popular' encoder-only models, which have proved to work best for such tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | The model name from the transformers library e.g. 'bert-base-uncased'. Currently supported models are those from the families: BERT, RoBERTa, DistilBERT, ALBERT and ELECTRA. | required |
| use_cls_token | bool | Boolean indicating whether to use the [CLS] token or the mean of the sequence of hidden states as the sentence embedding | True |
| trainable_parameters | Optional[List[str]] | List with the names of the model parameters that will be trained. If None, none of the parameters will be trainable | None |
| head_hidden_dims | Optional[List[int]] | List with the sizes of the dense layers in the head e.g. [128, 64] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | 'relu' |
| head_dropout | Optional[float] | Dropout of the dense layers in the head | None |
| head_batchnorm | bool | Boolean indicating whether or not to include batch normalization in the dense layers that form the head | False |
| head_batchnorm_last | bool | Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | False |
| verbose | bool | If True, it will print information about the model | False |
| **kwargs | | Additional kwargs to be passed to the model | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| head | Module | Stack of dense layers on top of the transformer. This will only exist if |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import HFModel
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1).long()
>>> model = HFModel(model_name='bert-base-uncased')
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/huggingface_transformers/hf_model.py
attention_weight
property
¶
attention_weight
Returns the attention weights if the model was created with the output_attention_weights=True argument. If not, it will raise an AttributeError.
The shape of the attention weights is \((N, H, F, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the sequence length.
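For intuition on the use_cls_token parameter described above, here is a sketch of the two pooling strategies it toggles, using a random tensor in place of a real transformer output (not HFModel's actual code):

```python
import torch

N, S, H = 2, 6, 16                          # batch size, sequence length, hidden size
hidden_states = torch.randn(N, S, H)        # stand-in for the transformer output;
                                            # token 0 plays the role of [CLS]

cls_embedding = hidden_states[:, 0, :]      # use_cls_token=True: take the [CLS] state
mean_embedding = hidden_states.mean(dim=1)  # use_cls_token=False: average the sequence
```

Both strategies reduce the \((N, S, H)\) sequence of hidden states to a single \((N, H)\) sentence embedding.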
Vision ¶
Bases: BaseWDModelComponent
Defines a standard image classifier/regressor using a pretrained
network or a sequence of convolution layers that can be used as the
deepimage component of a Wide & Deep model or independently by
itself.
NOTE: this class represents the integration
between pytorch-widedeep and torchvision. New architectures will be
available as they are added to torchvision. In the future we aim
to bring in transformer-based architectures as well. However, simple
CNN-based (and even MLP-based) architectures seem to produce SoTA
results. For the time being, we describe below the options available
through this class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pretrained_model_setup | Union[str, Dict[str, Union[str, WeightsEnum]]] | Name of the pretrained model. Should be a variant of the following architectures: 'resnet', 'shufflenet', 'resnext', 'wide_resnet', 'regnet', 'densenet', 'mobilenetv3', 'mobilenetv2', 'mnasnet', 'efficientnet' and 'squeezenet'. if | None |
| n_trainable | Optional[int] | Number of trainable layers starting from the layer closest to the output neuron(s). Note that this number DOES NOT take into account the so-called 'head', which is ALWAYS trainable. If | None |
| trainable_params | Optional[List[str]] | List of strings containing the names (or a substring within the name) of the parameters that will be trained. For example, if we use a 'resnet18' pretrained model and we set | None |
| channel_sizes | List[int] | List of integers with the channel sizes of a CNN in case we choose not to use a pretrained model | [64, 128, 256, 512] |
| kernel_sizes | Union[int, List[int]] | List of integers with the kernel sizes of a CNN in case we choose not to use a pretrained model. Must be of length equal to | [7, 3, 3, 3] |
| strides | Union[int, List[int]] | List of integers with the stride sizes of a CNN in case we choose not to use a pretrained model. Must be of length equal to | [2, 1, 1, 1] |
| head_hidden_dims | Optional[List[int]] | List with the number of neurons per dense layer in the head. e.g. [64, 32] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported | 'relu' |
| head_dropout | Union[float, List[float]] | float indicating the dropout between the dense layers. | 0.1 |
| head_batchnorm | bool | Boolean indicating whether or not batch normalization will be applied to the dense layers | False |
| head_batchnorm_last | bool | Boolean indicating whether or not batch normalization will be applied to the last of the dense layers | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| features | Module | The pretrained model or standard CNN plus the optional head |
Examples:
>>> import torch
>>> from pytorch_widedeep.models import Vision
>>> X_img = torch.rand((2,3,224,224))
>>> model = Vision(channel_sizes=[64, 128], kernel_sizes = [3, 3], strides=[1, 1], head_hidden_dims=[32, 8])
>>> out = model(X_img)
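The n_trainable parameter freezes every layer except the n closest to the output (the head is always trainable). A sketch of that freezing logic on a toy backbone, written here for illustration and not taken from the library's actual code:

```python
import torch.nn as nn

# toy backbone standing in for a pretrained feature extractor (hypothetical)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3), nn.Conv2d(16, 32, 3)
)

n_trainable = 1  # keep only the layer closest to the output trainable
children = list(backbone.children())
for layer in children[: len(children) - n_trainable]:
    for param in layer.parameters():
        param.requires_grad = False  # freeze layers far from the output
```

Frozen parameters are skipped by the optimizer, so only the last `n_trainable` layers (plus the head) are updated during training.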
Source code in pytorch_widedeep/models/image/vision.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class.
ModelFuser ¶
Bases: BaseWDModelComponent
This class is a wrapper around a list of models associated with the different text and/or image columns (and datasets). The class is designed to 'fuse' the models using a variety of methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| models | List[BaseWDModelComponent] | List of models whose outputs will be fused | required |
| fusion_method | Union[Literal[concatenate, mean, max, sum, mult, dot, head], List[Literal[concatenate, mean, max, sum, mult, head]]] | Method to fuse the output of the models. It can be one of ['concatenate', 'mean', 'max', 'sum', 'mult', 'dot', 'head'] or a list of those, except 'dot'. If a list is provided, the output of the models will be fused using all the methods in the list and the final output will be the concatenation of the outputs of each method | required |
| projection_method | Optional[Literal[min, max, mean]] | If the fusion_method is not 'concatenate', this parameter will determine how to project the output of the models to a common dimension. It can be one of ['min', 'max', 'mean']. Default is None | None |
| custom_head | Optional[Union[BaseWDModelComponent, Module]] | Custom head to be used to fuse the output of the models. If provided, this will take precedence over head_hidden_dims. Also, if provided, 'projection_method' will be ignored. | None |
| head_hidden_dims | Optional[List[int]] | List with the number of neurons per layer in the custom head. If custom_head is provided, this parameter will be ignored | None |
| head_activation | Optional[str] | Activation function to be used in the custom head. Default is None | None |
| head_dropout | Optional[float] | Dropout to be used in the custom head. Default is None | None |
| head_batchnorm | Optional[bool] | Whether to use batchnorm in the custom head. Default is None | None |
| head_batchnorm_last | Optional[bool] | Whether or not batch normalization will be applied to the last of the dense layers | None |
| head_linear_first | Optional[bool] | Boolean indicating the order of the operations in the dense layer. If | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| head | Module or BaseWDModelComponent | Custom head used to fuse the output of the models. If custom_head is provided, it will take precedence over head_hidden_dims |
Examples:
>>> from pytorch_widedeep.preprocessing import TextPreprocessor
>>> from pytorch_widedeep.models import BasicRNN, ModelFuser
>>> import torch
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'text_col1': ['hello world', 'this is a test'],
... 'text_col2': ['goodbye world', 'this is another test']})
>>> text_preprocessor_1 = TextPreprocessor(
... text_col="text_col1",
... max_vocab=10,
... min_freq=1,
... maxlen=5,
... n_cpus=1,
... verbose=0)
>>> text_preprocessor_2 = TextPreprocessor(
... text_col="text_col2",
... max_vocab=10,
... min_freq=1,
... maxlen=5,
... n_cpus=1,
... verbose=0)
>>> X_text1 = text_preprocessor_1.fit_transform(df)
>>> X_text2 = text_preprocessor_2.fit_transform(df)
>>> X_text1_tnsr = torch.from_numpy(X_text1)
>>> X_text2_tnsr = torch.from_numpy(X_text2)
>>> rnn1 = BasicRNN(
... vocab_size=len(text_preprocessor_1.vocab.itos),
... embed_dim=4,
... hidden_dim=4,
... n_layers=1,
... bidirectional=False)
>>> rnn2 = BasicRNN(
... vocab_size=len(text_preprocessor_2.vocab.itos),
... embed_dim=4,
... hidden_dim=4,
... n_layers=1,
... bidirectional=False)
>>> fused_model = ModelFuser(models=[rnn1, rnn2], fusion_method='concatenate')
>>> out = fused_model([X_text1_tnsr, X_text2_tnsr])
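When fusion_method is not 'concatenate', component outputs of different sizes must first be projected to a common dimension, which is what projection_method controls. A sketch of what 'min' projection followed by 'mean' fusion amounts to, written in plain PyTorch rather than taken from the library's own code:

```python
import torch
import torch.nn as nn

# hypothetical component outputs with different output_dims
out1, out2 = torch.randn(4, 8), torch.randn(4, 12)

# projection_method='min': project both outputs to the smallest output_dim
target_dim = min(out1.shape[1], out2.shape[1])
proj1, proj2 = nn.Linear(8, target_dim), nn.Linear(12, target_dim)

# fusion_method='mean': average the projected outputs
fused = (proj1(out1) + proj2(out2)) / 2.0  # shape (4, 8)
```

With 'concatenate' no projection is needed, since outputs of any sizes can be concatenated along the feature dimension.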
Source code in pytorch_widedeep/models/model_fusion.py
WideDeep ¶
Bases: Module
Main collector class that combines the wide, deeptabular,
deeptext and deepimage models.
Note that all models described so far in this library must be passed to
the WideDeep class once constructed. This is because the models output
the last layer before the prediction layer. Such prediction layer is
added by the WideDeep class as it collects the components for every
data mode.
There are two options to combine these models that correspond to the
two main architectures that pytorch-widedeep can build.
- Directly connecting the output of the model components to an output neuron(s).
- Adding a Fully-Connected Head (FC-Head) on top of the deep models. This FC-Head will combine the output from the deeptabular, deeptext and deepimage components and will then be connected to the output neuron(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| wide | Optional[Module] | | None |
| deeptabular | Optional[Union[BaseWDModelComponent, List[BaseWDModelComponent]]] | Currently this library implements a number of possible architectures for the | None |
| deeptext | Optional[Union[BaseWDModelComponent, List[BaseWDModelComponent]]] | Currently this library implements a number of possible architectures for the | None |
| deepimage | Optional[Union[BaseWDModelComponent, List[BaseWDModelComponent]]] | Currently this library uses | None |
| deephead | Optional[BaseWDModelComponent] | Alternatively, the user can pass a custom model that will receive the output of the deep component. If | None |
| head_hidden_dims | Optional[List[int]] | List with the sizes of the dense layers in the head e.g. [128, 64] | None |
| head_activation | str | Activation function for the dense layers in the head. Currently | 'relu' |
| head_dropout | float | Dropout of the dense layers in the head | 0.1 |
| head_batchnorm | bool | Boolean indicating whether or not to include batch normalization in the dense layers that form the | False |
| head_batchnorm_last | bool | Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head | False |
| head_linear_first | bool | Boolean indicating the order of the operations in the dense layer. If | True |
| enforce_positive | bool | Boolean indicating if the output from the final layer must be positive. This is important if you are using loss functions with non-negative input restrictions, e.g. RMSLE, or if you know your predictions are bounded between 0 and inf | False |
| enforce_positive_activation | str | Activation function to enforce that the final layer has a positive output. | 'softplus' |
| pred_dim | int | Size of the final wide and deep output layer containing the predictions. | 1 |
Examples:
>>> from pytorch_widedeep.models import TabResnet, Vision, BasicRNN, Wide, WideDeep
>>> embed_input = [(u, i, j) for u, i, j in zip(["a", "b", "c"][:4], [4] * 3, [8] * 3)]
>>> column_idx = {k: v for v, k in enumerate(["a", "b", "c"])}
>>> wide = Wide(10, 1)
>>> deeptabular = TabResnet(blocks_dims=[8, 4], column_idx=column_idx, cat_embed_input=embed_input)
>>> deeptext = BasicRNN(vocab_size=10, embed_dim=4, padding_idx=0)
>>> deepimage = Vision()
>>> model = WideDeep(wide=wide, deeptabular=deeptabular, deeptext=deeptext, deepimage=deepimage)
NOTE: It is possible to use custom components to
build Wide & Deep models. Simply build them and pass them as the
corresponding parameters. Note that the custom models MUST return a last
layer of activations (i.e. not the final prediction) so that these
activations are collected by WideDeep and combined accordingly. In
addition, the models MUST also contain an attribute output_dim with
the size of these last layers of activations. See for example
pytorch_widedeep.models.tab_mlp.TabMlp
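A minimal sketch of such a custom component: it returns a last layer of activations rather than predictions, and exposes the required output_dim attribute. The class name and sizes here are hypothetical:

```python
import torch
import torch.nn as nn

class MyDeepComponent(nn.Module):
    """Hypothetical custom component for WideDeep: returns activations,
    not final predictions."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.output_dim = hidden_dim  # required attribute: size of the last layer

    def forward(self, X):
        return self.mlp(X)  # last layer of activations, no prediction layer

comp = MyDeepComponent(input_dim=10, hidden_dim=8)
activations = comp(torch.randn(3, 10))  # shape (3, comp.output_dim)
```

WideDeep reads output_dim to size the prediction layer it adds on top of the component.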
Source code in pytorch_widedeep/models/wide_deep.py