DenseMLP

connex.nn.DenseMLP (NeuralNetwork)
A "Dense Multi-Layer Perceptron". Like a standard MLP, but every neuron is connected to every other neuron in all later layers, rather than only the next layer. That is, each layer uses the outputs from all previous layers, not just the most recent one, in a similar manner to DenseNet.
Cite

Densely Connected Convolutional Networks

@article{huang2016densely,
    author={Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger},
    title={Densely Connected Convolutional Networks},
    year={2016},
    journal={arXiv:1608.06993},
}
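Example

A minimal usage sketch. It uses only parameters documented in the signature below; the call convention (a single unbatched input vector plus an optional PRNG key for stochastic parts such as dropout, with jax.vmap used for batching) is assumed to follow the usual Equinox/connex pattern and should be checked against the library's own examples.

import jax
import jax.numpy as jnp
import connex

key = jax.random.PRNGKey(42)
model_key, call_key = jax.random.split(key)

# Sizes here are illustrative.
mlp = connex.nn.DenseMLP(
    input_size=4,
    output_size=2,
    width=16,
    depth=3,
    key=model_key,
)

x = jnp.ones(4)            # a single (unbatched) input vector
y = mlp(x, key=call_key)   # the key drives any stochastic behavior, e.g. dropout
print(y.shape)             # (2,)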
__init__(self, input_size: int, output_size: int, width: int, depth: int, hidden_activation: Callable = jax.nn.gelu, output_transformation: Callable = _identity, dropout_p: Union[float, Mapping[Any, float]] = 0.0, use_topo_norm: bool = False, use_topo_self_attention: bool = False, use_neuron_self_attention: bool = False, use_adaptive_activations: bool = False, *, key: Optional[PRNGKey] = None)
Arguments:

input_size
: The number of neurons in the input layer.

output_size
: The number of neurons in the output layer.

width
: The number of neurons in each hidden layer.

depth
: The number of hidden layers.

hidden_activation
: The activation function applied element-wise to the hidden (i.e. non-input, non-output) neurons. It can itself be a trainable equinox.Module.

output_transformation
: The transformation applied group-wise to the output neurons, e.g. jax.nn.softmax. It can itself be a trainable equinox.Module.

dropout_p
: Dropout probability. If a single float, the same dropout probability will be applied to all hidden neurons. If a Mapping[Any, float], dropout_p[i] refers to the dropout probability of neuron i. All neurons default to zero unless otherwise specified. Note that this allows dropout to be applied to input and output neurons as well (see the example following this list).

use_topo_norm
: A bool indicating whether to apply a topological-batch version of Layer Norm, where the collective inputs of each topological batch are standardized (made to have mean 0 and variance 1), with learnable elementwise-affine parameters gamma and beta.

Cite

@article{ba2016layer,
    author={Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton},
    title={Layer Normalization},
    year={2016},
    journal={arXiv:1607.06450},
}

use_topo_self_attention
: A bool indicating whether to apply (single-headed) self-attention to each topological batch's collective inputs.

Cite

@inproceedings{vaswani2017attention,
    author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
    booktitle={Advances in Neural Information Processing Systems},
    publisher={Curran Associates, Inc.},
    title={Attention is All You Need},
    volume={30},
    year={2017}
}

use_neuron_self_attention
: A bool indicating whether to apply neuron-wise self-attention, where each neuron applies (single-headed) self-attention to its inputs. If both use_neuron_self_attention and use_topo_norm are True, normalization is applied before self-attention.

Warning

Neuron-level self-attention will use significantly more memory than topo-level self-attention.

use_adaptive_activations
: A bool indicating whether to use neuron-wise adaptive activations, where all hidden activations transform as σ(x) -> a * σ(b * x), with a and b trainable scalar parameters unique to each neuron.

Cite

Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks

@article{Jagtap_2020,
    author={Ameya D. Jagtap, Kenji Kawaguchi, George Em Karniadakis},
    title={Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks},
    year={2020},
    publisher={The Royal Society},
    journal={Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences},
}

key
: The PRNGKey used to initialize parameters. Optional, keyword-only argument. Defaults to jax.random.PRNGKey(0).
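Example

A sketch of the optional arguments. All values are illustrative, and only parameters documented above are used; the defaults shown in the signature remain in effect for anything not passed.

import jax
import connex

key = jax.random.PRNGKey(0)

mlp = connex.nn.DenseMLP(
    input_size=8,
    output_size=3,
    width=32,
    depth=4,
    hidden_activation=jax.nn.silu,          # any element-wise callable, or a trainable equinox.Module
    output_transformation=jax.nn.softmax,   # applied group-wise to the output neurons
    dropout_p=0.1,                          # one probability shared by all hidden neurons
    use_topo_norm=True,                     # standardize each topological batch's collective inputs
    use_adaptive_activations=True,          # per-neuron trainable a, b in a * sigma(b * x)
    key=key,
)

A Mapping[Any, float] can be passed as dropout_p instead to set per-neuron probabilities (including for input and output neurons); its keys are the network's neuron identifiers, whose exact form is defined by connex and not shown here.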