We propose a new non-linear trainable activation function, called the Sinu-sigmoidal Linear Unit (SinLU). Here, we aim to exploit sinusoidal properties in an activation function while maintaining a ReLU-like structure. SinLU is a continuous function with a buffer zone on the negative side of the X-axis, similar to GELU and SiLU. Furthermore, SinLU is trainable: it includes parameters that are trained along with the model and alter its shape.
The proposed activation function is inspired by the properties of trainable parameters, the sinusoid, and ReLU-like activation functions. In the ReLU activation function, the input to a neuron is multiplied by 1 or 0. This hard gating property often leads to some minor information loss. Introducing the cumulative distribution function (CDF) of the standard normal distribution into ReLU helps smooth the output near x = 0. The logistic distribution CDF σ(x) can also be used, which gives SiLU, x⋅σ(x). We propose to introduce sinusoidal periodicity at this stage. Multiplying σ(x) by x + sin x instead of x adds a wiggle to SiLU, resulting in a modified loss landscape. We define this function as SinLU_basic, formulated in the equation below.
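SinLU_basic(x) = (x + sin x) ⋅ σ(x)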
A more useful shape of the activation function can be obtained by introducing trainable parameters. We propose two such parameters, a and b, as shown in the equation below.
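SinLU(x) = (x + a sin(bx)) ⋅ σ(x)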
The parameter a denotes the amplitude of the sine function, which determines how much the sinusoid participates in the activation function. The parameter b determines the frequency of the sine wave. The figure below gives an idea of how the parameters shape the SinLU curve. A very high value of a may lead to a shape that is nowhere close to a ReLU-like function; this can easily be avoided by proper initialization and hyperparameter-controlled training. We initialize both a and b to 1 and train these parameters with the same learning rate as used for the rest of the network.
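As an illustration, the following is a minimal PyTorch sketch of how such a trainable activation could be realized; it is not the paper's own code, and the class and variable names are chosen here for clarity. The parameters a and b are registered as learnable parameters initialized to 1, so they receive gradients and are updated by backpropagation together with the rest of the network.

    import torch
    import torch.nn as nn

    class SinLU(nn.Module):
        # Sketch of SinLU(x) = (x + a*sin(b*x)) * sigmoid(x) with trainable a, b.
        def __init__(self):
            super().__init__()
            # a: amplitude of the sine term, b: its frequency; both start at 1
            # so the initial shape stays close to SiLU with a mild wiggle.
            self.a = nn.Parameter(torch.ones(1))
            self.b = nn.Parameter(torch.ones(1))

        def forward(self, x):
            return (x + self.a * torch.sin(self.b * x)) * torch.sigmoid(x)

Because a and b are ordinary learnable parameters, any optimizer built over the model's parameters updates them with the same learning rate as the network weights, matching the training setup described above.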
Plot of SinLU for different values of its parameters. Subplot (a) shows the SinLU curve for a = 1.0, b = 1.0; (b) for a = 5.0, b = 1.0; and (c) for a = 1.0, b = 5.0.