A one-layer object position estimator using the expected value
By Simon on Regards: Machine Learning; Neural Networks; Convolution Neural Networks;The position estimation of objects in images is usually implemented by multi-layer convolutional neural networks, which final layer is a linear function. These networks often support scale-invariance, due to the convolution layers.
For very basic applications, such an approach can be more than needed and lead to a long training time and high memory consumption.
If scale invariance is not needed, the following approach can be used to create a one-layer object position detection neural network. The input images are processed as follows:
- Apply a Sobel filter.
- Apply a convolutional layer, ReLU and divide by the maximum value.
- Raise the values of the output to the power of 5 or a higher number.
- Normalize the output.
- Calculate the expected value of the image.
(Note, that some steps have been omitted to provide a simpler explanation)
Steps 2 to 4 will enforce that the output is a probability distribution. Due to the exponentiation in step 3, the maximum will remain while local minimia will disappear.
The following picture shows a test picture (original is from Urlaubsguru.de). Ten smiley faces have been added to a background image. The model was trained to find the only happy face on this image.
![Original picture](assets/img/blog/20250102-detected-0.png)
The next picture shows the output of step 2.
![Output of step 2](assets/img/blog/20250102-filtered-0.png)
The last picture shows the output of step 4.
![Output of step 4](assets/img/blog/20250102-prob-0.png)
In a training with 30 images with the Adam optimizer, this approach converges after about 50 epochs, which shows that this method is suitable for this task.
Propose a change