Model zoo performance
- Akida 1.0 models: models targeting the Akida Neuromorphic Processor IP 1.0 and the AKD1000 reference SoC.
- Akida 2.0 models: models targeting the Akida Neuromorphic Processor IP 2.0.
- Upgrading to Akida 2.0: a tutorial explaining the architectural differences between 1.0 and 2.0 models and their respective workflows.
Note
The download links provided point towards standard TensorFlow Keras models that must be converted to Akida models using cnn2snn.convert.
Akida 1.0 models
For 1.0 models, 4-bit accuracy is provided and is always obtained through a quantization-aware training (QAT) phase.
Note
The “8/4/4” quantization scheme stands for 8-bit weights in the input layer, 4-bit weights in the other layers, and 4-bit activations.
Note
The NPs column provides the minimal number of neural processors required for model execution on the Akida IP. The numbers given are the result of the map operation using the minimal MapMode, targeting the AKD1000 reference SoC.
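As a rough illustration of what the quantization bitwidths above mean (this is a generic symmetric max-abs quantizer sketch, not the actual CNN2SNN algorithm), fewer bits leave far fewer representable levels for the same weight range:

```python
def quantize(values, bits):
    """Uniformly quantize values to signed `bits`-bit integers
    using a symmetric max-abs scale (illustration only)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit weights
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values]

weights = [0.31, -0.52, 0.08, 0.97]
print(quantize(weights, 8))  # [41, -68, 10, 127] -- fine-grained
print(quantize(weights, 4))  # [2, -4, 1, 7]      -- only 15 levels
```

The coarser rounding at 4 bits is why the 4-bit accuracies in the tables below are recovered through QAT rather than plain post-training quantization.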
Image domain
Classification
| Architecture | Resolution | Dataset | #Params | Quantization | Top-1 accuracy | Example | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|---|---|
| AkidaNet 0.25 | 160 | ImageNet | 480K | 8/4/4 | 42.58% | | 403.3 | 20 | |
| AkidaNet 0.5 | 160 | ImageNet | 1.4M | 8/4/4 | 57.80% | | 1089.1 | 24 | |
| AkidaNet | 160 | ImageNet | 4.4M | 8/4/4 | 66.94% | | 4061.1 | 68 | |
| AkidaNet 0.25 | 224 | ImageNet | 480K | 8/4/4 | 46.71% | | 409.1 | 22 | |
| AkidaNet 0.5 | 224 | ImageNet | 1.4M | 8/4/4 | 61.30% | | 1202.2 | 32 | |
| AkidaNet | 224 | ImageNet | 4.4M | 8/4/4 | 69.65% | | 6294.0 | 116 | |
| AkidaNet 0.5 edge | 160 | ImageNet | 4.0M | 8/4/4 | 51.66% | | 2017.4 | 38 | |
| AkidaNet 0.5 edge | 224 | ImageNet | 4.0M | 8/4/4 | 54.03% | | 2130.5 | 46 | |
| AkidaNet 0.5 | 224 | PlantVillage | 1.1M | 8/4/4 | 97.92% | | 1019.1 | 33 | |
| AkidaNet 0.25 | 96 | Visual Wake Words | 229K | 8/4/4 | 84.77% | | 179.6 | 16 | |
| MobileNetV1 0.25 | 160 | ImageNet | 467K | 8/4/4 | 36.05% | | 376.4 | 20 | |
| MobileNetV1 0.5 | 160 | ImageNet | 1.3M | 8/4/4 | 54.59% | | 1007.0 | 24 | |
| MobileNetV1 | 160 | ImageNet | 4.2M | 8/4/4 | 65.47% | | 3525.8 | 65 | |
| MobileNetV1 0.25 | 224 | ImageNet | 467K | 8/4/4 | 39.73% | | 377.9 | 22 | |
| MobileNetV1 0.5 | 224 | ImageNet | 1.3M | 8/4/4 | 58.50% | | 1065.3 | 32 | |
| MobileNetV1 | 224 | ImageNet | 4.2M | 8/4/4 | 68.76% | | 5223.3 | 110 | |
| GXNOR | 28 | MNIST | 1.6M | 2/2/1 | 98.03% | | 412.8 | 3 | |
Object detection
| Architecture | Resolution | Dataset | #Params | Quantization | mAP | Example | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv2 | 224 | PASCAL-VOC 2007 (person and car classes) | 3.6M | 8/4/4 | 41.51% | | 3061.4 | 71 | |
| YOLOv2 | 224 | WIDER FACE | 3.5M | 8/4/4 | 77.63% | | 3053.1 | 71 | |
Regression
| Architecture | Resolution | Dataset | #Params | Quantization | MAE | Example | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|---|---|
| VGG-like | 32 | UTKFace (age estimation) | 458K | 8/2/2 | 6.1791 | | 138.6 | 6 | |
Face recognition
| Architecture | Resolution | Dataset | #Params | Quantization | Accuracy | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|---|
| AkidaNet 0.5 | 112×96 | CASIA Webface (face identification) | 2.3M | 8/4/4 | 70.18% | 1930.1 | 21 | |
| AkidaNet 0.5 edge | 112×96 | CASIA Webface (face identification) | 23.6M | 8/4/4 | 71.13% | 6980.2 | 34 | |
Audio domain
Keyword spotting
| Architecture | Dataset | #Params | Quantization | Top-1 accuracy | Example | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|---|
| DS-CNN | Google speech command | 22.7K | 8/4/4 | 91.72% | | 23.1 | 5 | |
Point cloud
Classification
| Architecture | Dataset | #Params | Quantization | Accuracy | Size (KB) | NPs | Download |
|---|---|---|---|---|---|---|---|
| PointNet++ | ModelNet40 3D Point Cloud | 602K | 8/4/4 | 79.78% | 490.9 | 12 | |
Akida 2.0 models
For 2.0 models, both 8-bit PTQ and 4-bit QAT numbers are given. Unless explicitly stated otherwise, the 8-bit PTQ accuracy is reported as is (i.e. after quantization and calibration only, with no further tuning or training). The 4-bit QAT procedure is the same as for 1.0 models.
Note
A single digit in the quantization scheme gives the bitwidth of both weights and activations. Weights in the first layer are always quantized to 8 bits. Where indicated, ‘edge’ means that the model backbone output (before the classification layer) is quantized to 1 bit to enable Akida edge learning.
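The gap between the PTQ and QAT columns below can be understood through fake quantization (quantize, then dequantize): the residual rounding error is what PTQ calibration tries to minimize and what QAT learns to compensate for. The sketch below is a plain-Python stand-in for illustration only, not the QuantizeML implementation:

```python
def fake_quantize(values, bits):
    """Quantize to signed `bits`-bit integers with a symmetric
    max-abs scale, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

acts = [0.11, -0.42, 0.73, -0.96, 0.27]
err8 = max(abs(a - q) for a, q in zip(acts, fake_quantize(acts, 8)))
err4 = max(abs(a - q) for a, q in zip(acts, fake_quantize(acts, 4)))
# The 8-bit rounding error is roughly an order of magnitude smaller
# than the 4-bit one, which is why PTQ alone usually suffices at
# 8 bits while 4 bits calls for a QAT phase.
print(err8, err4)
```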
Image domain
Classification
CNNs
| Architecture | Resolution | Dataset | #Params | Quantization | Accuracy | Download |
|---|---|---|---|---|---|---|
| AkidaNet 0.25 | 160 | ImageNet | 483K | 8 / 4 | 48.50% / 41.60% | |
| AkidaNet 0.5 | 160 | ImageNet | 1.4M | 8 / 4 | 61.86% / 57.93% | |
| AkidaNet | 160 | ImageNet | 4.4M | 8 / 4 | 69.94% / 67.23% | |
| AkidaNet 0.25 | 224 | ImageNet | 483K | 8 / 4 | 52.39% / 46.06% | |
| AkidaNet 0.5 | 224 | ImageNet | 1.4M | 8 / 4 | 64.88% / 61.47% | |
| AkidaNet | 224 | ImageNet | 4.4M | 8 / 4 | 72.16% / 70.11% | |
| AkidaNet 0.5 | 224 | PlantVillage | 1.2M | 8 / 4 | 99.61% / 98.25% | |
| AkidaNet 0.25 | 96 | Visual Wake Words | 227K | 8 / 4 | 87.03% / 85.80% | |
| AkidaNet18 | 160 | ImageNet | 2.4M | 8 | 64.77% | |
| AkidaNet18 | 224 | ImageNet | 2.4M | 8 | 67.32% | |
| MobileNetV1 0.25 | 160 | ImageNet | 469K | 8 / 4 | 45.72% / 37.51% | |
| MobileNetV1 0.5 | 160 | ImageNet | 1.3M | 8 / 4 | 60.27% / 54.81% | |
| MobileNetV1 | 160 | ImageNet | 4.2M | 8 / 4 | 69.02% / 65.28% | |
| MobileNetV1 0.25 | 224 | ImageNet | 469K | 8 / 4 | 49.63% / 42.08% | |
| MobileNetV1 0.5 | 224 | ImageNet | 1.3M | 8 / 4 | 63.65% / 59.20% | |
| MobileNetV1 | 224 | ImageNet | 4.2M | 8 / 4 | 71.18% / 68.52% | |
| GXNOR | 28 | MNIST | 1.6M | 4 | 98.81% | |
Transformers
| Architecture | Resolution | Dataset | #Params | Quantization | Accuracy | Download |
|---|---|---|---|---|---|---|
| ViT | 224 | ImageNet | 5.8M | 8 | 73.79% [1] | |
| DeiT-dist | 224 | ImageNet | 6.0M | 8 | 74.34% [1] | |
Object detection
| Architecture | Resolution | Dataset | #Params | Quantization | mAP 50 | mAP 75 | mAP | Download |
|---|---|---|---|---|---|---|---|---|
| YOLOv2 (AkidaNet 0.5 backbone) | 224 | PASCAL-VOC 2007 | 3.6M | 8 / 4 | 50.96% / 47.21% | 27.40% / 23.21% | 27.79% / 24.66% | |
| CenterNet (AkidaNet18 backbone) | 384 | PASCAL-VOC 2007 | 2.4M | 8 | 70.32% | 47.30% | 43.88% | |
| YOLOv2 (AkidaNet 0.5 backbone) | 224 | WIDER FACE | 3.6M | 8 / 4 | 80.19% / 78.60% | | | |
Regression
| Architecture | Resolution | Dataset | #Params | Quantization | MAE | Download |
|---|---|---|---|---|---|---|
| VGG-like | 32 | UTKFace (age estimation) | 458K | 8 / 4 | 6.0304 / 5.8858 | |
Face recognition
| Architecture | Resolution | Dataset | #Params | Quantization | Accuracy | Download |
|---|---|---|---|---|---|---|
| AkidaNet 0.5 | 112×96 | CASIA Webface (face identification) | 2.3M | 8 / 4 | 72.83% / 69.79% | |
Segmentation
| Architecture | Resolution | Dataset | #Params | Quantization | Binary IoU | Download |
|---|---|---|---|---|---|---|
| AkidaUNet 0.5 | 128 | Portrait128 | 1.1M | 8 | 0.9057 [2] | |

[2] PTQ accuracy boosted with 1 epoch of QAT.
Audio domain
Keyword spotting
| Architecture | Dataset | #Params | Quantization | Top-1 accuracy | Download |
|---|---|---|---|---|---|
| DS-CNN | Google speech command | 23.8K | 8 / 4 / 4 + edge | 92.87% / 92.67% / 90.61% | |
Classification
| Architecture | Resolution | Dataset | #Params | Quantization | Accuracy | Download |
|---|---|---|---|---|---|---|
| ViT (1 block) | 224 | Urbansound8k | 539.9K | 8 | 97.73% [3] | |

[3] PTQ accuracy boosted with 5 epochs of QAT.
Point cloud
Classification
| Architecture | Dataset | #Params | Quantization | Accuracy | Download |
|---|---|---|---|---|---|
| PointNet++ | ModelNet40 3D Point Cloud | 605K | 8 / 4 | 80.88% [4] / 81.56% | |

[4] PTQ accuracy boosted with 5 epochs of QAT.