Note
Go to the end to download the full example code
Build Vision Transformers for Akida
The Vision Transformer, or ViT, is a model for image classification that employs a Transformer-like architecture over patches of the image. An image is split into fixed-size patches, each of which is then linearly embedded, position embeddings are added, and the resulting sequence of vectors is fed to a standard Transformer encoder. Please refer to https://arxiv.org/abs/2010.11929 for further details.
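To make the patch arithmetic concrete, the tiny configuration used throughout this tutorial (16x16 patches on a 224x224 image, hidden size 192) yields the token count seen in the model summaries below. The short check that follows is only an illustration of that arithmetic.
# Illustrative check of the ViT token count (values taken from this tutorial).
image_size, patch_size, hidden_size = 224, 16, 192
num_patches = (image_size // patch_size) ** 2   # 14 x 14 = 196 patches
seq_len = num_patches + 1                       # + 1 class token -> 197
print(seq_len, hidden_size)                     # matches the (197, 192) shapes seen below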
Akida 2.0 now supports patch and position embeddings, as well as the encoder block, in hardware. This tutorial explains how to build an optimized ViT using the Akida models python API for Akida 2.0 hardware.
1. Model selection
There are many variants of ViT. The choice of model is typically a tradeoff between architecture size, accuracy, inference speed, and training capabilities.
The following table shows a few commonly used ViT variants:
Architecture | Original accuracy | #Params | Configuration
ViT Base | 79.90% | 86M | 12 heads, 12 blocks, hidden size 768
ViT Tiny | 75.48% | 5.8M | 3 heads, 12 blocks, hidden size 192
DeiT-dist Tiny | 74.17% | 5.8M | 3 heads, 12 blocks, hidden size 192
Note
Support for Vision Transformers was introduced in Akida 2.0.
The Akida model zoo provides tiny ViT architectures that are optimized to run on Akida hardware: vit_ti16 (ViT tiny) and deit_ti16 (DeiT-dist tiny).
Both architectures have been modified so that their layers can be quantized to integer-only operations.
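As an illustration, such an architecture can be instantiated directly from the Akida model zoo. The snippet below is a sketch that assumes the vit_ti16 helper is importable from akida_models and is called with its default configuration.
# Sketch (assumed API): instantiate the tiny ViT architecture from the model zoo.
from akida_models import vit_ti16

model = vit_ti16()           # default 224x224 input and 1000 classes assumed
print(model.count_params())  # around 5.7M parameters for the tiny variant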
2. Model optimization for Akida hardware
ViT has many encoder blocks that perform self-attention to process visual data. Each encoder block consists of many different layers. Running ViT optimally at the edge on Akida requires transforming the encoder block in the following way:
replace LayerNormalization with LayerMadNormalization (see the sketch after the note below),
replace the last LayerNormalization preceding the classification head with a BatchNormalization,
replace the Softmax operation in Attention with a shiftmax operation.
Note
The sections below show different ways to train a ViT for Akida that uses the above transformations.
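For intuition, LayerMadNormalization can be viewed as a LayerNormalization variant in which the standard deviation is replaced by the mean absolute deviation (MAD), avoiding the square root and making the operation friendlier to integer-only hardware. The NumPy sketch below only illustrates that idea under this assumption; it is not the exact Akida implementation.
# Illustrative NumPy sketch: MAD-based normalization versus standard LayerNormalization
# (assumption: the MAD replaces the standard deviation; scale/offset parameters omitted).
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = np.sqrt(((x - mean) ** 2).mean(axis=-1, keepdims=True) + eps)
    return (x - mean) / std

def layer_mad_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    mad = np.abs(x - mean).mean(axis=-1, keepdims=True)  # no square root required
    return (x - mean) / (mad + eps)

x = np.random.randn(197, 192).astype("float32")
print(layer_norm(x).std(), layer_mad_norm(x).std())  # same shape, slightly different scale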
3. Model Training
Akida accelerates ViT models that include the transformations described in Section 2. A ViT that runs optimally on Akida can be trained in the following two ways:
3.1 Option 1: Training a ViT (original) model first and then transforming each layer incrementally
First, train a ViT (original) model on a custom dataset until a satisfactory accuracy is reached. It is then possible to transform this model into an Akida-optimized one as per Section 2. The replacement layers mentioned in Section 2 are functionally equivalent to the corresponding layers present in the original model.
Note
To limit the accuracy drop relative to the original model when applying the transformations from Section 2, it is recommended to replace the original layers all at once and to fine-tune afterwards.
The example below shows the transformation of ViT (tiny) into an optimized model that can run on Akida hardware.
The akida_models python package provides a Command Line Interface (CLI) to transform the vit_ti16 and deit_ti16 model architectures and fine-tune them.
$ akida_models create vit_ti16 -h
usage: akida_models create vit_ti16 [-h] [-c CLASSES] [-bw BASE_WEIGHTS] [--norm {LN,GN1,BN,LMN}]
[--last_norm {LN,BN}] [--softmax {softmax,softmax2}]
[--act {GeLU,ReLU8,swish}] [-i {224,384}]
optional arguments:
-h, --help show this help message and exit
-c CLASSES, --classes CLASSES
The number of classes, by default 1000.
-bw BASE_WEIGHTS, --base_weights BASE_WEIGHTS
Optional keras weights to load in the model, by default None.
--norm {LN,GN1,BN,LMN}
Replace normalization in model with a custom function, by default LN
--last_norm {LN,BN} Replace last normalization in model with a custom function, by default LN
--softmax {softmax,softmax2}
Replace softmax operation in model with custom function, by default softmax
--act {GeLU,ReLU8,swish}
Replace activation function in model with custom function, by default GeLU
-i {224,384}, --image_size {224,384}
The square input image size
The following shows the transformation of a vit_ti16 model architecture trained on ImageNet. The same method can be applied to other datasets.
# download the pre-trained weights
wget https://data.brainchip.com/models/AkidaV2/vit/vit_ti16_224.h5
# transformations: replace layer normalization with mad norm layer, last layer normalization
# with batch normalization, GeLU layer with ReLU and softmax with shiftmax layer
akida_models create -s vit_ti16_transformed.h5 vit_ti16 --norm LMN --last_norm BN --act ReLU8 \
--softmax softmax2 -bw vit_ti16_224.h5
# fine-tuning
imagenet_train tune -m vit_ti16_transformed.h5 -e 30 --optim Adam --lr_policy cosine_decay \
-lr 6e-5 -s vit_ti16_transformed.h5
The above transformation generates a ViT model that is optimized to run efficiently on Akida hardware. Similar steps can also be applied to deit_ti16. The table below highlights the accuracy of the original and transformed models.
Architecture | Original accuracy | Transformed accuracy
ViT | 75.48% | 74.25%
DeiT-dist | 74.17% | 75.03%
Note
The models obtained above have floating point weights and are ready to be quantized. See Section 4.
3.2 Option 2: Transfer Learning using Pre-trained transformed model
The Akida models python package has APIs for ViTs which provides pre-trained models for vit_ti16 and deit_ti16. These models can be used for Transfer Learning on a custom dataset. Since the above models are already transformed, no further transformation is required.
Visit our Transfer Learning Example to learn more about Transfer Learning using the Akida models python package. The following code snippet downloads a pre-trained model that can be used for Transfer Learning.
# The following API call downloads the vit_ti16 model trained on the ImageNet dataset
from akida_models import fetch_file
from akida_models.model_io import load_model
# Retrieve the float model with pretrained weights and load it
model_file = fetch_file(
fname="bc_vit_ti16_224.h5",
origin="https://data.brainchip.com/models/AkidaV2/vit/bc_vit_ti16_224.h5",
cache_subdir='models/akidanet_imagenet')
model_keras = load_model(model_file)
model_keras.summary()
Downloading data from https://data.brainchip.com/models/AkidaV2/vit/bc_vit_ti16_224.h5.
23695632/23695632 [==============================] - 2s 0us/step
/usr/local/lib/python3.8/dist-packages/keras/initializers/initializers.py:120: UserWarning: The initializer TruncatedNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
warnings.warn(
Model: "vit-tiny"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) [(None, 224, 224, 3 0 []
)]
Rescale (Rescaling) (None, 224, 224, 3) 0 ['input[0][0]']
Embedding (Conv2D) (None, 14, 14, 192) 147648 ['Rescale[0][0]']
reshape (Reshape) (None, 196, 192) 0 ['Embedding[0][0]']
ClassToken (ClassToken) (None, 197, 192) 192 ['reshape[0][0]']
Transformer/PosEmbed (AddPosit (None, 197, 192) 37824 ['ClassToken[0][0]']
ionEmbs)
Transformer/EncoderBlock_0/Lay (None, 197, 192) 384 ['Transformer/PosEmbed[0][0]']
erNorm_0 (LayerMadNormalizatio
n)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_0/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 0 ['dropout[0][0]',
_1 (Add) 'Transformer/PosEmbed[0][0]']
Transformer/EncoderBlock_0/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_0/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_0/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_0/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_1 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_0/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 192) 147648 ['dropout_1[0][0]']
Block/Dense_1 (Dense)
dropout_2 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 0 ['Transformer/EncoderBlock_0/add_
_2 (Add) 1[0][0]',
'dropout_2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_0/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_1/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_3 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 0 ['dropout_3[0][0]',
_1 (Add) 'Transformer/EncoderBlock_0/add_
2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_1/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_1/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_1/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_4 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_1/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 192) 147648 ['dropout_4[0][0]']
Block/Dense_1 (Dense)
dropout_5 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 0 ['Transformer/EncoderBlock_1/add_
_2 (Add) 1[0][0]',
'dropout_5[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_1/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_2/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_6 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 0 ['dropout_6[0][0]',
_1 (Add) 'Transformer/EncoderBlock_1/add_
2[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_2/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_2/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_2/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_7 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_2/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 192) 147648 ['dropout_7[0][0]']
Block/Dense_1 (Dense)
dropout_8 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 0 ['Transformer/EncoderBlock_2/add_
_2 (Add) 1[0][0]',
'dropout_8[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_2/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_3/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_9 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 0 ['dropout_9[0][0]',
_1 (Add) 'Transformer/EncoderBlock_2/add_
2[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_3/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_3/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_3/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_10 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_3/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 192) 147648 ['dropout_10[0][0]']
Block/Dense_1 (Dense)
dropout_11 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 0 ['Transformer/EncoderBlock_3/add_
_2 (Add) 1[0][0]',
'dropout_11[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_3/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_4/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_12 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 0 ['dropout_12[0][0]',
_1 (Add) 'Transformer/EncoderBlock_3/add_
2[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_4/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_4/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_4/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_13 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_4/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 192) 147648 ['dropout_13[0][0]']
Block/Dense_1 (Dense)
dropout_14 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 0 ['Transformer/EncoderBlock_4/add_
_2 (Add) 1[0][0]',
'dropout_14[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_4/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_5/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_15 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 0 ['dropout_15[0][0]',
_1 (Add) 'Transformer/EncoderBlock_4/add_
2[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_5/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_5/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_5/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_16 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_5/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 192) 147648 ['dropout_16[0][0]']
Block/Dense_1 (Dense)
dropout_17 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 0 ['Transformer/EncoderBlock_5/add_
_2 (Add) 1[0][0]',
'dropout_17[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_5/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_6/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_18 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 0 ['dropout_18[0][0]',
_1 (Add) 'Transformer/EncoderBlock_5/add_
2[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_6/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_6/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_6/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_19 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_6/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 192) 147648 ['dropout_19[0][0]']
Block/Dense_1 (Dense)
dropout_20 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 0 ['Transformer/EncoderBlock_6/add_
_2 (Add) 1[0][0]',
'dropout_20[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_6/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_7/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_21 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 0 ['dropout_21[0][0]',
_1 (Add) 'Transformer/EncoderBlock_6/add_
2[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_7/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_7/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_7/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_22 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_7/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 192) 147648 ['dropout_22[0][0]']
Block/Dense_1 (Dense)
dropout_23 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 0 ['Transformer/EncoderBlock_7/add_
_2 (Add) 1[0][0]',
'dropout_23[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_7/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_8/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_24 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 0 ['dropout_24[0][0]',
_1 (Add) 'Transformer/EncoderBlock_7/add_
2[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_8/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_8/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_8/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_25 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_8/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 192) 147648 ['dropout_25[0][0]']
Block/Dense_1 (Dense)
dropout_26 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 0 ['Transformer/EncoderBlock_8/add_
_2 (Add) 1[0][0]',
'dropout_26[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_8/add_
erNorm_0 (LayerMadNormalizatio 2[0][0]']
n)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (Dense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (Dense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (Dense)
Transformer/EncoderBlock_9/Mul ((None, 197, 192), 0 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (Attention) )) 0][0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37056 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (Dense) ion[0][0]']
dropout_27 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 0 ['dropout_27[0][0]',
_1 (Add) 'Transformer/EncoderBlock_8/add_
2[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 384 ['Transformer/EncoderBlock_9/add_
erNorm_2 (LayerMadNormalizatio 1[0][0]']
n)
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_9/Laye
Block/Dense_0 (Dense) rNorm_2[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 0 ['Transformer/EncoderBlock_9/MlpB
Block/activation (ReLU) lock/Dense_0[0][0]']
dropout_28 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_9/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 192) 147648 ['dropout_28[0][0]']
Block/Dense_1 (Dense)
dropout_29 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 0 ['Transformer/EncoderBlock_9/add_
_2 (Add) 1[0][0]',
'dropout_29[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 384 ['Transformer/EncoderBlock_9/add_
yerNorm_0 (LayerMadNormalizati 2[0][0]']
on)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (Dense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (Dense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (Dense)
Transformer/EncoderBlock_10/Mu ((None, 197, 192), 0 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (Attention) )) [0][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (Dense) tion[0][0]']
dropout_30 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 0 ['dropout_30[0][0]',
d_1 (Add) 'Transformer/EncoderBlock_9/add_
2[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 384 ['Transformer/EncoderBlock_10/add
yerNorm_2 (LayerMadNormalizati _1[0][0]']
on)
Transformer/EncoderBlock_10/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_10/Lay
pBlock/Dense_0 (Dense) erNorm_2[0][0]']
Transformer/EncoderBlock_10/Ml (None, 197, 768) 0 ['Transformer/EncoderBlock_10/Mlp
pBlock/activation (ReLU) Block/Dense_0[0][0]']
dropout_31 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_10/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_10/Ml (None, 197, 192) 147648 ['dropout_31[0][0]']
pBlock/Dense_1 (Dense)
dropout_32 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 0 ['Transformer/EncoderBlock_10/add
d_2 (Add) _1[0][0]',
'dropout_32[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 384 ['Transformer/EncoderBlock_10/add
yerNorm_0 (LayerMadNormalizati _2[0][0]']
on)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (Dense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (Dense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (Dense)
Transformer/EncoderBlock_11/Mu ((None, 197, 192), 0 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (Attention) )) [0][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37056 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (Dense) tion[0][0]']
dropout_33 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 0 ['dropout_33[0][0]',
d_1 (Add) 'Transformer/EncoderBlock_10/add
_2[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 384 ['Transformer/EncoderBlock_11/add
yerNorm_2 (LayerMadNormalizati _1[0][0]']
on)
Transformer/EncoderBlock_11/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_11/Lay
pBlock/Dense_0 (Dense) erNorm_2[0][0]']
Transformer/EncoderBlock_11/Ml (None, 197, 768) 0 ['Transformer/EncoderBlock_11/Mlp
pBlock/activation (ReLU) Block/Dense_0[0][0]']
dropout_34 (Dropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_11/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_11/Ml (None, 197, 192) 147648 ['dropout_34[0][0]']
pBlock/Dense_1 (Dense)
dropout_35 (Dropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 0 ['Transformer/EncoderBlock_11/add
d_2 (Add) _1[0][0]',
'dropout_35[0][0]']
Transformer/EncoderNorm (Batch (None, 197, 192) 768 ['Transformer/EncoderBlock_11/add
Normalization) _2[0][0]']
ExtractToken (ExtractToken) (None, 192) 0 ['Transformer/EncoderNorm[0][0]']
Head (Dense) (None, 1000) 193000 ['ExtractToken[0][0]']
==================================================================================================
Total params: 5,717,800
Trainable params: 5,717,416
Non-trainable params: 384
__________________________________________________________________________________________________
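As an illustration only (not part of the original example), the downloaded backbone could be adapted to a custom dataset by replacing the 1000-class classification head. The layer names ExtractToken and Head come from the summary above, while num_classes and the optimizer settings below are placeholder values.
# Hypothetical sketch: reuse the pre-trained backbone and attach a new classification head.
import tensorflow as tf

num_classes = 10  # placeholder value for a custom dataset
features = model_keras.get_layer("ExtractToken").output
new_head = tf.keras.layers.Dense(num_classes, name="HeadTransfer")(features)
model_transfer = tf.keras.Model(model_keras.input, new_head, name="vit-tiny-transfer")
model_transfer.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])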
Note
The models in Section 3 have floating point weights. Once the desired accuracy is obtained, these models should go through quantization before converting to Akida.
4. Model quantization
Akida 2.0 hardware adds efficient processing of 8-bit weights and activations for Vision Transformer models. This requires the models from Section 3 to be quantized to 8-bit integers, meaning both weights and activation outputs become 8-bit integer values. The result is a smaller model with minimal to no drop in accuracy, along with improved latency and power consumption when running on Akida hardware.
Quantization of ViT models can be done using the QuantizeML python package with either Post-Training Quantization (PTQ) or Quantization Aware Training (QAT). The following section shows an example: quantization of vit_ti16 trained on the ImageNet dataset.
4.1 Post-Training Quantization
Using the QuantizeML python package, the ViT model can be quantized to 8-bit integers (both weights and activation outputs). PTQ requires calibration (ideally using reference data), which helps determine optimal quantization ranges. To learn more about PTQ, refer to the Advanced QuantizeML tutorial.
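The quantize call below relies on the package's built-in calibration. When reference images are available, they could instead be supplied explicitly; the following is only a sketch, assuming the samples, num_samples and batch_size arguments of quantizeml.models.quantize and using a placeholder random array in place of real data.
# Hedged sketch: quantization with explicit calibration samples (calibration_samples is a
# placeholder; the samples/num_samples/batch_size arguments are assumed to be supported).
import numpy as np
from quantizeml.models import quantize
from quantizeml.layers import QuantizationParams

qparams = QuantizationParams(weight_bits=8, activation_bits=8)
calibration_samples = np.random.randint(0, 256, size=(128, 224, 224, 3), dtype=np.uint8)
model_quantized = quantize(model_keras, qparams=qparams,
                           samples=calibration_samples, num_samples=128, batch_size=32)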
# Using QuantizeML to perform quantization
from quantizeml.models import quantize
from quantizeml.layers import QuantizationParams
# Define the quantization parameters.
qparams = QuantizationParams(weight_bits=8, activation_bits=8)
# Quantize the model defined in Section 3.2
model_quantized = quantize(model_keras, qparams=qparams)
model_quantized.summary()
1024/1024 [==============================] - 16s 12ms/step
Model: "vit-tiny"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) [(None, 224, 224, 3 0 []
)]
Rescale (QuantizedRescaling) (None, 224, 224, 3) 0 ['input[0][0]']
Embedding (QuantizedConv2D) (None, 14, 14, 192) 147648 ['Rescale[0][0]']
reshape (QuantizedReshape) (None, 196, 192) 0 ['Embedding[0][0]']
ClassToken (QuantizedClassToke (None, 197, 192) 192 ['reshape[0][0]']
n)
Transformer/PosEmbed (Quantize (None, 197, 192) 38208 ['ClassToken[0][0]']
dAddPositionEmbs)
Transformer/EncoderBlock_0/Lay (None, 197, 192) 768 ['Transformer/PosEmbed[0][0]']
erNorm_0 (QuantizedLayerNormal
ization)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_0/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 384 ['dropout[0][0]',
_1 (QuantizedAdd) 'Transformer/PosEmbed[0][0]']
Transformer/EncoderBlock_0/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_0/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_0/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_0/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_1 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_0/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 192) 148032 ['dropout_1[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_2 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 384 ['Transformer/EncoderBlock_0/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_0/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_1/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_3 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 384 ['dropout_3[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_0/add_
2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_1/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_1/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_1/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_4 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_1/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 192) 148032 ['dropout_4[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_5 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 384 ['Transformer/EncoderBlock_1/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_5[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_1/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_2/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_6 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 384 ['dropout_6[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_1/add_
2[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_2/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_2/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_2/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_7 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_2/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 192) 148032 ['dropout_7[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_8 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 384 ['Transformer/EncoderBlock_2/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_8[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_2/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_3/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_9 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 384 ['dropout_9[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_2/add_
2[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_3/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_3/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_3/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_10 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_3/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 192) 148032 ['dropout_10[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_11 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 384 ['Transformer/EncoderBlock_3/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_11[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_3/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_4/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_12 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 384 ['dropout_12[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_3/add_
2[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_4/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_4/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_4/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_13 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_4/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 192) 148032 ['dropout_13[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_14 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 384 ['Transformer/EncoderBlock_4/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_14[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_4/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_5/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_15 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 384 ['dropout_15[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_4/add_
2[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_5/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_5/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_5/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_16 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_5/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 192) 148032 ['dropout_16[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_17 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 384 ['Transformer/EncoderBlock_5/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_17[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_5/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_6/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_18 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 384 ['dropout_18[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_5/add_
2[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_6/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_6/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_6/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_19 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_6/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 192) 148032 ['dropout_19[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_20 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 384 ['Transformer/EncoderBlock_6/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_20[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_6/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_7/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_21 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 384 ['dropout_21[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_6/add_
2[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_7/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_7/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_7/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_22 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_7/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 192) 148032 ['dropout_22[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_23 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 384 ['Transformer/EncoderBlock_7/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_23[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_7/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_8/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_24 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 384 ['dropout_24[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_7/add_
2[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_8/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_8/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_8/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_25 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_8/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 192) 148032 ['dropout_25[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_26 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 384 ['Transformer/EncoderBlock_8/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_26[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_8/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_9/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_27 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 384 ['dropout_27[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_8/add_
2[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_9/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_9/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_9/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_28 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_9/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 192) 148032 ['dropout_28[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_29 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 384 ['Transformer/EncoderBlock_9/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_29[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 768 ['Transformer/EncoderBlock_9/add_
yerNorm_0 (QuantizedLayerNorma 2[0][0]']
lization)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (QuantizedDense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (QuantizedDense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (QuantizedDense)
Transformer/EncoderBlock_10/Mu ((None, 197, 192), 384 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (QuantizedAttention) )) [0][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (QuantizedDense) tion[0][0]']
dropout_30 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 384 ['dropout_30[0][0]',
d_1 (QuantizedAdd) 'Transformer/EncoderBlock_9/add_
2[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 768 ['Transformer/EncoderBlock_10/add
yerNorm_2 (QuantizedLayerNorma _1[0][0]']
lization)
Transformer/EncoderBlock_10/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_10/Lay
pBlock/Dense_0 (QuantizedDense erNorm_2[0][0]']
)
Transformer/EncoderBlock_10/Ml (None, 197, 768) 1536 ['Transformer/EncoderBlock_10/Mlp
pBlock/activation (QuantizedRe Block/Dense_0[0][0]']
LU)
dropout_31 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_10/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_10/Ml (None, 197, 192) 148032 ['dropout_31[0][0]']
pBlock/Dense_1 (QuantizedDense
)
dropout_32 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 384 ['Transformer/EncoderBlock_10/add
d_2 (QuantizedAdd) _1[0][0]',
'dropout_32[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 768 ['Transformer/EncoderBlock_10/add
yerNorm_0 (QuantizedLayerNorma _2[0][0]']
lization)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (QuantizedDense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (QuantizedDense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (QuantizedDense)
Transformer/EncoderBlock_11/Mu ((None, 197, 192), 384 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (QuantizedAttention) )) [0][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (QuantizedDense) tion[0][0]']
dropout_33 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 384 ['dropout_33[0][0]',
d_1 (QuantizedAdd) 'Transformer/EncoderBlock_10/add
_2[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 768 ['Transformer/EncoderBlock_11/add
yerNorm_2 (QuantizedLayerNorma _1[0][0]']
lization)
Transformer/EncoderBlock_11/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_11/Lay
pBlock/Dense_0 (QuantizedDense erNorm_2[0][0]']
)
Transformer/EncoderBlock_11/Ml (None, 197, 768) 1536 ['Transformer/EncoderBlock_11/Mlp
pBlock/activation (QuantizedRe Block/Dense_0[0][0]']
LU)
dropout_34 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_11/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_11/Ml (None, 197, 192) 148032 ['dropout_34[0][0]']
pBlock/Dense_1 (QuantizedDense
)
dropout_35 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 0 ['Transformer/EncoderBlock_11/add
d_2 (QuantizedAdd) _1[0][0]',
'dropout_35[0][0]']
Transformer/EncoderNorm (Quant (None, 197, 192) 1152 ['Transformer/EncoderBlock_11/add
izedBatchNormalization) _2[0][0]']
ExtractToken (QuantizedExtract (None, 192) 0 ['Transformer/EncoderNorm[0][0]']
Token)
Head (QuantizedDense) (None, 1000) 193000 ['ExtractToken[0][0]']
dequantizer_4 (Dequantizer) [(None, 1000)] 0 ['Head[0][0]']
==================================================================================================
Total params: 5,773,528
Trainable params: 5,717,416
Non-trainable params: 56,112
__________________________________________________________________________________________________
The bc_vit_ti16_imagenet_pretrained helper was obtained with the same 8-bit quantization scheme but with an additional QAT step to further improve accuracy.
4.2 Quantization Aware Training (Optional)
In Section 4.1, we performed PTQ and converted the weights and activation outputs to 8-bit integers. In most cases no accuracy drop is observed after quantization; however, if a drop does occur, the model can be further fine-tuned using Quantization Aware Training (QAT).
The model obtained through the QuantizeML python package is a standard Keras model instance, so it can be fine-tuned on the original dataset to regain accuracy.
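Because the quantized model behaves like any other Keras model, this fine-tuning uses the standard compile/fit workflow. The snippet below is a minimal sketch only: model_quantized stands for the quantized model from the previous step, and train_ds, the learning rate and the number of epochs are placeholders rather than the settings used to produce the released weights.
import tensorflow as tf
# Minimal QAT fine-tuning sketch. `model_quantized` is the quantized Keras model
# from the previous step; `train_ds` is a hypothetical preprocessed tf.data.Dataset
# yielding (images, labels) batches from the original training data.
model_quantized.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # small learning rate for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
model_quantized.fit(train_ds, epochs=1)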
The Akida models python package provides pre-trained vit_ti16 and deit_ti16 models that have been trained using the QAT method. They can be loaded as follows:
from akida_models import bc_vit_ti16_imagenet_pretrained
# Load the pre-trained quantized model
model_quantized = bc_vit_ti16_imagenet_pretrained()
model_quantized.summary()
Downloading data from https://data.brainchip.com/models/AkidaV2/vit/bc_vit_ti16_224_i8_w8_a8.h5.
24405400/24405400 [==============================] - 2s 0us/step
Model: "vit-tiny"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) [(None, 224, 224, 3 0 []
)]
Rescale (QuantizedRescaling) (None, 224, 224, 3) 0 ['input[0][0]']
Embedding (QuantizedConv2D) (None, 14, 14, 192) 147648 ['Rescale[0][0]']
reshape (QuantizedReshape) (None, 196, 192) 0 ['Embedding[0][0]']
ClassToken (QuantizedClassToke (None, 197, 192) 192 ['reshape[0][0]']
n)
Transformer/PosEmbed (Quantize (None, 197, 192) 38208 ['ClassToken[0][0]']
dAddPositionEmbs)
Transformer/EncoderBlock_0/Lay (None, 197, 192) 768 ['Transformer/PosEmbed[0][0]']
erNorm_0 (QuantizedLayerNormal
ization)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_0/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_0/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_0/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_0/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 384 ['dropout[0][0]',
_1 (QuantizedAdd) 'Transformer/PosEmbed[0][0]']
Transformer/EncoderBlock_0/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_0/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_0/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_0/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_1 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_0/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_0/Mlp (None, 197, 192) 148032 ['dropout_1[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_2 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_0/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_0/add (None, 197, 192) 384 ['Transformer/EncoderBlock_0/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_0/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_1/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_1/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_1/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_1/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_3 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 384 ['dropout_3[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_0/add_
2[0][0]']
Transformer/EncoderBlock_1/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_1/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_1/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_1/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_4 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_1/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_1/Mlp (None, 197, 192) 148032 ['dropout_4[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_5 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_1/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_1/add (None, 197, 192) 384 ['Transformer/EncoderBlock_1/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_5[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_1/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_2/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_2/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_2/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_2/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_6 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 384 ['dropout_6[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_1/add_
2[0][0]']
Transformer/EncoderBlock_2/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_2/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_2/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_2/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_7 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_2/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_2/Mlp (None, 197, 192) 148032 ['dropout_7[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_8 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_2/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_2/add (None, 197, 192) 384 ['Transformer/EncoderBlock_2/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_8[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_2/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_3/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_3/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_3/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_3/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_9 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 384 ['dropout_9[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_2/add_
2[0][0]']
Transformer/EncoderBlock_3/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_3/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_3/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_3/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_10 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_3/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_3/Mlp (None, 197, 192) 148032 ['dropout_10[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_11 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_3/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_3/add (None, 197, 192) 384 ['Transformer/EncoderBlock_3/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_11[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_3/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_4/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_4/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_4/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_4/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_12 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 384 ['dropout_12[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_3/add_
2[0][0]']
Transformer/EncoderBlock_4/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_4/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_4/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_4/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_13 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_4/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_4/Mlp (None, 197, 192) 148032 ['dropout_13[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_14 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_4/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_4/add (None, 197, 192) 384 ['Transformer/EncoderBlock_4/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_14[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_4/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_5/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_5/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_5/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_5/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_15 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 384 ['dropout_15[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_4/add_
2[0][0]']
Transformer/EncoderBlock_5/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_5/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_5/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_5/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_16 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_5/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_5/Mlp (None, 197, 192) 148032 ['dropout_16[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_17 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_5/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_5/add (None, 197, 192) 384 ['Transformer/EncoderBlock_5/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_17[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_5/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_6/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_6/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_6/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_6/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_18 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 384 ['dropout_18[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_5/add_
2[0][0]']
Transformer/EncoderBlock_6/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_6/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_6/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_6/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_19 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_6/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_6/Mlp (None, 197, 192) 148032 ['dropout_19[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_20 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_6/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_6/add (None, 197, 192) 384 ['Transformer/EncoderBlock_6/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_20[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_6/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_7/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_7/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_7/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_7/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_21 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 384 ['dropout_21[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_6/add_
2[0][0]']
Transformer/EncoderBlock_7/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_7/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_7/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_7/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_22 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_7/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_7/Mlp (None, 197, 192) 148032 ['dropout_22[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_23 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_7/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_7/add (None, 197, 192) 384 ['Transformer/EncoderBlock_7/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_23[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_7/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_8/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_8/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_8/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_8/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_24 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 384 ['dropout_24[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_7/add_
2[0][0]']
Transformer/EncoderBlock_8/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_8/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_8/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_8/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_25 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_8/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_8/Mlp (None, 197, 192) 148032 ['dropout_25[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_26 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_8/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_8/add (None, 197, 192) 384 ['Transformer/EncoderBlock_8/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_26[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_8/add_
erNorm_0 (QuantizedLayerNormal 2[0][0]']
ization)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/qu rNorm_0[0][0]']
ery (QuantizedDense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37058 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/ke rNorm_0[0][0]']
y (QuantizedDense)
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_9/Laye
tiHeadDotProductAttention_1/va rNorm_0[0][0]']
lue (QuantizedDense)
Transformer/EncoderBlock_9/Mul ((None, 197, 192), 384 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/at (None, 3, 197, 197 iHeadDotProductAttention_1/query[
tention (QuantizedAttention) )) 0][0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/key[0]
[0]',
'Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/value[
0][0]']
Transformer/EncoderBlock_9/Mul (None, 197, 192) 37440 ['Transformer/EncoderBlock_9/Mult
tiHeadDotProductAttention_1/ou iHeadDotProductAttention_1/attent
t (QuantizedDense) ion[0][0]']
dropout_27 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/Mult
iHeadDotProductAttention_1/out[0]
[0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 384 ['dropout_27[0][0]',
_1 (QuantizedAdd) 'Transformer/EncoderBlock_8/add_
2[0][0]']
Transformer/EncoderBlock_9/Lay (None, 197, 192) 768 ['Transformer/EncoderBlock_9/add_
erNorm_2 (QuantizedLayerNormal 1[0][0]']
ization)
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 148224 ['Transformer/EncoderBlock_9/Laye
Block/Dense_0 (QuantizedDense) rNorm_2[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 768) 1536 ['Transformer/EncoderBlock_9/MlpB
Block/activation (QuantizedReL lock/Dense_0[0][0]']
U)
dropout_28 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_9/MlpB
lock/activation[0][0]']
Transformer/EncoderBlock_9/Mlp (None, 197, 192) 148032 ['dropout_28[0][0]']
Block/Dense_1 (QuantizedDense)
dropout_29 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_9/MlpB
lock/Dense_1[0][0]']
Transformer/EncoderBlock_9/add (None, 197, 192) 384 ['Transformer/EncoderBlock_9/add_
_2 (QuantizedAdd) 1[0][0]',
'dropout_29[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 768 ['Transformer/EncoderBlock_9/add_
yerNorm_0 (QuantizedLayerNorma 2[0][0]']
lization)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (QuantizedDense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (QuantizedDense)
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_10/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (QuantizedDense)
Transformer/EncoderBlock_10/Mu ((None, 197, 192), 384 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (QuantizedAttention) )) [0][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_10/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_10/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (QuantizedDense) tion[0][0]']
dropout_30 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 384 ['dropout_30[0][0]',
d_1 (QuantizedAdd) 'Transformer/EncoderBlock_9/add_
2[0][0]']
Transformer/EncoderBlock_10/La (None, 197, 192) 768 ['Transformer/EncoderBlock_10/add
yerNorm_2 (QuantizedLayerNorma _1[0][0]']
lization)
Transformer/EncoderBlock_10/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_10/Lay
pBlock/Dense_0 (QuantizedDense erNorm_2[0][0]']
)
Transformer/EncoderBlock_10/Ml (None, 197, 768) 1536 ['Transformer/EncoderBlock_10/Mlp
pBlock/activation (QuantizedRe Block/Dense_0[0][0]']
LU)
dropout_31 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_10/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_10/Ml (None, 197, 192) 148032 ['dropout_31[0][0]']
pBlock/Dense_1 (QuantizedDense
)
dropout_32 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_10/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_10/ad (None, 197, 192) 384 ['Transformer/EncoderBlock_10/add
d_2 (QuantizedAdd) _1[0][0]',
'dropout_32[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 768 ['Transformer/EncoderBlock_10/add
yerNorm_0 (QuantizedLayerNorma _2[0][0]']
lization)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/q erNorm_0[0][0]']
uery (QuantizedDense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37058 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/k erNorm_0[0][0]']
ey (QuantizedDense)
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_11/Lay
ltiHeadDotProductAttention_1/v erNorm_0[0][0]']
alue (QuantizedDense)
Transformer/EncoderBlock_11/Mu ((None, 197, 192), 384 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/a (None, 3, 197, 197 tiHeadDotProductAttention_1/query
ttention (QuantizedAttention) )) [0][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/key[0
][0]',
'Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/value
[0][0]']
Transformer/EncoderBlock_11/Mu (None, 197, 192) 37440 ['Transformer/EncoderBlock_11/Mul
ltiHeadDotProductAttention_1/o tiHeadDotProductAttention_1/atten
ut (QuantizedDense) tion[0][0]']
dropout_33 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mul
tiHeadDotProductAttention_1/out[0
][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 384 ['dropout_33[0][0]',
d_1 (QuantizedAdd) 'Transformer/EncoderBlock_10/add
_2[0][0]']
Transformer/EncoderBlock_11/La (None, 197, 192) 768 ['Transformer/EncoderBlock_11/add
yerNorm_2 (QuantizedLayerNorma _1[0][0]']
lization)
Transformer/EncoderBlock_11/Ml (None, 197, 768) 148224 ['Transformer/EncoderBlock_11/Lay
pBlock/Dense_0 (QuantizedDense erNorm_2[0][0]']
)
Transformer/EncoderBlock_11/Ml (None, 197, 768) 1536 ['Transformer/EncoderBlock_11/Mlp
pBlock/activation (QuantizedRe Block/Dense_0[0][0]']
LU)
dropout_34 (QuantizedDropout) (None, 197, 768) 0 ['Transformer/EncoderBlock_11/Mlp
Block/activation[0][0]']
Transformer/EncoderBlock_11/Ml (None, 197, 192) 148032 ['dropout_34[0][0]']
pBlock/Dense_1 (QuantizedDense
)
dropout_35 (QuantizedDropout) (None, 197, 192) 0 ['Transformer/EncoderBlock_11/Mlp
Block/Dense_1[0][0]']
Transformer/EncoderBlock_11/ad (None, 197, 192) 0 ['Transformer/EncoderBlock_11/add
d_2 (QuantizedAdd) _1[0][0]',
'dropout_35[0][0]']
Transformer/EncoderNorm (Quant (None, 197, 192) 1152 ['Transformer/EncoderBlock_11/add
izedBatchNormalization) _2[0][0]']
ExtractToken (QuantizedExtract (None, 192) 0 ['Transformer/EncoderNorm[0][0]']
Token)
Head (QuantizedDense) (None, 1000) 193000 ['ExtractToken[0][0]']
dequantizer (Dequantizer) (None, 1000) 0 ['Head[0][0]']
==================================================================================================
Total params: 5,773,528
Trainable params: 5,717,416
Non-trainable params: 56,112
__________________________________________________________________________________________________
5. Conversion to Akida
A model quantized through the QuantizeML python package is ready to be converted to Akida. Once the quantized model reaches the desired accuracy, the CNN2SNN toolkit is used to perform the conversion. No further optimization is required, and equivalent accuracy is observed after converting the model to Akida.
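Before converting, one may want to confirm that the quantized model indeed reaches the target accuracy. The sketch below is illustrative only; val_ds is a placeholder for a preprocessed validation dataset and is not part of this example. Once the accuracy is satisfactory, the conversion itself is a single call, as shown next.
import numpy as np
# Hypothetical top-1 accuracy check of the quantized Keras model before conversion.
# `val_ds` is a placeholder for a preprocessed validation dataset yielding
# (images, labels) batches with 224x224x3 inputs.
correct, total = 0, 0
for images, labels in val_ds:
    preds = np.argmax(model_quantized.predict(images, verbose=0), axis=-1)
    correct += int(np.sum(preds == np.array(labels)))
    total += len(labels)
print(f"Top-1 accuracy before conversion: {correct / total:.4f}")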
from cnn2snn import convert
# Convert the model
model_akida = convert(model_quantized)
model_akida.summary()
Model Summary
________________________________________________
Input shape Output shape Sequences Layers
================================================
[224, 224, 3] [1, 1, 1000] 1 137
________________________________________________
___________________________________________________________________________________________________________________
Layer (type) Output shape Kernel shape
======================================= SW/Embedding-dequantizer (Software) =======================================
Embedding (Stem) [1, 197, 192] (16, 16, 3, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_0/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_1/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_2/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_3/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_4/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_5/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_6/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_7/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_8/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_9/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_10/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/LayerNorm_0 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MultiHeadDotProductAttention_1/query (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MultiHeadDotProductAttention_1/key (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MultiHeadDotProductAttention_1/value (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MultiHeadDotProductAttention_1/attention (Attention) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MultiHeadDotProductAttention_1/out (Dense2D) [1, 197, 192] (192, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/add_1 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/LayerNorm_2 (MadNorm) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MlpBlock/Dense_0 (Dense2D) [1, 197, 768] (192, 768)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/MlpBlock/Dense_1 (Dense2D) [1, 197, 192] (768, 192)
___________________________________________________________________________________________________________________
Transformer/EncoderBlock_11/add_2 (Add) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
Transformer/EncoderNorm (BatchNormalization) [1, 197, 192] N/A
___________________________________________________________________________________________________________________
ExtractToken (ExtractToken) [1, 1, 192] N/A
___________________________________________________________________________________________________________________
Head (Dense2D) [1, 1, 1000] (192, 1000)
___________________________________________________________________________________________________________________
dequantizer (Dequantizer) [1, 1, 1000] N/A
___________________________________________________________________________________________________________________
6. Displaying results: Attention Maps
Instead of showing predictions, here we propose to show attention maps on an image. This is derived from the attention rollout method of Abnar et al., as shown in the corresponding Keras tutorial, and aims to highlight the model's ability to focus on the relevant parts of the input image. A minimal sketch of the rollout computation is given below, before the full example on real images.
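As an illustration only, the snippet below is a toy numpy sketch of the rollout idea on randomly generated attention matrices: average the heads, add an identity matrix to account for the residual connections, re-normalize, then chain the per-layer matrices from the last block back to the first. The shapes and the row-wise re-normalization used here are assumptions made for brevity; the build_attention_map helper defined further down applies the same principle to the model's actual attention outputs.
import numpy as np

# Toy dimensions, not the actual model sizes
num_layers, num_heads, num_tokens = 2, 3, 5

# Random per-layer, per-head attention matrices whose rows sum to 1
att = np.random.rand(num_layers, num_heads, num_tokens, num_tokens)
att /= att.sum(axis=-1, keepdims=True)

# Average across heads, add identity for the residual path and re-normalize
att = att.mean(axis=1)
att = att + np.eye(num_tokens)
att /= att.sum(axis=-1, keepdims=True)

# Rollout: multiply the per-layer matrices from the last layer back to the first
rollout = att[-1]
for layer in att[-2::-1]:
    rollout = rollout @ layer

# Row 0 gives the attention of the first (class) token over all tokens
print(rollout[0])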
Just like for the AkidaNet example, ImageNet images are not publicly available, so this example uses a set of 10 copyright-free images found on Google using ImageNet class names.
Get sample images and preprocess them:
import os
import numpy as np

from tensorflow.io import read_file
from tensorflow.image import decode_jpeg

# fetch_file is provided by the akida_models package (imported here so the
# snippet is self-contained)
from akida_models import fetch_file
from akida_models.imagenet import preprocessing

# Model specification and hyperparameters
NUM_CHANNELS = 3
IMAGE_SIZE = 224
NUM_IMAGES = 10

# Retrieve dataset file from Brainchip data server
file_path = fetch_file(
    fname="imagenet_like.zip",
    origin="https://data.brainchip.com/dataset-mirror/imagenet_like/imagenet_like.zip",
    cache_subdir='datasets/imagenet_like',
    extract=True)
data_folder = os.path.dirname(file_path)

# Load images for test set
x_test_files = []
x_test = np.zeros((NUM_IMAGES, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS)).astype('uint8')
for id in range(NUM_IMAGES):
    test_file = 'image_' + str(id + 1).zfill(2) + '.jpg'
    x_test_files.append(test_file)
    img_path = os.path.join(data_folder, test_file)
    base_image = read_file(img_path)
    image = decode_jpeg(base_image, channels=NUM_CHANNELS)
    image = preprocessing.preprocess_image(image, (IMAGE_SIZE, IMAGE_SIZE))
    x_test[id, :, :, :] = np.expand_dims(image, axis=0)

print(f'{NUM_IMAGES} images loaded and preprocessed.')
10 images loaded and preprocessed.
Build and display the attention map for one selected sample:
import cv2
import matplotlib.pyplot as plt

from keras import Model
from quantizeml.layers import ClassToken, Attention
from quantizeml.tensors import FixedPoint
from quantizeml.models.transforms.transforms_utils import get_layers_by_type


def build_attention_map(model, image):
    # Get the Attention layers list
    attentions = get_layers_by_type(model, Attention)

    # Calculate the number of tokens and deduce the grid size
    num_tokens = sum(isinstance(ly, ClassToken) for ly in model.layers)
    grid_size = int(np.sqrt(attentions[0].output_shape[0][-2] - num_tokens))

    # Get the attention weights from each transformer
    outputs = [la.output[1] for la in attentions]
    weights = Model(inputs=model.inputs, outputs=outputs).predict(np.expand_dims(image, 0))

    # Converts to float if needed
    weights = [w.to_float() if isinstance(w, FixedPoint) else w for w in weights]
    weights = np.array(weights)

    # Heads number
    num_heads = weights.shape[2]
    num_layers = weights.shape[0]
    reshaped = weights.reshape((num_layers, num_heads, grid_size**2 + 1, grid_size**2 + 1))

    # Average the attention weights across all heads
    reshaped = reshaped.mean(axis=1)

    # To account for residual connections, we add an identity matrix to the attention matrix and
    # re-normalize the weights.
    reshaped = reshaped + np.eye(reshaped.shape[1])
    reshaped = reshaped / reshaped.sum(axis=(1, 2))[:, np.newaxis, np.newaxis]

    # Recursively multiply the weight matrices
    v = reshaped[-1]
    for n in range(1, len(reshaped)):
        v = np.matmul(v, reshaped[-1 - n])

    # Attention from the output token to the input space
    mask = v[0, 1:].reshape(grid_size, grid_size)
    mask = cv2.resize(mask / mask.max(), (image.shape[1], image.shape[0]))[..., np.newaxis]
    return (mask * image).astype("uint8")


# Using a specific image for which attention map is easier to observe
image = x_test[8]

# Compute the attention map
attention_float = build_attention_map(model_keras, image)
attention_quantized = build_attention_map(model_quantized, image)

# Display the attention map
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3)
ax1.axis('off')
ax1.set_title('Original')
ax1.imshow(image)
ax2.axis('off')
ax2.set_title('Float')
ax2.imshow(attention_float)
ax3.axis('off')
ax3.set_title('Quantized')
ax3.imshow(attention_quantized)
fig.suptitle('Attention masks', fontsize=10)
plt.show()
1/1 [==============================] - ETA: 0s
1/1 [==============================] - 2s 2s/step
1/1 [==============================] - ETA: 0s
1/1 [==============================] - 34s 34s/step
Total running time of the script: (1 minutes 57.885 seconds)