DS-CNN/KWS inference
This tutorial illustrates the process of developing an Akida-compatible speech recognition model that can identify thirty-two different keywords.
The model is first defined as a CNN in Keras and trained conventionally. It is then quantized using QuantizeML and finally converted to Akida using CNN2SNN.
This example uses a keyword spotting dataset prepared with the utilities of the TensorFlow audio recognition example.
1. Load the preprocessed dataset
The TensorFlow speech_commands dataset is used for training and validation. All keywords except “backward”, “follow” and “forward” are retrieved; these three words are held out to illustrate edge learning in the dedicated edge learning example.
The words to recognize have been converted into spectrogram-like images, which allows the use of a model architecture typically applied to image recognition tasks. Specifically, the raw audio files have been preprocessed into MFCC features, which are well suited to CNNs. A pickle file containing the preprocessed data is available on the BrainChip data server.
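The pickle already contains the MFCC features, so no audio processing is needed to run this tutorial. To make the preprocessing step concrete, here is a minimal NumPy sketch of an MFCC front end that produces a feature map with the model's input shape (49, 10, 1). It is not the TensorFlow pipeline used to build the pickle: the 16 kHz sample rate, 1 s clip length, 40 ms window, 20 ms stride, 40 mel bands and 20 to 4000 Hz range are assumptions chosen so that the output matches the input shape shown in the model summary, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(freq):
    return 2595.0 * np.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=20.0, f_max=4000.0):
    """Triangular filters mapping an rfft power spectrum onto n_mels mel bands."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis


def mfcc(audio, sample_rate=16000, window_ms=40, stride_ms=20, n_mels=40, n_mfcc=10):
    """Return an (n_frames, n_mfcc) MFCC map for a 1-D audio signal (assumed parameters)."""
    win = int(sample_rate * window_ms / 1000)            # 640 samples
    hop = int(sample_rate * stride_ms / 1000)            # 320 samples
    n_frames = 1 + (len(audio) - win) // hop             # 49 for a 16000-sample clip
    n_fft = 1 << (win - 1).bit_length()                  # next power of two: 1024
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(win)                # (n_frames, win)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2    # (n_frames, n_fft // 2 + 1)
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sample_rate).T + 1e-6)
    return log_mel @ dct_ii(n_mfcc, n_mels).T            # (n_frames, n_mfcc)


rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)                        # stand-in for 1 s of audio
features = mfcc(clip)[..., np.newaxis]                   # add a channel axis
print(features.shape)                                    # (49, 10, 1)
```

The 49 frames follow from framing 16000 samples with a 640-sample window and a 320-sample hop: 1 + (16000 - 640) // 320 = 49.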
import pickle
from akida_models import fetch_file
# Fetch pre-processed data for 32 keywords
fname = fetch_file(
    fname='kws_preprocessed_all_words_except_backward_follow_forward.pkl',
    origin="https://data.brainchip.com/dataset-mirror/kws/kws_preprocessed_all_words_except_backward_follow_forward.pkl",
    cache_subdir='datasets/kws')
with open(fname, 'rb') as f:
    # Only the validation split and the label mapping are used in this tutorial
    [_, _, x_valid, y_valid, _, _, word_to_index, _] = pickle.load(f)
# Preprocessed dataset parameters
num_classes = len(word_to_index)
print("Wanted words and labels:\n", word_to_index)
Downloading data from https://data.brainchip.com/dataset-mirror/kws/kws_preprocessed_all_words_except_backward_follow_forward.pkl.
62628765/62628765 [==============================] - 3s 0us/step
Download complete.
Wanted words and labels:
{'six': 23, 'three': 25, 'seven': 21, 'bed': 1, 'eight': 6, 'yes': 31, 'cat': 3, 'on': 18, 'one': 19, 'stop': 24, 'two': 27, 'house': 11, 'five': 7, 'down': 5, 'four': 8, 'go': 9, 'up': 28, 'learn': 12, 'no': 16, 'bird': 2, 'zero': 32, 'nine': 15, 'visual': 29, 'wow': 30, 'sheila': 22, 'marvin': 14, 'off': 17, 'right': 20, 'left': 13, 'happy': 10, 'dog': 4, 'tree': 26, '_silence_': 0}
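The mapping has 33 entries: the 32 keywords plus a `_silence_` class, which is why the final dense layer of the model has 33 outputs. To decode predicted indices back into words, it helps to invert this dictionary into a list ordered by label. A minimal sketch; `class_names_from_labels` is a hypothetical helper, not part of akida_models, and the dictionary is the one printed above.

```python
def class_names_from_labels(word_to_index):
    """Return class names ordered by label index, checking labels are 0..N-1."""
    index_to_word = {index: word for word, index in word_to_index.items()}
    if sorted(index_to_word) != list(range(len(word_to_index))):
        raise ValueError("labels must be a contiguous range starting at 0")
    return [index_to_word[i] for i in range(len(word_to_index))]


# Mapping printed above
word_to_index = {'six': 23, 'three': 25, 'seven': 21, 'bed': 1, 'eight': 6, 'yes': 31,
                 'cat': 3, 'on': 18, 'one': 19, 'stop': 24, 'two': 27, 'house': 11,
                 'five': 7, 'down': 5, 'four': 8, 'go': 9, 'up': 28, 'learn': 12,
                 'no': 16, 'bird': 2, 'zero': 32, 'nine': 15, 'visual': 29, 'wow': 30,
                 'sheila': 22, 'marvin': 14, 'off': 17, 'right': 20, 'left': 13,
                 'happy': 10, 'dog': 4, 'tree': 26, '_silence_': 0}
class_names = class_names_from_labels(word_to_index)
print(len(class_names), class_names[0], class_names[-1])  # 33 _silence_ zero
```

With this list, `class_names[pred]` turns any predicted index (for example an entry of `preds_akida`) into its keyword.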
2. Load a pre-trained native Keras model
The model consists of:
a first convolutional layer accepting dense inputs (images),
several separable convolutional layers preserving spatial dimensions,
a global pooling reducing the spatial dimensions to a single pixel,
a final dense layer to classify words.
The first convolution and each separable convolution are followed by batch normalization and a ReLU activation, while the final dense layer is followed by a softmax.
from tensorflow.keras.models import load_model
# Retrieve the model file from the BrainChip data server
model_file = fetch_file(fname="ds_cnn_kws.h5",
                        origin="https://data.brainchip.com/models/AkidaV2/ds_cnn/ds_cnn_kws.h5",
                        cache_subdir='models')
# Load the native Keras pre-trained model
model_keras = load_model(model_file)
model_keras.summary()
Downloading data from https://data.brainchip.com/models/AkidaV2/ds_cnn/ds_cnn_kws.h5.
170496/170496 [==============================] - 0s 0us/step
Download complete.
Model: "ds_cnn_kws"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 49, 10, 1)] 0
rescaling (Rescaling) (None, 49, 10, 1) 0
conv_0 (Conv2D) (None, 25, 5, 64) 1600
conv_0/BN (BatchNormalization) (None, 25, 5, 64) 256
conv_0/relu (ReLU) (None, 25, 5, 64) 0
dw_separable_1 (DepthwiseConv2D) (None, 25, 5, 64) 576
pw_separable_1 (Conv2D) (None, 25, 5, 64) 4096
pw_separable_1/BN (BatchNormalization) (None, 25, 5, 64) 256
pw_separable_1/relu (ReLU) (None, 25, 5, 64) 0
dw_separable_2 (DepthwiseConv2D) (None, 25, 5, 64) 576
pw_separable_2 (Conv2D) (None, 25, 5, 64) 4096
pw_separable_2/BN (BatchNormalization) (None, 25, 5, 64) 256
pw_separable_2/relu (ReLU) (None, 25, 5, 64) 0
dw_separable_3 (DepthwiseConv2D) (None, 25, 5, 64) 576
pw_separable_3 (Conv2D) (None, 25, 5, 64) 4096
pw_separable_3/BN (BatchNormalization) (None, 25, 5, 64) 256
pw_separable_3/relu (ReLU) (None, 25, 5, 64) 0
dw_separable_4 (DepthwiseConv2D) (None, 25, 5, 64) 576
pw_separable_4 (Conv2D) (None, 25, 5, 64) 4096
pw_separable_4/BN (BatchNormalization) (None, 25, 5, 64) 256
pw_separable_4/relu (ReLU) (None, 25, 5, 64) 0
pw_separable_4/global_avg (GlobalAveragePooling2D) (None, 64) 0
dense_5 (Dense) (None, 33) 2145
act_softmax (Activation) (None, 33) 0
=================================================================
Total params: 23713 (92.63 KB)
Trainable params: 23073 (90.13 KB)
Non-trainable params: 640 (2.50 KB)
_________________________________________________________________
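The parameter counts in this summary follow directly from the layer shapes. The plain-Python sketch below reproduces them. The 5×5 first kernel, the 3×3 depthwise kernels and the absence of biases before batch normalization are read off the summary (for example 1600 = 5·5·1·64 and 576 = 3·3·64), while the stride-2 "same" padding of conv_0 is an inference from the 49×10 to 25×5 reduction. The function name is hypothetical.

```python
import math


def ds_cnn_param_counts(input_hw=(49, 10), filters=64, num_classes=33, num_blocks=4):
    """Recompute the spatial size and parameter counts of the DS-CNN summarized above."""
    # conv_0: stride 2 with 'same' padding gives ceil(49/2) x ceil(10/2) = 25 x 5
    out_hw = tuple(math.ceil(d / 2) for d in input_hw)
    counts = {
        "conv_0": 5 * 5 * 1 * filters,        # 5x5 kernel, 1 input channel, no bias
        "conv_0/BN": 4 * filters,             # gamma, beta, moving mean, moving variance
    }
    for i in range(1, num_blocks + 1):
        counts[f"dw_separable_{i}"] = 3 * 3 * filters      # one 3x3 kernel per channel
        counts[f"pw_separable_{i}"] = filters * filters    # 1x1 conv, no bias
        counts[f"pw_separable_{i}/BN"] = 4 * filters
    counts["dense_5"] = filters * num_classes + num_classes  # weights + bias
    # Moving mean and variance are the non-trainable half of each BN layer
    non_trainable = sum(v // 2 for k, v in counts.items() if k.endswith("/BN"))
    return out_hw, counts, non_trainable


out_hw, counts, non_trainable = ds_cnn_param_counts()
total = sum(counts.values())
print(out_hw, total, total - non_trainable, non_trainable)  # (25, 5) 23713 23073 640
```

Most of the weights sit in the four 1×1 pointwise convolutions (4 × 4096), which is the usual trade-off of depthwise-separable blocks: a full 3×3 convolution with 64 input and 64 output channels would need 36864 weights instead of 576 + 4096.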
import numpy as np
from sklearn.metrics import accuracy_score
# Check Keras Model performance
potentials_keras = model_keras.predict(x_valid)
preds_keras = np.squeeze(np.argmax(potentials_keras, 1))
accuracy = accuracy_score(y_valid, preds_keras)
print("Accuracy: " + "{0:.2f}".format(100 * accuracy) + "%")
308/308 [==============================] - 1s 1ms/step
Accuracy: 93.09%
3. Load a pre-trained quantized Keras model
The native Keras model above has been quantized to 8 bits. Note that a 4-bit version is also available from the model zoo.
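To give an intuition of what 8-bit versus 4-bit weights mean, here is a generic per-channel symmetric uniform quantizer in NumPy. It is an illustration only, not QuantizeML's quantization scheme; the function names and the random kernel are hypothetical, and the kernel shape (1, 1, 64, 64) simply mirrors a pointwise convolution of this model.

```python
import numpy as np


def quantize_symmetric(weights, bits=8):
    """Quantize a Keras-layout kernel per output channel (last axis) to signed integers."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits, 7 for 4 bits
    reduce_axes = tuple(range(weights.ndim - 1))     # every axis except output channels
    max_abs = np.max(np.abs(weights), axis=reduce_axes, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float64) * scale


rng = np.random.default_rng(0)
kernel = rng.normal(size=(1, 1, 64, 64))
q8, s8 = quantize_symmetric(kernel, bits=8)
q4, s4 = quantize_symmetric(kernel, bits=4)
err8 = np.mean(np.abs(kernel - dequantize(q8, s8)))
err4 = np.mean(np.abs(kernel - dequantize(q4, s4)))
print(f"mean abs error: 8-bit {err8:.5f}, 4-bit {err4:.5f}")
```

Each weight is stored as a small integer plus one floating-point scale per output channel; the rounding error per weight is at most half a scale step, and that step is roughly 18 times coarser with 4 bits (127 / 7 levels) than with 8 bits.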
from quantizeml import load_model
# Load the pre-trained quantized model
model_file = fetch_file(
    fname="ds_cnn_kws_i8_w8_a8.h5",
    origin="https://data.brainchip.com/models/AkidaV2/ds_cnn/ds_cnn_kws_i8_w8_a8.h5",
    cache_subdir='models')
model_keras_quantized = load_model(model_file)
model_keras_quantized.summary()
# Check Model performance
potentials_keras_q = model_keras_quantized.predict(x_valid)
preds_keras_q = np.squeeze(np.argmax(potentials_keras_q, 1))
accuracy_q = accuracy_score(y_valid, preds_keras_q)
print("Accuracy: " + "{0:.2f}".format(100 * accuracy_q) + "%")
Downloading data from https://data.brainchip.com/models/AkidaV2/ds_cnn/ds_cnn_kws_i8_w8_a8.h5.
176200/176200 [==============================] - 0s 0us/step
Download complete.
Model: "ds_cnn_kws"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 49, 10, 1)] 0
rescaling (QuantizedRescaling) (None, 49, 10, 1) 0
conv_0 (QuantizedConv2D) (None, 25, 5, 64) 1664
conv_0/relu (QuantizedReLU) (None, 25, 5, 64) 128
dw_separable_1 (QuantizedDepthwiseConv2D) (None, 25, 5, 64) 704
pw_separable_1 (QuantizedConv2D) (None, 25, 5, 64) 4160
pw_separable_1/relu (QuantizedReLU) (None, 25, 5, 64) 128
dw_separable_2 (QuantizedDepthwiseConv2D) (None, 25, 5, 64) 704
pw_separable_2 (QuantizedConv2D) (None, 25, 5, 64) 4160
pw_separable_2/relu (QuantizedReLU) (None, 25, 5, 64) 128
dw_separable_3 (QuantizedDepthwiseConv2D) (None, 25, 5, 64) 704
pw_separable_3 (QuantizedConv2D) (None, 25, 5, 64) 4160
pw_separable_3/relu (QuantizedReLU) (None, 25, 5, 64) 128
dw_separable_4 (QuantizedDepthwiseConv2D) (None, 25, 5, 64) 704
pw_separable_4 (QuantizedConv2D) (None, 25, 5, 64) 4160
pw_separable_4/relu (QuantizedReLU) (None, 25, 5, 64) 0
pw_separable_4/global_avg (QuantizedGlobalAveragePooling2D) (None, 64) 2
dense_5 (QuantizedDense) (None, 33) 2145
dense_5/dequantizer (Dequantizer) (None, 33) 0
act_softmax (Activation) (None, 33) 0
=================================================================
Total params: 23779 (92.89 KB)
Trainable params: 22753 (88.88 KB)
Non-trainable params: 1026 (4.01 KB)
_________________________________________________________________
308/308 [==============================] - 5s 7ms/step
Accuracy: 92.87%
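Accuracy alone does not show how many individual predictions changed with quantization. A minimal helper that also reports how often two prediction sets agree, assuming label arrays such as `y_valid`, `preds_keras` and `preds_keras_q` from the cells above; the helper name and the toy arrays are hypothetical.

```python
import numpy as np


def compare_predictions(y_true, preds_a, preds_b):
    """Return the accuracy of each prediction set and the fraction of samples where they agree."""
    y_true, preds_a, preds_b = map(np.asarray, (y_true, preds_a, preds_b))
    acc_a = np.mean(preds_a == y_true)
    acc_b = np.mean(preds_b == y_true)
    agreement = np.mean(preds_a == preds_b)
    return acc_a, acc_b, agreement


# Toy example; in the tutorial, pass y_valid, preds_keras and preds_keras_q
acc_a, acc_b, agreement = compare_predictions([0, 1, 2, 3], [0, 1, 2, 0], [0, 1, 1, 0])
print(acc_a, acc_b, agreement)  # 0.75 0.5 0.75
```

A small accuracy gap can hide a larger number of flipped predictions that partly cancel out, so the agreement rate is a stricter check of how faithfully the quantized model follows the float one.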
4. Conversion to Akida
The converted model is Akida 2.0 compatible, and its performance is evaluated using the Akida simulator.
from cnn2snn import convert
# Convert the model
model_akida = convert(model_keras_quantized)
model_akida.summary()
/usr/local/lib/python3.11/dist-packages/cnn2snn/quantizeml/blocks.py:160: UserWarning: Conversion stops at layer dense_5 because of a dequantizer. The end of the model is ignored:
___________________________________________________
Layer (type)
===================================================
act_softmax (Activation)
===================================================
warnings.warn("Conversion stops" + stop_layer_msg + " because of a dequantizer. "
Model Summary
______________________________________________
Input shape Output shape Sequences Layers
==============================================
[49, 10, 1] [1, 1, 33] 1 11
______________________________________________
_________________________________________________________________
Layer (type) Output shape Kernel shape
============ SW/conv_0-dense_5/dequantizer (Software) ===========
conv_0 (InputConv2D) [25, 5, 64] (5, 5, 1, 64)
_________________________________________________________________
dw_separable_1 (DepthwiseConv2D) [25, 5, 64] (3, 3, 64, 1)
_________________________________________________________________
pw_separable_1 (Conv2D) [25, 5, 64] (1, 1, 64, 64)
_________________________________________________________________
dw_separable_2 (DepthwiseConv2D) [25, 5, 64] (3, 3, 64, 1)
_________________________________________________________________
pw_separable_2 (Conv2D) [25, 5, 64] (1, 1, 64, 64)
_________________________________________________________________
dw_separable_3 (DepthwiseConv2D) [25, 5, 64] (3, 3, 64, 1)
_________________________________________________________________
pw_separable_3 (Conv2D) [25, 5, 64] (1, 1, 64, 64)
_________________________________________________________________
dw_separable_4 (DepthwiseConv2D) [25, 5, 64] (3, 3, 64, 1)
_________________________________________________________________
pw_separable_4 (Conv2D) [1, 1, 64] (1, 1, 64, 64)
_________________________________________________________________
dense_5 (Dense1D) [1, 1, 33] (64, 33)
_________________________________________________________________
dense_5/dequantizer (Dequantizer) [1, 1, 33] N/A
_________________________________________________________________
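As the warning explains, conversion stops at the dequantizer, so the final softmax is not part of the Akida model: it outputs 33 unnormalized potentials per sample (output shape [1, 1, 33]). Softmax does not change which class has the highest score, so class predictions are unaffected; if probabilities are needed, they can be recovered on the host. A minimal NumPy sketch, assuming the potentials are available as an array of shape (N, 1, 1, 33); the random array below is a stand-in for real Akida outputs.

```python
import numpy as np


def softmax(potentials, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = potentials - np.max(potentials, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


# Stand-in for Akida outputs of shape (N, 1, 1, 33)
rng = np.random.default_rng(0)
potentials = rng.normal(scale=5.0, size=(8, 1, 1, 33))
logits = potentials.reshape(len(potentials), -1)   # flatten to (8, 33)
probs = softmax(logits)
# Softmax is monotonic, so the predicted class is unchanged
assert np.array_equal(np.argmax(probs, axis=1), np.argmax(logits, axis=1))
```

Subtracting the row maximum before exponentiating avoids overflow for large potentials without changing the result.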
# Check Akida model performance
preds_akida = model_akida.predict_classes(x_valid, num_classes=num_classes)
accuracy = accuracy_score(y_valid, preds_akida)
print("Accuracy: " + "{0:.2f}".format(100 * accuracy) + "%")
# For non-regression purposes
assert accuracy > 0.9
Accuracy: 92.87%
5. Confusion matrix
The confusion matrix provides a good summary of what mistakes the network is making.
Following the scikit-learn convention, each row corresponds to a true class (i.e., each row shows what the network predicted for that word).
Please refer to the TensorFlow audio recognition example for a detailed explanation of the confusion matrix.
import itertools
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
# Create confusion matrix
cm = confusion_matrix(y_valid, preds_akida,
                      labels=list(word_to_index.values()))
# Normalize
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
# Display confusion matrix
plt.rcParams["figure.figsize"] = (16, 16)
plt.figure()
title = 'Confusion matrix'
cmap = plt.cm.Blues
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(word_to_index))
plt.xticks(tick_marks, word_to_index, rotation=45)
plt.yticks(tick_marks, word_to_index)
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    # Write each normalized value in its cell, in white on dark cells
    plt.text(j,
             i,
             format(cm[i, j], '.2f'),
             horizontalalignment="center",
             color="white" if cm[i, j] > thresh else "black")
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.autoscale()
plt.show()
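The row normalization above turns each row into the distribution of predictions for one true word, so the diagonal holds the per-class recall. A NumPy-only sketch of the same computation that also guards against rows with no samples, where the plain division above would produce NaN; rows here are ordered by label index rather than by dictionary order, and the function name and toy labels are hypothetical.

```python
import numpy as np


def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes; each non-empty row sums to 1."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count (true, pred) pairs
    row_sums = cm.sum(axis=1, keepdims=True)
    # Leave empty rows at zero instead of dividing by zero
    return np.divide(cm, row_sums, out=np.zeros(cm.shape), where=row_sums > 0)


cm = normalized_confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 0], num_classes=3)
per_class_recall = np.diag(cm)   # [0.5, 1.0, 0.0]
```

In the tutorial, `normalized_confusion_matrix(y_valid, preds_akida, num_classes)` gives a matrix whose row `i` corresponds to the word whose label is `i` in `word_to_index`.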
Total running time of the script: (0 minutes 21.454 seconds)