0 like 0 dislike
8.8k views
in AI-ML-Data Science Projects by (279 points)

Abstract- Communication is the main channel through which people interact with one another. In recent years, there has been a rapid increase in the number of people who are deaf or mute due to birth defects, accidents and oral diseases. Since deaf and mute people cannot speak with hearing people directly, they have to depend on some form of visual communication, such as sign language or lip reading, and these messages are often misinterpreted. This project is made to help these specially challenged people hold equal par in society.






2 Answers

1 like 0 dislike
by (279 points)
selected by
 
Best answer

Hand Gesture Recognition

Purpose of the model- The main challenge these special people face is the communication gap between them and everyone else. Deaf and mute people often find it difficult to communicate with hearing people; this challenge makes them uncomfortable and leaves them feeling discriminated against in society. Because of this miscommunication, deaf and mute people may stop communicating altogether and are never able to express their feelings. The HGRVC (Hand Gesture Recognition and Voice Conversion) system localizes and tracks the hand gestures of deaf and mute people in order to maintain a communication channel with other people.

General idea about our model- Hand gestures are detected using a web camera. The captured pictures are converted to a standard size during pre-processing. The aim of this project is to develop a system that converts hand gestures into text: images are placed in a database, the captured image is matched against that database, and the matched image is converted into text. The detection involves observing the hand's movement, and the text output helps reduce the communication gap between deaf-mute and hearing people.

Architecture of our model

Here we start the implementation of our model-

First, we will train a CNN model on a large number of images of hand gestures.

Why CNN: As we saw in the CNN tutorial, a CNN can process a very large image in a simple manner. CNNs are the networks most commonly used to analyze visual imagery and frequently work behind the scenes in image classification; a small comparison is sketched below.
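To make the difference concrete, here is a minimal sketch (the 64x64 grayscale input shape is assumed to match what we use later) comparing the parameter counts of a convolutional layer and a dense layer applied to the same image:

from keras.layers import Conv2D, Dense, Input, Flatten
from keras.models import Model

inp = Input(shape=(64, 64, 1))
# Conv2D reuses one small 3x3 kernel across the whole image:
# 3*3*1*32 weights + 32 biases = 320 parameters
conv_model = Model(inp, Conv2D(32, (3, 3), padding='same')(inp))
# Dense connects every pixel to every unit:
# 64*64*1*32 weights + 32 biases = 131,104 parameters
dense_model = Model(inp, Dense(32)(Flatten()(inp)))
conv_model.summary()
dense_model.summary()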

For the official documentation, see keras.io for Keras and tensorflow.org for TensorFlow.

Import libraries-

import numpy as np
import matplotlib.pyplot as plt
import utils
import os
%matplotlib inline

from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Input, Dropout, Flatten, Conv2D
from keras.layers import BatchNormalization, Activation, MaxPooling2D
from keras.models import Model, Sequential
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.utils import plot_model
from IPython.display import SVG, Image
import tensorflow as tf

print("Tensorflow version:", tf.__version__)

Download the dataset from here, or create a dataset by yourself.

  1. In this part of the code, we have imported Keras and its libraries/layers.
  2. The way Keras and Keras models are imported changes with the TensorFlow version, so check the official documentation (links given above). TensorFlow version 1 is used here; a TensorFlow 2.x equivalent is sketched below.
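For reference, on TensorFlow 2.x the same imports live under tensorflow.keras (a sketch based on the official module paths; note that there Adam takes learning_rate instead of lr):

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Input, Dropout, Flatten, Conv2D
from tensorflow.keras.layers import BatchNormalization, Activation, MaxPooling2D
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau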

To install a library, run this command in the Anaconda prompt- pip install "module name"

For example- pip install utils

Let's check our dataset-

for expression in os.listdir("C:/Users/lenovo/signlang/train/"):
    print(str(len(os.listdir("C:/Users/lenovo/signlang/train/"+expression)))+" "+expression+' images')

Here we get the number of files in our dataset: we give the path of our train dataset to see how many files we have for each class. An equivalent check with a reusable path variable is sketched below.
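The same count written more compactly (a sketch; it assumes the train folder contains only class sub-folders):

train_dir = "C:/Users/lenovo/signlang/train"   # same path as above
counts = {c: len(os.listdir(os.path.join(train_dir, c))) for c in os.listdir(train_dir)}
print(counts, "| total:", sum(counts.values()))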

(Image: sample hand gesture images from the dataset)

pre-process the dataset-

img_size=64
batch_size=64

datagen_train=ImageDataGenerator(horizontal_flip=True)
train_generator=datagen_train.flow_from_directory("C:/Users/lenovo/signlang/train",
                                                 target_size=(img_size,img_size),
                                                 color_mode='grayscale',
                                                 batch_size=batch_size,
                                                 class_mode='categorical',
                                                 shuffle=True)

# Validation data needs no augmentation, so this generator has no flips;
# shuffle=False keeps the evaluation order stable.
datagen_validation=ImageDataGenerator()
validation_generator=datagen_validation.flow_from_directory("C:/Users/lenovo/signlang/test",
                                                 target_size=(img_size,img_size),
                                                 color_mode='grayscale',
                                                 batch_size=batch_size,
                                                 class_mode='categorical',
                                                 shuffle=False)

Here we pre-process our dataset: train_generator feeds the training images to the model, with every image converted to grayscale, and validation_generator is used to check the accuracy of the model. A quick sanity check on the generators is sketched below.
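Before training, it is worth confirming the class ordering and batch shapes the generators produce:

x_batch, y_batch = next(train_generator)
print(train_generator.class_indices)  # class name -> index, in alphabetical folder order
print(x_batch.shape, y_batch.shape)   # expected: (64, 64, 64, 1) and (64, number_of_classes)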

preparing our model-

model=Sequential()

#1st conv layer
model.add(Conv2D(64,(3,3),padding='same',input_shape=(64,64,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

#2nd conv layer
model.add(Conv2D(128,(5,5),padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

#3rd conv layer
model.add(Conv2D(512,(3,3),padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

#4th conv layer
model.add(Conv2D(512,(3,3),padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

#fully connected layers
model.add(Flatten())
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))

model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))

#output layer: one unit per gesture class
model.add(Dense(6,activation='softmax'))

opt=Adam(lr=0.0005)   #lr - learning rate
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()

Here we prepare our CNN model: four convolutional layers followed by two dense layers, which form the fully connected part of the network.

We use the Adam optimizer with an initial learning rate of 0.0005. The learning rate stays fixed unless a schedule is attached; the ReduceLROnPlateau callback imported above can lower it automatically when the validation loss stops improving, as sketched below.
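A sketch of how the two callbacks imported earlier could be wired up (the callbacks= argument in model.fit below is commented out; pass this list there to enable them; the checkpoint filename is a hypothetical example):

checkpoint = ModelCheckpoint('hand_gesture_best.h5', monitor='val_loss',
                             save_best_only=True)   # hypothetical filename
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=2, min_lr=1e-6)
callbacks = [checkpoint, reduce_lr]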

Training the model-

epochs=15
steps_per_epoch=train_generator.n//train_generator.batch_size
validation_steps=validation_generator.n//validation_generator.batch_size

history=model.fit(
    x=train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps,
 #   callbacks=callbacks
)

model.save('hand_gesture.h5')

Here we train our model using the model.fit() method; in our runs it gives roughly 90 to 95% accuracy. The training curves can be plotted from the returned history object, as sketched below.
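A sketch for plotting the curves (the loss keys are stable across Keras versions; the accuracy keys are 'acc'/'val_acc' or 'accuracy'/'val_accuracy' depending on the version):

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()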

by (471 points)
In this model you use a numeric dataset like (1, 2, 3, 4), but in the live video output the picture shows alphabets like (a, b, c) etc. Why?
by (279 points)
edited by
Yes, because it is just an example: I have used the numeric dataset, but it also works fine for other hand gestures and alphabets. I added pictures of hand gestures here, but the code is written for the numeric dataset; you can change the categories for different datasets.
0 like 0 dislike
by (279 points)
edited by

To check the validation accuracy of our model-

test_datagen = ImageDataGenerator()
val = test_datagen.flow_from_directory(
        'path of validation folder', shuffle=True, target_size=(64, 64),
        batch_size=32, color_mode='grayscale', class_mode='categorical')

_, acc = model.evaluate(val, verbose=0)
print('>%.3f' % (acc*100.0))

Now we take an image from our dataset, give that image to our model, and check whether it gives the right prediction or not.

So here is the code for that-

import numpy as np
from keras.preprocessing import image

test_image = image.load_img('path for our image', target_size=(64,64), color_mode="grayscale")
plt.imshow(test_image)
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)

result = model.predict(test_image)
a = result.argmax()
s = train_generator.class_indices   # maps class name -> index
#print(s)
name = []
for i in s:
    name.append(i)
for i in range(len(s)):
    if(i == a):
        q = name[i]
print(q)
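As a side note, the class-name lookup above can be written more compactly by inverting class_indices once (a sketch, equivalent to the loop above):

idx_to_class = {v: k for k, v in train_generator.class_indices.items()}
print(idx_to_class[result.argmax()])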

So here we can see the model's predictions on images, but our main aim is to check whether this model works on a live webcam or not.

For that we have written the following code-

#first we load our model
from tensorflow.keras.models import load_model
loaded_model=load_model('hand_gesture.h5')

from tensorflow.keras.preprocessing import image
import numpy as np
import cv2

#we are starting our web cam
cap = cv2.VideoCapture(0)

# Category dictionary
categories = {0: 'ZERO', 1: 'ONE', 2: 'TWO', 3: 'THREE', 4: 'FOUR', 5: 'FIVE'}

while True:
    _, frame = cap.read()
    # Simulating mirror image
    frame = cv2.flip(frame, 1)

    # Got this from collect-data.py
    # Coordinates of the ROI
    x1 = int(0.5*frame.shape[1])
    y1 = 10
    x2 = frame.shape[1]-10
    y2 = int(0.5*frame.shape[1])
    # Drawing the ROI
    # The increment/decrement by 1 is to compensate for the bounding box
    cv2.rectangle(frame, (x1-1, y1-1), (x2+1, y2+1), (255,0,0), 1)
    # Extracting the ROI
    roi = frame[y1:y2, x1:x2]

    # Resizing the ROI so it can be fed to the model for prediction
    roi = cv2.resize(roi, (64, 64))
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, test_image = cv2.threshold(roi, 120, 255, cv2.THRESH_BINARY)
    cv2.imshow("test", test_image)

    # Batch of 1; the keys below follow flow_from_directory's
    # alphabetical class-folder order
    result = loaded_model.predict(test_image.reshape(1, 64, 64, 1))
    prediction = {'FIVE': result[0][0],
                  'FOUR': result[0][1],
                  'ONE': result[0][2],
                  'THREE': result[0][3],
                  'TWO': result[0][4],
                  'ZERO': result[0][5]}
    max_key = max(prediction, key=prediction.get)
    # Draw the predicted label on the full frame, just below the ROI's top edge
    cv2.putText(frame, max_key, (x1, y1+30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
    print(max_key)
    cv2.imshow("Frame", frame)

    interrupt = cv2.waitKey(2)
    if interrupt & 0xFF == 27: # esc key
        break

cap.release()
cv2.destroyAllWindows()

Here we use OpenCV to open the webcam, take frames from it, and use them with our model in the following way-

  1. We take the frame and preprocess it: first we resize it and convert it to a grayscale image, just as the training images were prepared.
  2. We draw a rectangle from the ROI coordinates and show our hand inside that rectangle.
  3. Then we resize the crop and remove the background from the image, so the image is ready to give to the model.
  4. We give it to the model; the predict() method gives us the prediction for the image, and we draw that prediction on the frame above the rectangle.
  5. In this way we can get predictions using the webcam.
(Images: the hand inside the rectangle; the cropped and resized ROI; the thresholded image with the background removed.)

Now we pass this grayscale image to our model so the model can predict the gesture.

Convert the result into sound

from gtts import gTTS   # gTTS (Google Text-to-Speech) was not imported above
import os

mytext=q        # the predicted label from earlier
language='en'
my=gTTS(text=mytext,lang=language,slow=False)
my.save('signtovoice.mp3')
os.system('signtovoice.mp3')   # on Windows, opens the mp3 with the default player

CONCLUSION- Hand gesture recognition and voice conversion for deaf and mute people was successfully executed using image processing. The method takes an image as input and gives text and speech as output. The implementation of this system gives up to 90% accuracy and works successfully in most of the test cases.
