Consider the challenges we must solve when moving on foot:
- If we are walking or running uphill, we lean forward.
- Whenever we're walking on a slippery surface, we take our steps carefully to avoid slipping.
- If we're running barefoot on pebbles, we try to step lightly to avoid hurting the soles of our feet.
- If one of our feet is injured, we shift our weight to remove pressure from it.
Interestingly, we perform all these complex adjustments with little conscious thought.
Drawing lessons from the human brain, we have leveraged Artificial Intelligence to control our drone's flight with our voice and a few simple gestures, using gesture recognition and voice detection.
In this article, I introduce a solution based on MediaPipe's Hand Keypoint detection model, along with the SpeechRecognition and PyAudio modules that capture audio and recognize the voice commands.
This project relies on three main parts:
- The Drone API
- Gesture Detection using Mediapipe
- Voice Recognition
Drone Description:
The DJI Tello is a perfect drone for any programming experiment. It has a rich Python module called DJITelloPy, which lets us fully control the drone, create drone swarms, and use its camera for computer vision.
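For example, connecting to the drone and checking its battery takes only a few lines. Here is a minimal sketch, assuming the drone is powered on and your PC is connected to its Wi-Fi:

```python
# Minimal DJITelloPy sketch: connect to the Tello and check its battery.
# Assumes the PC is already on the drone's Wi-Fi network.
from djitellopy import Tello

drone = Tello()
drone.connect()                             # open the UDP command link
print(f"Battery: {drone.get_battery()}%")   # sanity check before flying
```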
About the Approach:
The application is divided into 3 main parts: gesture recognition, voice detection, and the drone controller. These are independent modules that can be readily modified, for example to add new gestures or voice commands, or to change the drone's movement speed.
Let us take a deeper look at each part:
A considerable part of this project is dedicated to the gesture detector. MediaPipe inspired the recognition approach used in this project.
Here is a quick overview of how it works.
MediaPipe has a Python implementation of its Hand Keypoints Detector, which returns the 3D coordinates of 21 hand landmarks.
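To give an idea of how this looks in code, here is a minimal sketch that reads landmarks from a webcam with MediaPipe Hands; the project applies the same idea to the drone's video stream:

```python
# Sketch: read 21 hand landmarks per frame with MediaPipe Hands.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: track hands across frames
    max_num_hands=1,
    min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)          # webcam here; the project uses the drone stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input, while OpenCV captures BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        wrist = results.multi_hand_landmarks[0].landmark[0]
        print(f"Wrist at x={wrist.x:.2f}, y={wrist.y:.2f}, z={wrist.z:.2f}")
cap.release()
```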
We have currently added the following features to the drone; the gestures and their functions are:
a.) Gesture 1: Activate Body Tracking
This allows us to enable body tracking, which is triggered through gesture detection. It tracks the person who is closest to the drone's camera.
For this, we currently use the MediaPipe Holistic model to get a bounding box around the face (we use Holistic because it provides face, pose, and hand detection with one single model).
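A minimal sketch of that idea follows; the helper name face_bbox is ours for illustration, not taken from the repository:

```python
# Sketch: derive a face bounding box from MediaPipe Holistic face landmarks.
# face_bbox is a hypothetical helper, not the repository's actual code.
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5)

def face_bbox(rgb_frame):
    """Return (xmin, ymin, xmax, ymax) in normalized [0, 1] coords, or None."""
    results = holistic.process(rgb_frame)
    if results.face_landmarks is None:
        return None
    xs = [lm.x for lm in results.face_landmarks.landmark]
    ys = [lm.y for lm in results.face_landmarks.landmark]
    return min(xs), min(ys), max(xs), max(ys)
```

The drone can then steer so that this box stays centered in the frame, which is what keeps the nearest person in view.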
b.) Gesture 2: Enable Voice Control
For this, we are using the SpeechRecognition Python library. The supported voice commands are MOVE BACKWARD, MOVE FORWARD, RETURN TO BASE, ENABLE BODY TRACKING, MOVE UP, MOVE DOWN, MOVE LEFT, MOVE RIGHT, LAND, ROTATE CLOCKWISE, and ROTATE ANTICLOCKWISE.
For speech-to-text we take a simple approach and use Google STT, because it provides accurate transcription. Note that it requires an internet connection: Google does not offer an offline model, so we have to communicate with its API.
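In code, capturing one utterance and sending it to Google's recognizer looks roughly like this (a minimal sketch using the SpeechRecognition library):

```python
# Sketch: capture one utterance and transcribe it with Google STT.
# Requires an internet connection and the PyAudio backend for the microphone.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    audio = recognizer.listen(source)

try:
    command = recognizer.recognize_google(audio).upper()
    print("Heard:", command)
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as err:
    print("Google STT request failed:", err)
```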
To execute an action according to the command given by the user, we create several if conditions. Suppose the user wants the drone to go forward: by giving the voice command "move forward," the request is matched against the if conditions, and if it matches an action defined in the code, a signal is sent to the Tello drone to start moving forward.
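A minimal sketch of that matching logic, assuming a DJITelloPy drone object and the transcribed command string (the distances and angles below are illustrative, not the repository's values):

```python
# Hypothetical dispatcher: match the transcribed command to a drone action.
def execute_command(drone, command):
    if command == "MOVE FORWARD":
        drone.move_forward(50)             # distance in cm (illustrative)
    elif command == "MOVE BACKWARD":
        drone.move_back(50)
    elif command == "MOVE UP":
        drone.move_up(50)
    elif command == "MOVE DOWN":
        drone.move_down(50)
    elif command == "MOVE LEFT":
        drone.move_left(50)
    elif command == "MOVE RIGHT":
        drone.move_right(50)
    elif command == "ROTATE CLOCKWISE":
        drone.rotate_clockwise(90)         # angle in degrees (illustrative)
    elif command == "ROTATE ANTICLOCKWISE":
        drone.rotate_counter_clockwise(90)
    elif command == "LAND":
        drone.land()
```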
c.) Open Palm: Signal Landing of the Drone
This gesture lands the drone.
Drone Controller
The DJITelloPy documentation covers many functions, some of which are listed below (see the sketch after the list):
- Drone velocity control using drone.send_rc_control(left_right_velocity, forward_backward_velocity, up_down_velocity, yaw_velocity)
- Live-streaming video from the drone's camera to the PC using drone.streamon()
- Taking off and landing the drone using drone.takeoff() & drone.land()
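Put together, a short flight using these calls might look like this (a sketch with illustrative values, not the project's actual control loop):

```python
# Sketch: stream, take off, drift forward briefly, and land with DJITelloPy.
import time
from djitellopy import Tello

drone = Tello()
drone.connect()
drone.streamon()                       # start the video stream
frame = drone.get_frame_read().frame   # grab one frame from the camera

drone.takeoff()
drone.send_rc_control(0, 20, 0, 0)     # forward at 20% speed (illustrative)
time.sleep(2)
drone.send_rc_control(0, 0, 0, 0)      # stop and hover
drone.land()
```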
The best part about Tello is that it has a ready-made Python API to help us do all of this. We only need to map each gesture ID to a command.
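For instance, that mapping could be a simple dispatch table; the IDs and helper names below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical gesture-ID dispatch table; the IDs and actions here are
# illustrative and may differ from the repository's actual mapping.
def enable_body_tracking(drone):
    print("Body tracking enabled")      # placeholder for the tracking loop

def enable_voice_control(drone):
    print("Voice control enabled")      # placeholder for the listening loop

GESTURE_ACTIONS = {
    1: enable_body_tracking,            # Gesture 1: activate body tracking
    2: enable_voice_control,            # Gesture 2: enable voice control
    3: lambda drone: drone.land(),      # open palm: land
}

def handle_gesture(drone, gesture_id):
    action = GESTURE_ACTIONS.get(gesture_id)
    if action is not None:
        action(drone)
```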
Once the drone has detected your face, it will create another bounding box, which is the one used to detect the gestures.
Running the Setup
Let’s move on to the most fun part of the project, i.e., running it.
But before that, we need to do some preparation:
1). Clone the repository and install the dependencies using pip install -r requirements.txt.
2). Turn on the drone, and once the yellow light starts blinking, open your Wi-Fi settings and connect to the TELLO network.
3). Run the application with the python main.py command, which will open a Python window with the live visualization.
The drone's video stream takes some time to load, so give it a few seconds; once the stream is visible, you can activate the features.
Project GitHub link: https://github.com/zerocool-11/Advanced-AI-Enabled-Drone-Conroller
Written By:
Yash Kumar
An independent and self-motivated student with proven ability and experience in developing Data Science projects with data structures and algorithms. Has also worked on integrating Machine Learning with DevOps tools (MLOps) and CI/CD pipelines, and is currently working as a Machine Learning Engineer at OptiSol Business Solutions.