FPGA-Based Telepresence Robot
May 28, 2012
I was fascinated by remote-controlled machines, so I decided to build a telepresence robot that could capture live video, send it to my computer, and let me drive it from anywhere in the house. Most people I spoke to about this project expected me to use a microcontroller board or a small single-board computer. But being a hardware enthusiast, I wanted something more powerful and flexible for real-time data processing, so I went with an FPGA.
My goal was to create a roaming platform with four wheels, a camera, and a wireless connection. It would stream video over WiFi, let me control it in real-time from a Windows-based GUI, and also detect marked objects for fun experiments in automated identification. Throughout this project, I had to solve problems related to FPGA logic design, custom sensor interfaces, and high-speed data transfer. The end result, though a bit DIY in appearance, turned out to be a robust and responsive telepresence system.
Main Elements
- FPGA-Based Processing: Handled vision, motor control, and wireless communication with a soft-core CPU and external SDRAM.
- Real-Time Vision: Processed camera input with Debayering, HSV conversion, thresholding, and blob detection for object tracking.
- Wireless Streaming & Control: Used UDP video transmission with RLE compression and low-latency remote commands via a Windows GUI.
- Closed-Loop Motor Control: PWM-driven motors with encoder feedback and a PID loop for precise movement.
- Windows GUI & Telepresence: Integrated live video, joystick control, object tracking, and two-way audio for remote operation.
Getting Started with the Hardware
I chose a mid-range FPGA development board that had enough logic resources, on-board RAM, and I/O pins to manage camera data, motor control, and wireless communication. It came with a small suite of peripherals, including a configuration ROM and some built-in debugging LEDs. I liked that it had a large number of digital I/Os, because I knew I'd need to hook up a camera, sensors, and motor drivers.
For the mechanical side, I used a four-wheeled chassis with gearmotors. It wasn't anything too fancy, but sturdy enough to carry the FPGA board, battery, and additional circuitry. Each wheel had a small DC motor and a mounted encoder so I could measure wheel rotation. From experience, I knew the encoder would be vital for closed-loop speed control. I dedicated a separate region on the FPGA to read pulses from these encoders.
Power management was the next concern. My motors ran off 12 V, but the FPGA board needed 3.3 V for I/O and a lower internal voltage for the core. I installed two DC/DC converters: one from 12 V to 5 V, another from 5 V to 3.3 V, taking care to filter noise to keep the FPGA stable. Then I added an H-bridge module for each motor. Those modules took the PWM signals from my FPGA and converted them into the appropriate voltage and current for the motors.
The final key piece was a WiFi module, essentially a small board configured for 802.11 communications. I connected it to the FPGA via a standard interface, allowing the FPGA logic to send packets and receive incoming data. This was a bit tricky at first: I had to implement a lightweight bus interface on the FPGA side to handle communication with the module. But once I had the basic send/receive routines working, I was ready to start tackling the real challenge: streaming video in real-time.
Capturing and Processing Vision Data on the FPGA
I'd decided early on that I wanted to do at least some level of object identification on-board, so I looked for a camera module that provided a digital video signal I could feed directly into the FPGA. Once I found a suitable module, I tied its pixel clock and data lines to my FPGA. In the beginning, I wrote a small test block in Verilog that latched each pixel and stored it in a buffer. To check if everything was working, I used the development board's VGA output and displayed a crude live feed on a monitor, just to see the raw image data.
After I confirmed that was functional, I built a color-thresholding pipeline. The camera output was in a Bayer pattern, so I added a Debayer unit to get RGB values, and then converted those values into HSV space. At that point, I ran a thresholding operation to highlight regions of a specific color. My intention was to place colored markers on objects, so I set the threshold values to pick out that distinct hue. The pipeline produced a binary mask: 1 for pixels that matched, 0 otherwise. Then I wrote a blob-detection hardware module that scanned the image to find connected pixel regions and calculate their centroids. The FPGA stored the centroid coordinates in a register that my embedded CPU (running on the same FPGA) could read.
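To make the per-pixel stage concrete, here is a minimal C model of the kind of RGB-to-HSV conversion and color window check the pipeline performs. The threshold values, the 0-179 hue scale, and the function name are illustrative assumptions, not the values I used; the actual hardware does the same comparison in fixed point on streaming pixels, one per clock.

```c
#include <stdint.h>

/* Illustrative thresholds for a bright marker hue -- placeholders only.  */
#define HUE_MIN  20   /* hue on a 0..179 scale */
#define HUE_MAX  40
#define SAT_MIN  100  /* saturation, 0..255 */
#define VAL_MIN  80   /* value, 0..255 */

/* Convert one RGB pixel (0..255 per channel) to HSV and return 1 if it
 * falls inside the marker's color window, 0 otherwise. */
static int marker_match(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    uint8_t min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    uint8_t v = max;
    uint8_t s = (max == 0) ? 0 : (uint8_t)(255u * (max - min) / max);

    int h;  /* hue on a 0..179 scale */
    if (max == min)
        h = 0;
    else if (max == r)
        h = (30 * (g - b) / (max - min) + 180) % 180;
    else if (max == g)
        h = 30 * (b - r) / (max - min) + 60;
    else
        h = 30 * (r - g) / (max - min) + 120;

    return h >= HUE_MIN && h <= HUE_MAX && s >= SAT_MIN && v >= VAL_MIN;
}
```

In hardware, each of these arithmetic steps becomes its own pipeline stage, so a new pixel can enter the front of the chain on every clock cycle.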
Of course, all of this took up quite a bit of FPGA resources. I carefully budgeted logic cells for the image pipeline, the wireless interface, the motor control system, and my soft-core CPU. There were moments I nearly ran out of space, but by optimizing the pipeline stages and using external SDRAM instead of storing entire frames in on-chip memory, I made it fit. Achieving real-time performance (at least 30 frames per second) demanded a deeply pipelined approach. Each stage, including Debayer, color conversion, thresholding, and blob detection, operated mostly in parallel. That's the beauty of FPGAs: I didn't have to do it sequentially in software.
Wireless Video Transfer and Command Reception
Next, I had to stream the camera feed to my Windows GUI. Without compression, that was going to consume significant bandwidth, so I started with a lower resolution. Using a fairly small frame size, I packaged each row of video data into packets and sent them via the WiFi module. In the Windows application, I reassembled them into images. That worked okay, but the throughput demands were quite high. I then implemented a simple form of compression, essentially a run-length encoding (RLE), to reduce repeated background pixels. This provided enough data reduction to maintain a relatively smooth frame rate without saturating my WiFi link.
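To illustrate the idea, here is a minimal row-oriented RLE encoder in C. The two-byte (count, value) pair layout and the cap of 255 on run length are assumptions made for the sketch, not the exact packet format I used; the GUI side simply reverses the process, expanding each pair back into pixels before drawing the frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one row of 8-bit pixels as (count, value) pairs.  Runs are capped
 * at 255 so each pair fits in two bytes.  Returns the number of bytes
 * written to 'out', which must hold at least 2 * width bytes (worst case). */
static size_t rle_encode_row(const uint8_t *row, size_t width, uint8_t *out)
{
    size_t n = 0;
    size_t i = 0;
    while (i < width) {
        uint8_t value = row[i];
        uint8_t count = 1;
        while (i + count < width && count < 255 && row[i + count] == value)
            count++;
        out[n++] = count;
        out[n++] = value;
        i += count;
    }
    return n;
}
```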
On the control side, I had an asynchronous channel for commands like "go forward," "turn left," or "stop." This channel used a simple protocol on top of UDP. My embedded CPU, a soft-core processor on the FPGA, listened for incoming packets and wrote new control values into the motor control registers. Because UDP is connectionless, I had to handle reliability in software, but occasional lost commands weren't catastrophic for teleoperation. I appreciated the low latency this approach provided.
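As a sketch of that receive path, the snippet below shows how a small fixed-size command can be decoded and written straight into memory-mapped motor registers. The packet layout, register addresses, and field names are hypothetical; they just capture the shape of the approach.

```c
#include <stdint.h>

/* Hypothetical layout of one 4-byte drive command carried in a UDP payload. */
struct drive_cmd {
    int8_t  left_speed;    /* -100..100 percent of full speed */
    int8_t  right_speed;
    uint8_t flags;         /* bit 0: emergency stop */
    uint8_t sequence;      /* wraps; lets us drop stale packets */
};

/* Assumed memory-mapped motor control registers exposed by the FPGA fabric. */
#define MOTOR_LEFT_REG   (*(volatile int32_t *)0x80000010)
#define MOTOR_RIGHT_REG  (*(volatile int32_t *)0x80000014)
#define MOTOR_ENABLE_REG (*(volatile uint32_t *)0x80000018)

static uint8_t last_sequence;

/* Called whenever the WiFi interface hands the CPU a received UDP payload. */
void handle_drive_packet(const struct drive_cmd *cmd)
{
    /* Drop packets that arrive out of order -- UDP gives no guarantees. */
    if ((uint8_t)(cmd->sequence - last_sequence) > 128)
        return;
    last_sequence = cmd->sequence;

    if (cmd->flags & 0x01) {
        MOTOR_ENABLE_REG = 0;           /* emergency stop */
        return;
    }
    MOTOR_LEFT_REG  = cmd->left_speed;  /* setpoints read by the control loop */
    MOTOR_RIGHT_REG = cmd->right_speed;
    MOTOR_ENABLE_REG = 1;
}
```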
Initially, jitter spikes disrupted the driving experience. Moving the WiFi module to a less congested channel and keeping my image packets small enough to avoid fragmentation brought the latency down, and navigation stayed smooth even at a decent distance from the router.
Designing the Windows GUI
I wrote a Windows-based GUI application that combined the robot's video feed with a control panel. The main window had a display area where frames were drawn. I also overlaid bounding boxes for any detected objects, pulling that data from the packets sent by the robot. It was very satisfying to see a highlighted rectangle on the image whenever the robot encountered a marked object in its field of view.
Below the video display, I put arrow buttons and a speed slider. I also allowed joystick input so I could drive it using a gamepad. The commands were translated into simple messages, such as "left wheel speed = X, right wheel speed = Y," which I sent to the FPGA. Under the hood, I used a networking library for sending and receiving UDP packets. The GUI also displayed the battery voltage, measured by the FPGA through an ADC module, and the WiFi signal strength from my module. Providing this status information made operating the robot safely much easier, especially since the robot's power drained faster than I anticipated when streaming video and driving motors simultaneously.
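The command path is simple enough to sketch from the GUI side as well. The snippet below mixes a joystick's throttle and steering axes into left and right wheel speeds and sends them as a small UDP datagram; the byte layout mirrors the hypothetical command struct sketched earlier on the robot side, and the socket call is standard BSD-style (on Windows, Winsock needs the usual WSAStartup boilerplate first).

```c
#include <stdint.h>
#include <sys/socket.h>   /* on Windows: winsock2.h */
#include <netinet/in.h>

static uint8_t g_sequence;

/* Mix joystick axes (each -1.0..1.0) into tank-style wheel speeds and send
 * them as the 4-byte command sketched earlier on the robot side. */
void send_drive_command(int sock, const struct sockaddr_in *robot,
                        float throttle, float steer)
{
    int left  = (int)(100.0f * (throttle + steer));
    int right = (int)(100.0f * (throttle - steer));

    if (left  >  100) left  =  100;
    if (left  < -100) left  = -100;
    if (right >  100) right =  100;
    if (right < -100) right = -100;

    uint8_t pkt[4];
    pkt[0] = (uint8_t)(int8_t)left;   /* left wheel speed  */
    pkt[1] = (uint8_t)(int8_t)right;  /* right wheel speed */
    pkt[2] = 0;                       /* flags: normal driving */
    pkt[3] = g_sequence++;            /* sequence number for stale-packet checks */

    sendto(sock, pkt, sizeof pkt, 0,
           (const struct sockaddr *)robot, sizeof *robot);
}
```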
To make the telepresence experience more immersive, I later added a microphone and a small speaker. The microphone signal was digitized by another small module, and I mostly handled that audio stream with a separate application, though I partially integrated it into my GUI so I could hear ambient sounds from the robot's environment and even speak commands that the robot's speaker would play. It took some careful bandwidth management, but it did contribute to a deeper sense of presence.
Motor Control and Closed-Loop Feedback
Getting the robot to move accurately was another significant challenge. With four motors, I decided to keep it simple by pairing the left side motors and the right side motors, using tank steering. The FPGA generated two PWM signals, one for the left motors and one for the right. Each signal could vary its duty cycle to control speed and had a direction bit for forward or reverse. I read the wheel encoders in hardware counters to measure how many pulses occurred within a given time frame. That information was combined into a proportional control loop to stabilize the speed around the user-set target.
Because the FPGA can respond to encoder pulses very quickly, the control loop was highly precise. I updated it every few milliseconds, ensuring that if one side slowed due to friction or a bump, the PWM duty cycle automatically adjusted. Initially, I tried to run the control loop purely in Verilog, but eventually, I moved the math for the PID calculation into my soft-core CPU. The hardware still handled pulse counting, but the CPU polled those counters at a fixed rate and computed the new duty cycle each loop iteration. This approach struck a good balance between real-time response and maintainability, allowing me to tweak the PID gains in software.
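A stripped-down version of that per-side update might look like the sketch below. The register addresses, the PWM range, and the use of floating point are assumptions made for readability; the essence is just differencing the free-running pulse counter each period and feeding the error through the PID terms.

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers: an encoder pulse counter maintained
 * in FPGA logic, and the PWM duty-cycle output it feeds back to. */
#define ENC_LEFT_COUNT   (*(volatile uint32_t *)0x80000020)
#define PWM_LEFT_DUTY    (*(volatile int32_t  *)0x80000030)

struct pid_state {
    float kp, ki, kd;      /* gains, adjustable in software */
    float integral;
    float prev_error;
    uint32_t prev_count;
};

/* Called every few milliseconds from a timer loop on the soft-core CPU.
 * 'setpoint' is the target speed in encoder counts per control period. */
void pid_update_left(struct pid_state *s, float setpoint)
{
    uint32_t count = ENC_LEFT_COUNT;
    float measured = (float)(count - s->prev_count);  /* unsigned math wraps safely */
    s->prev_count = count;

    float error = setpoint - measured;
    s->integral += error;
    float derivative = error - s->prev_error;
    s->prev_error = error;

    float duty = s->kp * error + s->ki * s->integral + s->kd * derivative;

    /* Clamp to the PWM range the hardware expects (assumed 0..1023 here). */
    if (duty < 0.0f)    duty = 0.0f;
    if (duty > 1023.0f) duty = 1023.0f;
    PWM_LEFT_DUTY = (int32_t)duty;
}
```

Because the gains live in software, retuning them never required resynthesizing the FPGA.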
Object Identification with Marked Targets
One of the coolest parts of my build was the marked object detection. I tested it by placing a bright neon sticker on various objects and letting my FPGA perform HSV-based thresholding. It could handle moderate lighting changes, though strong shadows sometimes caused false positives. The blob detection found the largest contiguous region of thresholded pixels, and the hardware computed that region's centroid and bounding box.
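The statistics themselves are straightforward. The C model below accumulates a centroid and bounding box over a stored binary mask; for simplicity it treats every thresholded pixel as part of a single blob, whereas the real hardware tracked connected regions and kept the same running sums as pixels streamed past rather than iterating over a frame.

```c
#include <stdint.h>

struct blob {
    int found;
    int cx, cy;                      /* centroid */
    int min_x, min_y, max_x, max_y;  /* bounding box */
};

/* Accumulate centroid sums and a bounding box over every pixel that
 * survived the HSV threshold (mask entries are 0 or 1). */
struct blob find_marker(const uint8_t *mask, int width, int height)
{
    struct blob b = { 0, 0, 0, width, height, -1, -1 };
    long sum_x = 0, sum_y = 0, count = 0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (!mask[y * width + x])
                continue;
            sum_x += x;
            sum_y += y;
            count++;
            if (x < b.min_x) b.min_x = x;
            if (x > b.max_x) b.max_x = x;
            if (y < b.min_y) b.min_y = y;
            if (y > b.max_y) b.max_y = y;
        }
    }
    if (count > 0) {
        b.found = 1;
        b.cx = (int)(sum_x / count);
        b.cy = (int)(sum_y / count);
    }
    return b;
}
```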
I integrated this into the Windows GUI by drawing a rectangle over the video feed whenever the robot spotted the marker. Then I displayed the centroid coordinates in text form, allowing me to see if the object was centered or off to the side. Later, I wrote an experimental routine in the embedded CPU that would automatically rotate the robot to center the object if I pressed a button in the GUI. It acted as a rudimentary "follow that marker" feature. Although it wasn't fully robust, it was an excellent demonstration of how real-time hardware processing could guide the robot's control logic.
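In spirit, that routine was just a proportional controller on the horizontal centroid error. The sketch below illustrates it with placeholder constants and a stubbed set_wheel_speeds() helper; the real gains, limits, and image width would differ.

```c
#define IMAGE_WIDTH  320   /* assumed frame width for this sketch */
#define CENTER_BAND   20   /* pixels of tolerance around the image center */

/* Placeholder for the routine that updates the left/right speed setpoints;
 * on the robot this ends up in the same motor registers the drive commands use. */
static void set_wheel_speeds(int left, int right)
{
    (void)left;
    (void)right;
}

/* Rotate in place until the marker's centroid sits near the image center. */
void center_on_marker(int centroid_x, int marker_found)
{
    if (!marker_found) {
        set_wheel_speeds(0, 0);              /* lost the marker: stop */
        return;
    }

    int error = centroid_x - IMAGE_WIDTH / 2;
    if (error > -CENTER_BAND && error < CENTER_BAND) {
        set_wheel_speeds(0, 0);              /* close enough to centered */
        return;
    }

    int turn = error / 4;                    /* proportional turn rate */
    if (turn >  40) turn =  40;              /* cap the rotation speed */
    if (turn < -40) turn = -40;
    set_wheel_speeds(turn, -turn);           /* spin toward the marker */
}
```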
Challenges and Lessons Learned
Building this telepresence robot taught me a lot about balancing resources. The FPGA had to simultaneously handle camera data, motor control, object detection, and WiFi communications. I had to be mindful of clock domains and data throughput. At one point, my video pipeline was too large to fit in on-chip memory, forcing me to rely heavily on external SDRAM. Writing an SDRAM controller that could keep up with the data demands was tricky, but once I got it right, it became quite powerful.
I also learned the importance of carefully managing wireless bandwidth and latency. For real-time telepresence, even small delays can make driving the robot feel clumsy. Optimizing packet sizes, using efficient data formats, and testing in different environments were all crucial steps. Physically, I had to ensure mounting the FPGA board on a rolling platform with spinning motors wouldn't cause vibrations that could unseat connectors or stress the circuit boards.
Final Thoughts
The robot became a staple of my lab workspace. I can hop on my Windows PC, open the control GUI, and drive the robot around while I see a live video feed. If I place a brightly colored sticker or piece of tape on an object, the robot detects it in real-time. Watching it respond so quickly reminds me how powerful FPGAs can be for parallel tasks. It also inspires me to push the project further, maybe adding a second camera for depth perception, or refining my hardware to include more sophisticated image processing.
In the end, this build proved that an FPGA is more than capable of driving motors, handling sensors, and streaming video wirelessly with surprisingly low latency. The fact that I could do color-based object detection entirely in hardware, all while streaming and controlling the robot, still amazes me. It took a fair bit of debugging and hardware integration, but having that level of flexibility and performance justifies the complexity.