Robot Jenga Assistant

Published: Dec 19, 2022 by Liz Metzger

(ROS2, Computer Vision, Machine Learning)

In this project I worked with a group using ROS2, computer vision, and machine learning to build a robotic Jenga assistant. I mainly worked on the computer vision aspects of the project. Our main goal was for the robot, whenever a brick was partially pushed out of the Jenga tower, to find the brick, move to it, grab it, pull it out of the tower, place it on top in the correct orientation, and then repeat the process.

The first step in setting up the robot was to create a calibration method so that we knew where everything was relative to the robot. To do this we used computer vision and an AprilTag: we moved the robot to a calibration position where the end effector was in view of the camera, then placed an AprilTag in the end effector. The camera could then create a frame at the center of the AprilTag that corresponded to the frame of the end effector. Since we knew the transform between the camera and the AprilTag and between the base of the robot and the end effector, we could create a static transform between the camera and the robot base once the tag was detected. Adding this frame meant that anything within the camera's view could be converted into the robot's frame, so we could accurately move the end effector to desired positions. The image below shows our robot in its calibration position while holding the AprilTag.

[Image: jengabells_calibrate — the robot in its calibration position holding the AprilTag]
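The calibration boils down to composing two known transforms. Below is a minimal sketch of that math, assuming the AprilTag frame coincides with the end-effector frame; the numeric values and helper names are illustrative placeholders, not our actual code.

```python
import numpy as np

def make_tf(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def invert_tf(T):
    """Invert a rigid-body transform using the rotation transpose."""
    R, t = T[:3, :3], T[:3, 3]
    Tinv = np.eye(4)
    Tinv[:3, :3] = R.T
    Tinv[:3, 3] = -R.T @ t
    return Tinv

# T_base_ee: base -> end effector, known from the robot's forward kinematics.
# T_camera_tag: camera -> AprilTag, reported by the tag detector.
T_base_ee = make_tf(np.eye(3), [0.4, 0.0, 0.5])      # placeholder values
T_camera_tag = make_tf(np.eye(3), [0.0, 0.1, 0.8])   # placeholder values

# With the tag held in the gripper, base -> camera follows by composition:
# T_base_camera = T_base_ee (~= T_base_tag) @ inverse(T_camera_tag)
T_base_camera = T_base_ee @ invert_tf(T_camera_tag)
print(T_base_camera)  # this is what gets published as the static transform
```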

After calibrating, the program entered an initial scanning mode where it scanned through each layer of the depth map generated by our RealSense camera until it located the top of the tower. Once it found the top of the tower, it saved the depth information and the coordinates of the centroid, and used OpenCV's line detection to record the starting orientation of the top of the tower. It determined the top of the tower by finding a contour larger than a specified size, and it detected the orientation of the top layer by running Canny edge detection followed by a Hough line transform on a masked area of the image. An example of what the line detection looks like is shown below. After finding the top of the tower, the program continued to scan until it detected another large surface area, which it saved as the depth of the table. This scanning process gave us a depth range, from the top of the tower down to the table, in which to look for removed blocks.

[Image: jengabells_lines — Hough line detection on the masked top of the tower]
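A simplified sketch of this scanning logic with OpenCV is shown below; the depth band, thresholds, and minimum contour area are illustrative assumptions rather than the project's actual parameters.

```python
import cv2
import numpy as np

MIN_AREA = 5000  # pixels; a surface "large enough" to be the tower top or table

def find_large_contour(depth_image, depth_mm, band_mm=10):
    """Mask one depth layer and return the largest contour above MIN_AREA."""
    mask = cv2.inRange(depth_image, depth_mm - band_mm, depth_mm + band_mm)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    big = [c for c in contours if cv2.contourArea(c) > MIN_AREA]
    return max(big, key=cv2.contourArea) if big else None

def tower_top_orientation(gray, mask):
    """Estimate the top layer's orientation from edges inside the masked area."""
    edges = cv2.Canny(cv2.bitwise_and(gray, gray, mask=mask), 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=10)
    if lines is None:
        return None
    x1, y1, x2, y2 = lines[0][0]  # take one detected line segment
    return np.degrees(np.arctan2(y2 - y1, x2 - x1))

def centroid(contour):
    """Pixel-space centroid of a contour via image moments."""
    m = cv2.moments(contour)
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```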

The program located blocks that had been pushed out because, as it scanned, it would find a contour of significant surface area within that depth range. Once it located a block, it took the centroid of the contour and published a tf to the tf tree at the block's location. Once the tf was published, a MoveIt command could be sent to the robot to move it to the desired location. We used a custom API to plan and then execute a path to a given location, where the starting location was assumed to be the current position of the robot. The arm moved to the block using a couple of different message types: cartesian and trajectory. A cartesian path used waypoints to ensure that the end effector moved in a straight line to the desired location, while a trajectory let the arm move along whatever path was calculated from the joint angles returned by GetPositionIK for the final pose. Once the end effector reached its final pose, a grab action gripped the block, and cartesian paths were then used to remove the block and place it on top of the tower. We also added obstacles such as the camera, table, and tower to the planning scene so that planned paths would avoid everything in our environment.
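On the vision side, publishing the block's frame is a small tf2 broadcaster. The sketch below shows roughly what that looks like in rclpy; the node name, frame names, and identity orientation are assumptions standing in for our actual implementation.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import TransformStamped
from tf2_ros import TransformBroadcaster

class BlockFramePublisher(Node):
    def __init__(self):
        super().__init__("block_frame_publisher")
        self.broadcaster = TransformBroadcaster(self)

    def publish_block(self, x, y, z):
        """Broadcast a frame at the block's 3D position in the camera frame."""
        t = TransformStamped()
        t.header.stamp = self.get_clock().now().to_msg()
        t.header.frame_id = "camera_color_optical_frame"  # parent: the camera
        t.child_frame_id = "pushed_block"                  # child: the block
        t.transform.translation.x = float(x)
        t.transform.translation.y = float(y)
        t.transform.translation.z = float(z)
        t.transform.rotation.w = 1.0  # identity orientation; grasp pose comes later
        self.broadcaster.sendTransform(t)
```

Because the camera frame is tied to the robot base by the calibration transform, MoveIt can plan to the `pushed_block` frame directly.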

After the block is placed, the program goes back to scanning for pieces. During this time it uses a machine learning model to look for a person in the frame. We used a pre-trained MobileNet neural network and retrained the last layer using about 1000 images for each class (hands in frame and no hands). The program counts frames in which no hands are visible, and once it reaches 80 consecutive frames it searches for another block. If no block is found, it goes back to checking for hands and repeats the process until it finds a block to retrieve. We were able to get the robot to remove and place ten blocks in a row following the Jenga stacking pattern, as seen in the video below!
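A rough sketch of this kind of transfer-learning setup in Keras is below; the dataset path, image size, and training settings are assumptions for illustration, not our exact pipeline.

```python
import tensorflow as tf

# Pre-trained MobileNet backbone with its classifier head removed and frozen.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

# Only the new final layer is trained on the hands / no-hands classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# ~1000 labeled images per class, arranged in one folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "hand_dataset/", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=5)
```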

Group Members: Liz Metzger, Katie Hughes, Alyssa Chen, Hang Yin
