Human Pose Estimation using Depth Video
-Research assistant, Computer vision lab, USC, 2011. 4 ~ current
Estimate and track human pose in depth video without initialization
3D Object Classification / Recognition
-Research assistant, Computer vision lab, USC, 2009. 1 ~ current
3D Object Recognition in Range Images Using Visibility Context
Abstract: Recognizing and localizing queried objects in range images plays an important role in robotic manipulation and navigation. Although the problem has been studied steadily, it remains challenging for scenes with occlusion and clutter.
We present a novel approach to object recognition that boosts the dissimilarity between queried objects and similar-shaped background objects in the scene by maximizing use of the visibility context. We design a new point pair feature containing a discriminative description inferred from the visibility context.
Also, we propose a pose estimation method that accurately localizes objects using these point pair matches. Finally, two measures of validity are suggested to discard false detections.
With 10 query objects, our approach is evaluated on depth images of cluttered office scenes captured from a real-time range sensor. The experimental results demonstrate that our method remarkably outperforms two state-of-the-art methods in terms of recognition (recall and precision) and runtime.
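The abstract does not spell out how the visibility context enters the new feature. For reference only, here is a minimal sketch of the classic four-dimensional point pair feature (the distance between two oriented points plus three angles) that point-pair methods commonly build on. It is not the feature proposed in this work, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _norm(a):
    return math.sqrt(_dot(a, a))

def _angle(a, b):
    """Angle in radians between two non-zero vectors (cosine clamped to [-1, 1])."""
    c = _dot(a, b) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, c)))

def point_pair_feature(p1, n1, p2, n2):
    """Classic point pair feature for two distinct oriented points.

    p1, p2 are 3D points and n1, n2 their surface normals.
    Returns (distance, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1.
    Illustration only: no visibility-context term is included.
    """
    d = _sub(p2, p1)
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))
```

In a matching pipeline, features like this are usually quantized and hashed so that scene point pairs can vote for model poses; the bin sizes would be a tuning choice.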
Scalable Object Classification in Range Images
Abstract: We present a novel scalable framework for free-form object classification in range images. The framework includes an automatic 3D object recognition system for range images and a scalable database structure to learn new instances and new categories efficiently. We adopt the TAX model, previously proposed for unsupervised object modeling in 2D images, to construct our hierarchical model of object classes from unlabelled range images. The hierarchical model embodies unorganized shape patterns of 3D objects in various classes in a tree structure with probability distributions. A new visual vocabulary is introduced to represent a range image as a set of visual words for hierarchical model inference, classification and online learning. We also propose an online learning algorithm that, thanks to the tree structure, updates the hierarchical model efficiently when a new object must be added to the model. Extensive experiments demonstrate average classification rates of 94% on a large synthetic dataset (1,350 training images and 450 test images for 9 object classes) and 88.4% on 1,433 depth images captured from real-time range sensors. We also show that our approach outperforms the original TAX method in terms of recall rate and stability. We validated the module on a large-scale LIDAR dataset as well.
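The abstract does not describe how the visual vocabulary is built or which descriptors it uses. As a generic illustration of representing an image as a set of visual words, the sketch below assigns each local descriptor to its nearest vocabulary entry and builds a normalized word histogram. The function names and the squared-Euclidean distance are assumptions, not details from the paper.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def word_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]
```

For example, with a two-word vocabulary `[(0, 0), (1, 1)]`, an image whose descriptors fall evenly near each word yields the histogram `[0.5, 0.5]`.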
(Example of the HSD: Each node is shown with its discrete probability distribution. Due to limited space, we show only the paths that the test objects belong to, and the paths that share a node with new paths and have the most training images among the branches. For every existing path, an example training object under that path is displayed in the purple box.)
(Example result on range images captured from Swiss Ranger SR 3000)
(Example result on range images captured from Prime sensor)
Eunyoung Kim and Gerard Medioni, Scalable Object Classification Using Range Images, 3DIMPVT 2011.
Large Scale Range Image Processing
-Research assistant, Computer vision lab, USC, 2007. 3 ~ 2011. 3
Abstract. We present a framework to segment cultural and natural features, given 3D aerial scans of a large urban area and (optionally) registered ground-level scans of the same area. This system is a first step toward the ultimate goal of detecting every object from a large number of varied categories, from antennas to power plants. Our approach is to first identify local patches of the ground surface and roofs of buildings. This is accomplished by tensor voting, which infers surface orientation from neighboring regions as well as local 3D points. Then, we group adjacent planar surfaces with consistent pose to find surface segments and classify them as either terrain or roofs of buildings.
Second, we delineate vertical faces of buildings, as well as free-standing vertical structures such as fences. We then use this information as geometric context to segment linear structures such as power lines and the structures attached to walls and roofs from remaining unclassified 3D points in the scene. We demonstrate our system on real LIDAR datasets acquired from typical urban regions with areas of a few square kilometers each, and provide a quantitative analysis of performance using externally provided ground truth.
Eunyoung Kim and Gerard Medioni, Urban Scene Understanding from Aerial and Ground LIDAR Data, Machine Vision and Applications (MVA)
Eunyoung Kim and Gerard Medioni, Dense Structure Inference for Object Classification in Aerial LIDAR Dataset, In ICPR 2010
Planar Patch based 3D Environment Modeling with Stereo Camera
-Research assistant, Computer vision lab, USC, 2006. 8 ~ 2007. 2
Abstract. We present two robust and novel algorithms to model a 3D environment using both intensity and range data provided by an off-the-shelf stereo camera. The main issue we need to address is that the output of the stereo system is both sparse and noisy. To overcome this limitation, we detect planar patches in the environment by region segmentation in 2D and plane extraction in 3D. The extracted planar patches are used not only to represent the workspace, but also to fill holes in range data. We also suggest a new planar patch based scan matching algorithm to register multiple views, and to incrementally augment the description of the 3D workspace in a sequence of scenes. Experimental results on real data show that planar patch segmentation and 3D scene registration for environment modeling can be robustly achieved by the proposed approaches.
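The abstract does not say which plane extraction method is applied to the 3D data. As one common way to pull a dominant plane out of sparse, noisy range points, the following minimal sketch uses RANSAC. It is a swapped-in generic technique rather than the algorithm from the paper, and the function names, threshold, and iteration count are hypothetical.

```python
import random

def plane_from_points(a, b, c):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    if length < 1e-12:
        return None
    n = (n[0] / length, n[1] / length, n[2] / length)
    d = -(n[0] * a[0] + n[1] * a[1] + n[2] * a[2])
    return n, d

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return (plane, inliers) for the plane supported by the most points.

    threshold is the maximum point-to-plane distance for an inlier,
    in the same units as the points (an assumed tuning parameter).
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Running the routine repeatedly on the points left over after removing each plane's inliers would give a set of planar patches; combining that with 2D region segmentation, as the abstract describes, is beyond this sketch.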
Eunyoung Kim, Gerard Medioni and Sukhan Lee, “Planar Patch based 3D Environment Modeling with Stereo Camera”, 16th IEEE International Symposium on Robot & Human Interactive Communication (RO-MAN2007), August 26-29 2007, Jeju island, Korea. [paper]
Fast and robust 3D environment modeling and object pose estimation for robotic manipulation and SLAM
-Research assistant, Intelligent system research center (ISRC), SKKU, 2003. 10 ~ 2006. 7
3D Object Pose Estimation using Multiple Features for Robotic Manipulation
Abstract. Robust 3D object recognition in environments with diverse variations requires increasing certainty by using consecutive scenes and combining different features. This paper proposes a novel 3D object pose estimation approach that combines a photometric feature (SIFT) and a geometric feature (3D lines) over a sequence of scenes. To exploit consecutive scenes, we use particle filtering, in which each feature generates particles that represent possible poses of the object. These particles are spread over the region where the object is considered to exist, and the probability of each particle is obtained through a matching test against the corresponding feature in the scene. The particle sets derived from SIFT and 3D lines are then fused to give the estimated pose of the object. For computational efficiency, the recognition system runs as a hierarchical process. We also introduce a simple method to decide the next best view position based on the recognition results. Finally, the experimental results demonstrate that the proposed methods are feasible in a real environment.
(The proposed method was integrated into a service robot, T-rot, and successfully estimated the pose of objects for robotic manipulation. The video shows T-rot as exhibited at APEC 2005.)
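The abstract does not specify the fusion rule for the two particle sets. As a rough illustration of the idea, the sketch below normalizes the weights of each feature's particles, merges the two sets with equal trust, and reads off a weighted-mean position. The equal weighting, the position-only pose, and all function names are assumptions, not details from the paper.

```python
def normalize(particles):
    """Scale weights so they sum to 1. Each particle is a (position, weight) pair."""
    total = sum(w for _, w in particles)
    if total <= 0:
        raise ValueError("particle weights must sum to a positive value")
    return [(x, w / total) for x, w in particles]

def fuse(set_a, set_b):
    """Merge two particle sets, giving each feature's set equal overall trust (assumed)."""
    return normalize(normalize(set_a) + normalize(set_b))

def weighted_mean(particles):
    """Weighted mean position of a normalized particle set.

    Orientation is omitted here; averaging rotations needs more care.
    """
    dims = len(particles[0][0])
    return tuple(sum(x[i] * w for x, w in particles) for i in range(dims))
```

A usage example: with SIFT particles at x = 0 and x = 2 (equal weights) and a single 3D-line particle at x = 1, the fused estimate lands at x = 1. A full particle filter would also resample and propagate the particles between consecutive scenes.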
Sukhan Lee, Eunyoung Kim and Yeonchul Park, “3D Object Recognition using Multiple Features for Robot Manipulation”, 2006 IEEE International Conference on Robotics and Automation (ICRA2006), May 15 – 19 2006, Orlando, Florida. [paper]
A Real-Time 3D Workspace Modeling with Stereo Camera
Abstract. This paper presents a novel approach to real-time 3D modeling of a workspace for manipulative robotic tasks. First, we establish the three fundamental principles that humans use for modeling and interacting with the environment. These principles have led to the development of an integrated approach to real-time 3D modeling, as follows: 1) It starts with a rapid but approximate characterization of the geometric configuration of the workspace by identifying global plane features. 2) It quickly recognizes known objects in the workspace and replaces them with their models from a database based on in-situ registration. 3) It models the geometric details on the fly, adapting to the needs of the given task, based on a multi-resolution octree representation. SIFT features with their 3D position data, referred to here as stereo-sis SIFT, are used extensively, together with point clouds, for fast extraction of global plane features, fast recognition of objects, and fast registration of scenes, as well as for overcoming the incomplete and noisy nature of point clouds. The experimental results show the feasibility of real-time and behavior-oriented 3D modeling of a workspace for robotic manipulative tasks.
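The multi-resolution octree in step 3 can be pictured with a small sketch: points are inserted into a cubic cell that subdivides into octants down to a chosen depth, so a task that needs finer detail can simply request a larger depth. This is a generic octree under assumed names (`OctreeNode`, `max_depth`), not the implementation from the paper.

```python
class OctreeNode:
    """Cubic cell centered at `center` with edge length 2 * half_size."""

    def __init__(self, center, half_size, depth=0):
        self.center = center
        self.half_size = half_size
        self.depth = depth
        self.children = {}  # octant index (0..7) -> OctreeNode, created lazily
        self.count = 0      # number of points that passed through this cell

    def _octant(self, p):
        """Octant index: bit k is set when p lies on the positive side of axis k."""
        return sum(int(p[axis] >= self.center[axis]) << axis for axis in range(3))

    def insert(self, p, max_depth):
        """Insert point p, subdividing until max_depth (the requested resolution)."""
        self.count += 1
        if self.depth >= max_depth:
            return
        k = self._octant(p)
        if k not in self.children:
            h = self.half_size / 2
            offset = [h if (k >> axis) & 1 else -h for axis in range(3)]
            child_center = tuple(c + o for c, o in zip(self.center, offset))
            self.children[k] = OctreeNode(child_center, h, self.depth + 1)
        self.children[k].insert(p, max_depth)

    def leaves(self):
        """All occupied cells at the finest level reached below this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children.values() for leaf in child.leaves()]
```

Only occupied octants are allocated, so memory grows with the observed surface rather than with the full volume. The sketch assumes all inserted points lie inside the root cell.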
(This video demonstrates how the proposed approaches work on a real robot.)
Eunyoung Kim, Daesik Jang, Sukhan Lee and JungHyun Han, “Task-Oriented Context Understanding of 3D Workspace for Robotic Manipulation”, Proceedings of Conference of Korea Information Processing Associate, May. 2005
*Sukhan Lee, Daesik Jang, Eunyoung Kim, Suyeon Hong and JungHyun Han, “Stereo Vision Based Real-Time Workspace Modeling for Robotic Manipulation”, Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), August 2-6 2005, Edmonton, Alberta, Canada. [paper]