Abstract: Despite the significant advancements in deep learning, the task of estimating 3-D human pose and shape from a monocular RGB image remains challenging, primarily due to depth ambiguity.