In this paper, we propose a transformer-based model for multi-person human image matting that uses information from human pose estimation to guide the matting network and improve its performance on the human image matting task. Our method employs a matting encoder and a multi-person human pose encoder to extract features from the input image, merges these features with a fusion block, and passes the merged features to a ResNet decoder, which transforms them into an alpha matte. Since most existing image matting datasets do not focus on real-world multi-person human matting, we build a combined human image dataset from existing datasets and train our model on it. Experiments show that our method achieves high performance on the multi-person human image matting task, especially in challenging cases.
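The encoder-fusion-decoder pipeline described above can be sketched at a schematic level. The following is a minimal, framework-free illustration of the data flow only: all function bodies are placeholder assumptions (the actual matting encoder, pose encoder, fusion block, and ResNet decoder are learned networks), and the keypoint format is hypothetical.

```python
# Schematic sketch of the proposed pipeline: two encoders -> fusion -> decoder.
# Every function body here is an illustrative stand-in, not the real network.

def matting_encoder(image):
    # Stand-in for the transformer-based matting encoder:
    # returns a per-pixel appearance feature (here, normalized intensity).
    return [[px / 255.0 for px in row] for row in image]

def pose_encoder(image, keypoints):
    # Stand-in for the multi-person pose encoder: returns a per-pixel
    # "near a person" prior derived from pose keypoints (format assumed).
    h, w = len(image), len(image[0])
    prior = [[0.0] * w for _ in range(h)]
    for (y, x) in keypoints:
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    prior[yy][xx] = 1.0
    return prior

def fusion_block(app_feat, pose_feat):
    # The paper's fusion block is learned; a simple average stands in here.
    return [[(a + p) / 2.0 for a, p in zip(ra, rp)]
            for ra, rp in zip(app_feat, pose_feat)]

def decoder(fused):
    # Stand-in for the ResNet decoder that maps fused features to an
    # alpha matte in [0, 1]; here just clamping.
    return [[min(max(v, 0.0), 1.0) for v in row] for row in fused]

def predict_alpha(image, keypoints):
    appearance = matting_encoder(image)
    pose_prior = pose_encoder(image, keypoints)
    return decoder(fusion_block(appearance, pose_prior))

# Tiny example: a 3x3 grayscale "image" with one pose keypoint at the center.
image = [[0, 128, 255] for _ in range(3)]
alpha = predict_alpha(image, keypoints=[(1, 1)])
```

The point of the sketch is the wiring, not the math: pose features act as a spatial prior that is merged with appearance features before decoding, which is what lets pose estimation guide the matte in crowded multi-person scenes.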