diff --git a/docs/reference/high-resolution-pose-estimation.md b/docs/reference/high-resolution-pose-estimation.md
index c77e90e0ef..791ae06e02 100644
--- a/docs/reference/high-resolution-pose-estimation.md
+++ b/docs/reference/high-resolution-pose-estimation.md
@@ -8,12 +8,12 @@ Bases: `engine.learners.Learner`
 
 The *HighResolutionLightweightOpenPose* class is an implementation for pose estimation in high resolution images.
 This method creates a heatmap of a resized version of the input image.
 Using this heatmap, the input image is cropped keeping the area of interest and then it is used for pose estimation.
-Since the high resolution pose estimation method is based on Lightweight OpenPose algorithm,the models that could be used have to be trained with Lightweight OpenPose tool.
+Since the high resolution pose estimation method is based on the Lightweight OpenPose algorithm, the models used must be trained with the Lightweight OpenPose tool.
 
 In this method there are two important variables which are responsible for the increase in speed and accuracy in high resolution images.
-These variables are the *first_pass_height* and the *second_pass_height* that the image is resized in this procedure.
+These variables are *first_pass_height* and *second_pass_height*, which define how the image is resized in this procedure.
 
-The [HighResolutionPoseEstimationLearner](/src/opendr/perception/pose_estimation/hr_pose_estimation/HighResolutionLearner.py) class has the following public methods:
+The [HighResolutionPoseEstimationLearner](/src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py) class has the following public methods:
 
 #### `HighResolutionPoseEstimationLearner` constructor
 ```python
@@ -134,7 +134,7 @@ Parameters:
 HighResolutionPoseEstimationLearner.__first_pass(self, net, img)
 ```
 
-This method is used for extracting from the input image a heatmap about human locations in the picture.
+This method is used to extract a heatmap of human locations from the input image.
 
 Parameters:
 
@@ -148,8 +148,8 @@ Parameters:
 HighResolutionPoseEstimationLearner.__second_pass(self, net, img, net_input_height_size, max_width, stride, upsample_ratio, pad_value, img_mean, img_scale)
 ```
 
-On this method it is carried out the second inference step which estimates the human poses on the image that is inserted.
-Following the steps of the proposed method this image should be the cropped part of the initial high resolution image that came out from taking into account the area of interest of heatmap generation.
+In this method the second inference step is carried out, which estimates the human poses on the provided image.
+Following the steps of the proposed method, this image should be the crop of the initial high resolution image around the area of interest indicated by the generated heatmap.
 
 Parameters:
 
@@ -253,7 +253,7 @@ The experiments are conducted on a 1080p image.
 | OpenDR - Full     | 2.9                | 83.1           | 11.2             | 13.5            |
 
 
-#### Lightweght OpenPoseWithout resizing
+#### Lightweight OpenPose without resizing
 | Method            | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) |
 |-------------------|--------------------|----------------|------------------|-----------------|
 | OpenDR - Baseline | 0.05               | 2.6            | 0.3              | 0.5             |
 
@@ -270,7 +270,7 @@ The experiments are conducted on a 1080p image.
 | HRPoseEstim - H+S  | 8.2                | 25.9           | 3.6              | 5.5             |
 | HRPoseEstim - Full | 10.9               | 31.7           | 4.8              | 6.9             |
 
-As it is shown in the previous Table, OpenDR Lightweight OpenPose achieves higher FPS when it is resing the input image into 256 pixels.
+As shown in the previous tables, OpenDR Lightweight OpenPose achieves higher FPS when resizing the input image to 256 pixels.
 It is easier to process that image, but as it is shown in the next tables the method falls apart when it comes to accuracy and there are no detections.
 
 We have evaluated the effect of using different inference settings, namely:
@@ -282,7 +282,7 @@ We have evaluated the effect of using different inference settings, namely:
 - *HRPoseEstim - Full*, which refers to combining all three available optimization.
 was used as input to the models.
 
-The average precision and average recall on the COCO evaluation split is also reported in the Table below:
+The average precision and average recall on the COCO evaluation split are also reported in the tables below:
 
 #### Lightweight OpenPose with resizing
diff --git a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/README.md b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/README.md
index 75e19ef104..da48ec4088 100644
--- a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/README.md
+++ b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/README.md
@@ -6,6 +6,6 @@ More specifically, the applications provided are:
 
 1. demos/inference_demo.py: A tool that demonstrates how to perform inference on a single high resolution image and then draw the detected poses.
 2. demos/eval_demo.py: A tool that demonstrates how to perform evaluation using the High Resolution Pose Estimation algorithm on 720p, 1080p and 1440p datasets.
-3. demos/benchmarking.py: A simple benchmarking tool for measuring the performance of High Resolution Pose Estimation in various platforms.
+3. demos/benchmarking_demo.py: A simple benchmarking tool for measuring the performance of High Resolution Pose Estimation on various platforms.
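
For orientation before the demo diffs below: after this change the demos drive the learner with no ONNX branch at all. The following is a minimal sketch of that flow, assuming the learner is importable from `opendr.perception.pose_estimation` and that the constructor accepts the two pass heights; both assumptions are inferred from the hunks in this PR, not verified against the package.

```python
import cv2

# Assumed import path; the class itself lives in
# src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py.
from opendr.perception.pose_estimation import HighResolutionPoseEstimationLearner

# Illustrative heights; the demos read these from the --height1/--height2 arguments.
pose_estimator = HighResolutionPoseEstimationLearner(device="cuda",
                                                     first_pass_height=360,
                                                     second_pass_height=540)
pose_estimator.download(path=".", verbose=True)      # fetch pretrained weights
pose_estimator.load("openpose_default")              # load the Lightweight OpenPose model
pose_estimator.download(path=".", mode="test_data")  # fetch the sample dataset

img = cv2.imread("temp/dataset/image/000000000785_1080.jpg")
poses = pose_estimator.infer(img)  # no pose_estimator.optimize() call any more
```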
diff --git a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py
index e819b99566..33142e7c99 100644
--- a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py
+++ b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/benchmarking_demo.py
@@ -23,7 +23,6 @@
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
-    parser.add_argument("--onnx", help="Use ONNX", default=False, action="store_true")
     parser.add_argument("--device", help="Device to use (cpu, cuda)", type=str, default="cuda")
     parser.add_argument("--accelerate", help="Enables acceleration flags (e.g., stride)", default=False,
                         action="store_true")
 
@@ -32,7 +31,7 @@
     args = parser.parse_args()
 
-    onnx, device, accelerate, base_height1, base_height2 = args.onnx, args.device, args.accelerate,\
+    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
        args.height1, args.height2
 
     if device == 'cpu':
@@ -61,15 +60,12 @@
     image_path = join("temp", "dataset", "image", "000000000785_1080.jpg")
     img = cv2.imread(image_path)
 
-    if onnx:
-        pose_estimator.optimize()
-
     fps_list = []
     print("Benchmarking...")
     for i in tqdm(range(50)):
         start_time = time.perf_counter()
         # Perform inference
         poses = pose_estimator.infer(img)
 
         end_time = time.perf_counter()
diff --git a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py
index bdc088fb03..9f71e0bd5c 100644
--- a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py
+++ b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/eval_demo.py
@@ -21,7 +21,6 @@
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
-    parser.add_argument("--onnx", help="Use ONNX", default=False, action="store_true")
     parser.add_argument("--device", help="Device to use (cpu, cuda)", type=str, default="cuda")
     parser.add_argument("--accelerate", help="Enables acceleration flags (e.g., stride)", default=False,
                         action="store_true")
 
@@ -30,7 +29,7 @@
     args = parser.parse_args()
 
-    onnx, device, accelerate, base_height1, base_height2 = args.onnx, args.device, args.accelerate,\
+    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
        args.height1, args.height2
 
     if accelerate:
@@ -50,9 +49,6 @@
     pose_estimator.download(path=".", verbose=True)
     pose_estimator.load("openpose_default")
 
-    if onnx:
-        pose_estimator.optimize()
-
     # Download a sample dataset
     pose_estimator.download(path=".", mode="test_data")
 
diff --git a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py
index 76b8c5e881..f1b588d943 100644
--- a/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py
+++ b/projects/python/perception/pose_estimation/high_resolution_pose_estimation/demos/inference_demo.py
@@ -22,7 +22,6 @@
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
-    parser.add_argument("--onnx", help="Use ONNX", default=False, action="store_true")
     parser.add_argument("--device", help="Device to use (cpu, cuda)", type=str, default="cuda")
     parser.add_argument("--accelerate", help="Enables acceleration flags (e.g., stride)", default=False,
                         action="store_true")
 
@@ -31,7 +30,7 @@
     args = parser.parse_args()
 
-    onnx, device, accelerate, base_height1, base_height2 = args.onnx, args.device, args.accelerate,\
+    device, accelerate, base_height1, base_height2 = args.device, args.accelerate,\
        args.height1, args.height2
 
     if accelerate:
@@ -57,9 +56,6 @@
     img = Image.open(image_path)
 
-    if onnx:
-        pose_estimator.optimize()
-
     poses = pose_estimator.infer(img)
 
     img_cv = img.opencv()
diff --git a/src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py b/src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py
index 49ea5e3b2b..c94e6b0ee0 100644
--- a/src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py
+++ b/src/opendr/perception/pose_estimation/hr_pose_estimation/high_resolution_learner.py
@@ -159,7 +159,8 @@ def __second_pass(self, net, img, net_input_height_size, max_width, stride, upsa
 
         return heatmaps, pafs, scale, pad
 
-    def __pooling(self, img, kernel):  # Pooling on input image for dimension reduction
+    @staticmethod
+    def __pooling(img, kernel):  # Pooling on input image for dimension reduction
         """This method applies a pooling filter on an input image in order to resize it in a fixed shape
 
         :param img: input image for resizing
@@ -173,12 +174,15 @@ def __pooling(self, img, kernel):  # Pooling on input image for dimension reduct
         pool_img = pool_img.squeeze(0).permute(1, 2, 0).cpu().float().numpy()
         return pool_img
 
-    def fit(self, dataset, val_dataset=None, logging_path='', silent=True, verbose=True):
+    def fit(self, dataset, val_dataset=None, logging_path='', logging_flush_secs=30,
+            silent=False, verbose=True, epochs=None, use_val_subset=True, val_subset_size=250,
+            images_folder_name="train2017", annotations_filename="person_keypoints_train2017.json",
+            val_images_folder_name="val2017", val_annotations_filename="person_keypoints_val2017.json"):
         """This method is not used in this implementation."""
         raise NotImplementedError
 
-    def optimize(self, target_device):
+    def optimize(self, do_constant_folding=False):
         """This method is not used in this implementation."""
         raise NotImplementedError
 
@@ -187,11 +191,11 @@ def reset(self):
         """This method is not used in this implementation."""
         return NotImplementedError
 
-    def save(self, path):
+    def save(self, path, verbose=False):
         """This method is not used in this implementation."""
         return NotImplementedError
 
-    def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_size=250, upsample_ratio=4, 
+    def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_size=250, upsample_ratio=4,
              images_folder_name="val2017", annotations_filename="person_keypoints_val2017.json"):
         """
         This method is used to evaluate a trained model on an evaluation dataset.
@@ -222,7 +226,7 @@ def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_siz
         :rtype: dict
         """
 
-        data = super(HighResolutionPoseEstimationLearner,
+        data = super(HighResolutionPoseEstimationLearner,  # NOQA
                      self)._LightweightOpenPoseLearner__prepare_val_dataset(dataset, use_subset=use_subset,
                                                                             subset_name="val_subset.json",
                                                                             subset_size=subset_size,
@@ -287,13 +291,13 @@ def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_siz
             max_width = w
             kernel = int(h / self.first_pass_height)
             if kernel > 0:
-                pool_img = HighResolutionPoseEstimationLearner.__pooling(self, img, kernel)
+                pool_img = self.__pooling(img, kernel)
             else:
                 pool_img = img
 
             # ------- Heatmap Generation -------
-            avg_pafs = HighResolutionPoseEstimationLearner.__first_pass(self, self.model, pool_img)
+            avg_pafs = self.__first_pass(self.model, pool_img)
             avg_pafs = avg_pafs.astype(np.float32)
 
             pafs_map = cv2.blur(avg_pafs, (5, 5))
@@ -345,11 +349,9 @@ def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_siz
             h, w, _ = crop_img.shape
 
             # ------- Second pass of the image, inference for pose estimation -------
-            avg_heatmaps, avg_pafs, scale, pad = \
-                HighResolutionPoseEstimationLearner.__second_pass(self,
-                                                                  self.model, crop_img,
-                                                                  self.second_pass_height, max_width,
-                                                                  self.stride, upsample_ratio)
+            avg_heatmaps, avg_pafs, scale, pad = self.__second_pass(self.model, crop_img,
+                                                                    self.second_pass_height, max_width,
+                                                                    self.stride, upsample_ratio)
             total_keypoints_num = 0
             all_keypoints_by_type = []
             for kpt_idx in range(18):
@@ -396,7 +398,7 @@ def eval(self, dataset, silent=False, verbose=True, use_subset=True, subset_siz
             if self.visualize:
                 for keypoints in coco_keypoints:
                     for idx in range(len(keypoints) // 3):
-                        cv2.circle(img, (int(keypoints[idx * 3]+offset), int(keypoints[idx * 3 + 1])+offset),
+                        cv2.circle(img, (int(keypoints[idx * 3] + offset), int(keypoints[idx * 3 + 1]) + offset),
                                    3, (255, 0, 255), -1)
                 cv2.imshow('keypoints', img)
                 key = cv2.waitKey()
@@ -461,12 +463,12 @@ def infer(self, img, upsample_ratio=4, stride=8, track=True, smooth=True,
 
         kernel = int(h / self.first_pass_height)
         if kernel > 0:
-            pool_img = HighResolutionPoseEstimationLearner.__pooling(self, img, kernel)
+            pool_img = self.__pooling(img, kernel)
         else:
             pool_img = img
 
         # ------- Heatmap Generation -------
-        avg_pafs = HighResolutionPoseEstimationLearner.__first_pass(self, self.model, pool_img)
+        avg_pafs = self.__first_pass(self.model, pool_img)
         avg_pafs = avg_pafs.astype(np.float32)
 
         pafs_map = cv2.blur(avg_pafs, (5, 5))
@@ -517,10 +519,9 @@ def infer(self, img, upsample_ratio=4, stride=8, track=True, smooth=True,
         h, w, _ = crop_img.shape
 
         # ------- Second pass of the image, inference for pose estimation -------
-        avg_heatmaps, avg_pafs, scale, pad = \
-            HighResolutionPoseEstimationLearner.__second_pass(self, self.model, crop_img,
-                                                              self.second_pass_height,
-                                                              max_width, stride, upsample_ratio)
+        avg_heatmaps, avg_pafs, scale, pad = self.__second_pass(self.model, crop_img,
+                                                                self.second_pass_height,
+                                                                max_width, stride, upsample_ratio)
         total_keypoints_num = 0
         all_keypoints_by_type = []
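
With the ONNX branch gone and the `infer` call kept inside the timed region (see the restored line in the `benchmarking_demo.py` hunk above), the benchmarking loop reduces to roughly the sketch below. The import path, constructor defaults, and the final FPS aggregation are assumptions about code outside this diff, not verified against the package.

```python
import time

import cv2
from tqdm import tqdm

from opendr.perception.pose_estimation import HighResolutionPoseEstimationLearner  # assumed import path

# Setup as in the demo: download weights and test data, then load the model.
pose_estimator = HighResolutionPoseEstimationLearner(device="cuda")  # default pass heights assumed
pose_estimator.download(path=".", verbose=True)
pose_estimator.load("openpose_default")
pose_estimator.download(path=".", mode="test_data")

img = cv2.imread("temp/dataset/image/000000000785_1080.jpg")

fps_list = []
print("Benchmarking...")
for _ in tqdm(range(50)):
    start_time = time.perf_counter()
    poses = pose_estimator.infer(img)  # the operation being timed
    end_time = time.perf_counter()
    fps_list.append(1.0 / (end_time - start_time))

# Aggregation not shown in the hunk; assumed to look roughly like this.
print("Average FPS: %.2f" % (sum(fps_list) / len(fps_list)))
```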