HDFS-17352. Add configuration to control whether DN deletes this replica from disk when client requests a missing block #6559
base: trunk
Conversation
// So removing it from the volume map and notifying the NameNode is OK.
// If checkFiles is true, the existence of the block and meta file will be checked again.
// If deleteCorruptReplicaFromDisk is true, the actual on-disk block and meta file
// will be deleted; otherwise the replica is only removed from the volume map and
// the NameNode is notified.
If checkFiles is true, the existence of the block and metafile will be checked again.
If deleteCorruptReplicaFromDisk is true, delete the existing block or metafile directly, otherwise just remove them from the memory volumeMap.
.notifyNamenodeDeletedBlock(extendedBlock, replica.getStorageUuid());
invalidate(bpid, new Block[] {extendedBlock.getLocalBlock()});
} else {
volumeMap.remove(bpid, block);
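The fragments quoted above come from the DataNode's dataset code. A minimal, self-contained sketch of the two branches under discussion follows; the class, method, and recorded action names here are illustrative stand-ins (not the real FsDatasetImpl API), and side effects are logged to a list purely so the behavior can be observed:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the branch added in this PR: when a client asks
// for a replica whose on-disk files are missing or corrupt, the DataNode
// either deletes it outright or only drops it from the in-memory volume map.
class CorruptReplicaHandler {
  private final boolean deleteCorruptReplicaFromDisk;
  // Records the side effects so the two code paths are easy to compare.
  final List<String> actions = new ArrayList<>();

  CorruptReplicaHandler(boolean deleteCorruptReplicaFromDisk) {
    this.deleteCorruptReplicaFromDisk = deleteCorruptReplicaFromDisk;
  }

  void handleMissingReplica(String bpid, long blockId) {
    if (deleteCorruptReplicaFromDisk) {
      // Mirrors notifyNamenodeDeletedBlock + invalidate in the patch:
      // tell the NameNode the replica is gone, then delete the on-disk
      // block and meta file.
      actions.add("notifyNamenodeDeletedBlock:" + blockId);
      actions.add("invalidate:" + bpid + ":" + blockId);
    } else {
      // Only remove the replica from the in-memory volume map; any files
      // left on disk are untouched.
      actions.add("volumeMap.remove:" + bpid + ":" + blockId);
    }
  }

  public static void main(String[] args) {
    CorruptReplicaHandler h = new CorruptReplicaHandler(true);
    h.handleMissingReplica("BP-1", 1001L);
    System.out.println(h.actions); // [notifyNamenodeDeletedBlock:1001, invalidate:BP-1:1001]
  }
}
```

With the flag off, only the `volumeMap.remove` path runs, which matches the "remove from memory, keep the disk files" behavior debated later in this thread.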
Please add some comments to describe the necessity of the else logic.
Modified the patch based on the comments.
Hi @ZanderXu, please help review it again, thanks~
LGTM +1
The result of the CI is here: https://ci-hadoop.apache.org/blue/organizations/jenkins/hadoop-multibranch/detail/PR-6559/3/tests. The failed tests seem unrelated.
@@ -188,6 +188,10 @@ public class DFSConfigKeys extends CommonConfigurationKeys {
  public static final long DFS_DN_CACHED_DFSUSED_CHECK_INTERVAL_DEFAULT_MS =
      600000;

  public static final String DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE =
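The diff hunk above is cut off before the constant's value. A hypothetical completion is sketched below; the key string and default are taken from the `hdfs-default.xml` property added elsewhere in this PR (the `_DEFAULT` companion constant is an assumption, following the usual DFSConfigKeys naming convention):

```java
// Hypothetical completion of the truncated DFSConfigKeys addition. The key
// string mirrors the <name> and the default mirrors the <value>true</value>
// in this PR's hdfs-default.xml hunk; the _DEFAULT constant name follows the
// convention used by other keys in DFSConfigKeys, not the patch itself.
class ProposedDeleteCorruptReplicaKeys {
  public static final String DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE =
      "dfs.datanode.delete.corrupt.replica.from.disk.enable";
  public static final boolean
      DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE_DEFAULT = true;
}
```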
It would be handy if this could be configured dynamically.
Thanks @tomscut for your review.
I will support dynamic configuration later.
💔 -1 overall
This message was automatically generated.
The failed tests seem not to be related.
Hi @ZanderXu @tomscut @zhangshuyan0 @tasanuma, please help review this PR again when you are free, thanks~
@@ -3982,6 +3982,17 @@
  </description>
</property>

<property>
  <name>dfs.datanode.delete.corrupt.replica.from.disk.enable</name>
  <value>true</value>
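On the DataNode side, a property like this would typically be read via Hadoop's `Configuration.getBoolean(key, default)`. Since pulling in the Hadoop jars is out of scope here, the sketch below uses a tiny stand-in class (`MiniConf` is illustrative, not the real `org.apache.hadoop.conf.Configuration`) to show the lookup-with-default behavior that makes `<value>true</value>` the fallback:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Hadoop's Configuration, only to illustrate how the new
// key would be resolved; real code would call conf.getBoolean(...) on
// org.apache.hadoop.conf.Configuration.
class MiniConf {
  private final Map<String, String> props = new HashMap<>();

  void set(String key, String value) { props.put(key, value); }

  boolean getBoolean(String key, boolean defaultValue) {
    String v = props.get(key);
    return v == null ? defaultValue : Boolean.parseBoolean(v.trim());
  }
}

class DeleteCorruptReplicaConfigDemo {
  static final String KEY = "dfs.datanode.delete.corrupt.replica.from.disk.enable";

  public static void main(String[] args) {
    MiniConf conf = new MiniConf();
    // Unset: falls back to the proposed default of true.
    System.out.println(conf.getBoolean(KEY, true)); // true
    // Operators on storage where HDFS-16985-style false corruption is a
    // concern (e.g. EC2 + EBS) can opt out explicitly.
    conf.set(KEY, "false");
    System.out.println(conf.getBoolean(KEY, true)); // false
  }
}
```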
If the default value is true, there is a risk of missing blocks, according to HDFS-16985. I suggest setting the default value to false, as a missing block is a more serious problem than a delayed disk file deletion. What's your opinion?
Thanks @zhangshuyan0 for your comment.
From the DataNode's point of view, once it has confirmed that the meta file or data file is lost, the replica should be deleted directly from memory and disk; that is the expected behavior.
For the case mentioned in HDFS-16985, if the cluster deployment uses AWS EC2 + EBS, dfs.datanode.delete.corrupt.replica.from.disk.enable can be set to false as needed to avoid deleting disk data.
So I think it is better, from the DataNode's perspective, to default dfs.datanode.delete.corrupt.replica.from.disk.enable to true.
Looking forward to your suggestions.
Hi @zhangshuyan0, would you mind taking a look again? Thanks~
Hi @Hexiaoqiao sir, would you mind taking a look again? Thanks~
Thanks for involving me here. I think @zhangshuyan0 would be more professional about this improvement. Let's wait for her/his feedback.
OK, thanks for your comment.
Hi @zhangshuyan0, please help review this PR again when you are free, thanks~
🎊 +1 overall
This message was automatically generated.
Description of PR
https://issues.apache.org/jira/browse/HDFS-17352
As discussed at #6464 (comment), we should add a configuration to control whether the DN deletes the replica from disk when a client requests a missing block.