HDFS-17352. Add configuration to control whether DN delete this replica from disk when client requests a missing block #6559

Open
wants to merge 5 commits into
base: trunk


haiyang1987
Contributor

Description of PR

https://issues.apache.org/jira/browse/HDFS-17352

As discussed at #6464 (comment)
we should add a configuration option to control whether the DataNode deletes the replica from disk when a client requests a missing block.

…ca from disk when client requests a missing block
// So remove if from volume map notify namenode is ok.
// If checkFiles as true will check block or meta file existence again.
// If deleteCorruptReplicaFromDisk as true will delete the actual on-disk block and meta file,
// otherwise will remove it from volume map and notify namenode.
Contributor

If checkFiles is true, the existence of the block file and meta file will be checked again.
If deleteCorruptReplicaFromDisk is true, delete the existing block or meta file directly; otherwise just remove them from the in-memory volumeMap.
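
The two flags described above can be sketched as a small decision function. This is a simplified, hypothetical model rather than the actual DataNode code: the class, enum, and method names are invented for illustration, and it assumes that a replica whose block and meta files are both still present after the re-check is left untouched.

```java
/** Hypothetical, simplified model of the decision; not the actual DataNode code. */
final class MissingReplicaSketch {

  /** What to do with a replica that a client reported as missing. */
  enum Action {
    KEEP_REPLICA,
    DELETE_FROM_DISK,
    REMOVE_FROM_VOLUME_MAP_ONLY
  }

  static Action decide(boolean checkFiles,
                       boolean blockFileExists,
                       boolean metaFileExists,
                       boolean deleteCorruptReplicaFromDisk) {
    // Assumption: when checkFiles is set and both files are still present,
    // the replica is considered intact and nothing is removed.
    if (checkFiles && blockFileExists && metaFileExists) {
      return Action.KEEP_REPLICA;
    }
    // Otherwise the replica is treated as missing. The flag chooses between
    // deleting the on-disk files and only dropping the in-memory entry.
    return deleteCorruptReplicaFromDisk
        ? Action.DELETE_FROM_DISK
        : Action.REMOVE_FROM_VOLUME_MAP_ONLY;
  }

  public static void main(String[] args) {
    System.out.println(decide(true, true, true, true));    // KEEP_REPLICA
    System.out.println(decide(true, false, true, true));   // DELETE_FROM_DISK
    System.out.println(decide(false, true, true, false));  // REMOVE_FROM_VOLUME_MAP_ONLY
  }
}
```

In the patch itself the NameNode is notified in both removal paths; the sketch only captures which storage-side action is taken.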

.notifyNamenodeDeletedBlock(extendedBlock, replica.getStorageUuid());
invalidate(bpid, new Block[] {extendedBlock.getLocalBlock()});
} else {
volumeMap.remove(bpid, block);
Contributor

Please add some comments to describe the necessity of the else logic.

Contributor Author

Updated the patch based on the comments.

Hi @ZanderXu, please help review it again, thanks~

Contributor

@ZanderXu left a comment

LGTM +1

@haiyang1987
Contributor Author

The CI results are linked below. The failed tests do not appear to be related to this change.

https://ci-hadoop.apache.org/blue/organizations/jenkins/hadoop-multibranch/detail/PR-6559/3/tests

@@ -188,6 +188,10 @@ public class DFSConfigKeys extends CommonConfigurationKeys {
public static final long DFS_DN_CACHED_DFSUSED_CHECK_INTERVAL_DEFAULT_MS =
600000;

public static final String DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE =
Contributor

It would be handy if this could be configured dynamically.
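
As a rough illustration of what dynamic configuration means here, the sketch below keeps the flag in a volatile field that can be changed at runtime without a restart. It does not reproduce Hadoop's reconfiguration machinery: the class and method names are hypothetical, and only the property name and its default of true are taken from this PR.

```java
/** Hypothetical holder for a runtime-adjustable flag; not Hadoop's actual API. */
final class ReconfigSketch {
  // Property name and default value as they appear in this PR.
  static final String KEY = "dfs.datanode.delete.corrupt.replica.from.disk.enable";
  static final boolean DEFAULT = true;

  // volatile so that a change made by an admin thread is visible to worker threads.
  private volatile boolean deleteCorruptReplicaFromDisk = DEFAULT;

  boolean isDeleteCorruptReplicaFromDisk() {
    return deleteCorruptReplicaFromDisk;
  }

  /**
   * Applies a new value at runtime and returns the effective value.
   * Assumption: a null value resets the flag to its default.
   */
  String reconfigure(String property, String newVal) {
    if (!KEY.equals(property)) {
      throw new IllegalArgumentException("Property is not reconfigurable: " + property);
    }
    boolean value;
    if (newVal == null) {
      value = DEFAULT;
    } else if ("true".equalsIgnoreCase(newVal.trim())
        || "false".equalsIgnoreCase(newVal.trim())) {
      value = Boolean.parseBoolean(newVal.trim());
    } else {
      throw new IllegalArgumentException("Not a boolean value: " + newVal);
    }
    deleteCorruptReplicaFromDisk = value;
    return Boolean.toString(value);
  }

  public static void main(String[] args) {
    ReconfigSketch sketch = new ReconfigSketch();
    System.out.println(sketch.isDeleteCorruptReplicaFromDisk());  // true
    sketch.reconfigure(KEY, "false");
    System.out.println(sketch.isDeleteCorruptReplicaFromDisk());  // false
  }
}
```

Rejecting anything other than "true" or "false" avoids silently treating a typo as false, which matters for a flag that controls on-disk deletion.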

Contributor Author

Thanks @tomscut for your review.
I will add support for dynamic configuration later.

@haiyang1987
Contributor Author

Updated the PR to support dynamic configuration.
Hi @ZanderXu @tomscut, please help review it again, thanks~

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 22s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 50m 31s | | trunk passed |
| +1 💚 | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 💚 | checkstyle | 0m 40s | | trunk passed |
| +1 💚 | mvnsite | 0m 47s | | trunk passed |
| +1 💚 | javadoc | 0m 45s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 2s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 1m 42s | | trunk passed |
| +1 💚 | shadedclient | 20m 41s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 37s | | the patch passed |
| +1 💚 | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 0m 36s | | the patch passed |
| +1 💚 | compile | 0m 34s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 💚 | javac | 0m 34s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 31s | | the patch passed |
| +1 💚 | mvnsite | 0m 37s | | the patch passed |
| +1 💚 | javadoc | 0m 32s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 59s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 💚 | spotbugs | 1m 43s | | the patch passed |
| +1 💚 | shadedclient | 20m 30s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 204m 3s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 30s | | The patch does not generate ASF License warnings. |
| | | 310m 8s | | |

| Reason | Tests |
| --- | --- |
| Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.protocol.TestBlockListAsLongs |
| | hadoop.hdfs.tools.TestDFSAdmin |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6559/5/artifact/out/Dockerfile |
| GITHUB PR | #6559 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux e1573661312f 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / cada170 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6559/5/testReport/ |
| Max. process+thread count | 4349 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6559/5/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@haiyang1987
Contributor Author

The failed tests do not appear to be related to this change.

@haiyang1987
Contributor Author

Hi @ZanderXu @tomscut @zhangshuyan0 @tasanuma, please help review this PR again when you are free, thanks~

@@ -3982,6 +3982,17 @@
</description>
</property>

<property>
<name>dfs.datanode.delete.corrupt.replica.from.disk.enable</name>
<value>true</value>
Contributor

If the default value is true, there is a risk of missing blocks, as described in HDFS-16985. I suggest setting the default value to false, since a missing block is a more serious problem than delayed deletion of files on disk. What's your opinion?

Contributor Author

Thanks @zhangshuyan0 for your comment.

From the DataNode's point of view, once it has confirmed that the meta file or data file is lost, the replica should be deleted directly from both memory and disk; this is the expected behavior.

For the case described in HDFS-16985, if the cluster is deployed on AWS EC2 + EBS, dfs.datanode.delete.corrupt.replica.from.disk.enable can be set to false as needed to avoid deleting data on disk.

So, from the DataNode's perspective, I think it is better to keep the default value of dfs.datanode.delete.corrupt.replica.from.disk.enable as true.

Looking forward to your further suggestions.

Contributor Author

Hi @zhangshuyan0, would you mind taking another look? Thanks~

@haiyang1987
Contributor Author

Hi @Hexiaoqiao, would you mind taking another look? Thanks~

@Hexiaoqiao
Contributor

Thanks for involving me here. I think @zhangshuyan0 has more expertise on this improvement. Let's wait for their feedback.

@haiyang1987
Contributor Author

> Thanks for involving me here. I think @zhangshuyan0 has more expertise on this improvement. Let's wait for their feedback.

OK, thanks for your comment.

@haiyang1987
Contributor Author

Hi @zhangshuyan0, please help review this PR again when you are free, thanks~

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 02s | | No case conflicting files found. |
| +0 🆗 | spotbugs | 0m 00s | | spotbugs executables are not available. |
| +0 🆗 | codespell | 0m 01s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 01s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 01s | | xmllint was not available. |
| +1 💚 | @author | 0m 00s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 00s | | The patch appears to include 5 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 97m 05s | | trunk passed |
| +1 💚 | compile | 6m 28s | | trunk passed |
| +1 💚 | checkstyle | 5m 07s | | trunk passed |
| +1 💚 | mvnsite | 7m 08s | | trunk passed |
| +1 💚 | javadoc | 6m 36s | | trunk passed |
| +1 💚 | shadedclient | 153m 07s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 48s | | the patch passed |
| +1 💚 | compile | 3m 42s | | the patch passed |
| +1 💚 | javac | 3m 42s | | the patch passed |
| +1 💚 | blanks | 0m 00s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 2m 36s | | the patch passed |
| +1 💚 | mvnsite | 4m 29s | | the patch passed |
| +1 💚 | javadoc | 3m 50s | | the patch passed |
| +1 💚 | shadedclient | 167m 08s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 5m 41s | | The patch does not generate ASF License warnings. |
| | | 445m 34s | | |

| Subsystem | Report/Notes |
| --- | --- |
| GITHUB PR | #6559 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | MINGW64_NT-10.0-17763 d777404a85ec 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
| Build tool | maven |
| Personality | /c/hadoop/dev-support/bin/hadoop.sh |
| git revision | trunk / cada170 |
| Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6559/1/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6559/1/console |
| versions | git=2.44.0.windows.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
