HDFS-17352. Add configuration to control whether DN deletes this replica from disk when client requests a missing block #6559
base: trunk
Conversation
// So removing it from the volume map and notifying the NameNode is OK.
// If checkFiles is true, the existence of the block and meta file will be checked again.
// If deleteCorruptReplicaFromDisk is true, the actual on-disk block and meta file
// will be deleted; otherwise the replica is only removed from the volume map and
// the NameNode is notified.
If checkFiles is true, the existence of the block and metafile will be checked again.
If deleteCorruptReplicaFromDisk is true, delete the existing block or metafile directly, otherwise just remove them from the memory volumeMap.
.notifyNamenodeDeletedBlock(extendedBlock, replica.getStorageUuid());
invalidate(bpid, new Block[] {extendedBlock.getLocalBlock()});
} else {
volumeMap.remove(bpid, block);
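The fragments quoted above come from the DataNode's dataset code. A minimal, self-contained sketch of the two branches under discussion follows; the class, method, and recorded action names here are illustrative stand-ins (not the real FsDatasetImpl API), and side effects are logged to a list purely so the behavior can be observed:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the branch added in this PR: when a client asks
// for a replica whose on-disk files are missing or corrupt, the DataNode
// either deletes it outright or only drops it from the in-memory volume map.
class CorruptReplicaHandler {
  private final boolean deleteCorruptReplicaFromDisk;
  // Records the side effects so the two code paths are easy to compare.
  final List<String> actions = new ArrayList<>();

  CorruptReplicaHandler(boolean deleteCorruptReplicaFromDisk) {
    this.deleteCorruptReplicaFromDisk = deleteCorruptReplicaFromDisk;
  }

  void handleMissingReplica(String bpid, long blockId) {
    if (deleteCorruptReplicaFromDisk) {
      // Mirrors notifyNamenodeDeletedBlock + invalidate in the patch:
      // tell the NameNode the replica is gone, then delete the on-disk
      // block and meta file.
      actions.add("notifyNamenodeDeletedBlock:" + blockId);
      actions.add("invalidate:" + bpid + ":" + blockId);
    } else {
      // Only remove the replica from the in-memory volume map; any files
      // left on disk are untouched.
      actions.add("volumeMap.remove:" + bpid + ":" + blockId);
    }
  }

  public static void main(String[] args) {
    CorruptReplicaHandler h = new CorruptReplicaHandler(true);
    h.handleMissingReplica("BP-1", 1001L);
    System.out.println(h.actions); // [notifyNamenodeDeletedBlock:1001, invalidate:BP-1:1001]
  }
}
```

With the flag off, only the `volumeMap.remove` path runs, which matches the "remove from memory, keep the disk files" behavior debated later in this thread.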
Please add some comments to describe the necessity of the else logic.
Modified the patch based on the comments.
Hi @ZanderXu, please help review it again, thanks~
LGTM +1
The result of the CI is here: https://ci-hadoop.apache.org/blue/organizations/jenkins/hadoop-multibranch/detail/PR-6559/3/tests. The failed tests seem unrelated.
@@ -188,6 +188,10 @@ public class DFSConfigKeys extends CommonConfigurationKeys {
  public static final long DFS_DN_CACHED_DFSUSED_CHECK_INTERVAL_DEFAULT_MS =
      600000;

  public static final String DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE =
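The diff hunk above is cut off before the constant's value. A hypothetical completion is sketched below; the key string and default are taken from the `hdfs-default.xml` property added elsewhere in this PR (the `_DEFAULT` companion constant is an assumption, following the usual DFSConfigKeys naming convention):

```java
// Hypothetical completion of the truncated DFSConfigKeys addition. The key
// string mirrors the <name> and the default mirrors the <value>true</value>
// in this PR's hdfs-default.xml hunk; the _DEFAULT constant name follows the
// convention used by other keys in DFSConfigKeys, not the patch itself.
class ProposedDeleteCorruptReplicaKeys {
  public static final String DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE =
      "dfs.datanode.delete.corrupt.replica.from.disk.enable";
  public static final boolean
      DFS_DATANODE_DELETE_CORRUPT_REPLICA_FROM_DISK_ENABLE_DEFAULT = true;
}
```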
It would be handy if this could be configured dynamically.
Thanks @tomscut for your review.
I will support dynamic configuration later.
💔 -1 overall
This message was automatically generated.
The failed tests seem not to be related.
Hi @ZanderXu @tomscut @zhangshuyan0 @tasanuma, please help review this PR again when you are free, thanks~
@@ -3982,6 +3982,17 @@
  </description>
</property>

<property>
  <name>dfs.datanode.delete.corrupt.replica.from.disk.enable</name>
  <value>true</value>
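On the DataNode side, a property like this would typically be read via Hadoop's `Configuration.getBoolean(key, default)`. Since pulling in the Hadoop jars is out of scope here, the sketch below uses a tiny stand-in class (`MiniConf` is illustrative, not the real `org.apache.hadoop.conf.Configuration`) to show the lookup-with-default behavior that makes `<value>true</value>` the fallback:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Hadoop's Configuration, only to illustrate how the new
// key would be resolved; real code would call conf.getBoolean(...) on
// org.apache.hadoop.conf.Configuration.
class MiniConf {
  private final Map<String, String> props = new HashMap<>();

  void set(String key, String value) { props.put(key, value); }

  boolean getBoolean(String key, boolean defaultValue) {
    String v = props.get(key);
    return v == null ? defaultValue : Boolean.parseBoolean(v.trim());
  }
}

class DeleteCorruptReplicaConfigDemo {
  static final String KEY = "dfs.datanode.delete.corrupt.replica.from.disk.enable";

  public static void main(String[] args) {
    MiniConf conf = new MiniConf();
    // Unset: falls back to the proposed default of true.
    System.out.println(conf.getBoolean(KEY, true)); // true
    // Operators on storage where HDFS-16985-style false corruption is a
    // concern (e.g. EC2 + EBS) can opt out explicitly.
    conf.set(KEY, "false");
    System.out.println(conf.getBoolean(KEY, true)); // false
  }
}
```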
If the default value is true, there is a risk of missing blocks, according to HDFS-16985. I suggest setting the default value to false, as a missing block is a more serious problem than a delayed disk file deletion. What's your opinion?
Thanks @zhangshuyan0 for your comment.
From the DataNode's point of view, once it has confirmed that the meta file or data file is lost, the replica should be deleted directly from memory and disk; that is the expected behavior.
For the case mentioned in HDFS-16985, if the cluster deployment uses AWS EC2 + EBS, dfs.datanode.delete.corrupt.replica.from.disk.enable can be set to false as needed to avoid deleting disk data.
So I think it is better, from the DataNode's perspective, to default dfs.datanode.delete.corrupt.replica.from.disk.enable to true.
Looking forward to your suggestions.
Hi @zhangshuyan0, would you mind taking a look again? Thanks~
Hi @Hexiaoqiao sir, would you mind taking a look again? Thanks~
Thanks for involving me here. I think @zhangshuyan0 would be more professional about this improvement. Let's wait for her/his feedback.
OK, thanks for your comment.
Hi @zhangshuyan0, please help review this PR again when you are free, thanks~
🎊 +1 overall
This message was automatically generated.
Description of PR
https://issues.apache.org/jira/browse/HDFS-17352
As discussed at #6464 (comment), we should add a configuration to control whether the DN deletes the replica from disk when a client requests a missing block.