When deleting layers, move referenced data #8437
base: master
Conversation
📝 Walkthrough

This pull request introduces changes to ensure that when a dataset or layer is deleted, any referenced layers are preserved by relocating them instead of being permanently removed. The documentation and routing have been updated accordingly. Unused case classes were removed from controllers, and new methods were added for path retrieval and symlink management. In the models and services layers, data types and method signatures have been updated to support the new relocation behavior. Additionally, significant enhancements were made to the dataset deletion process in the datastore, including robust symlink handling and client integrations.
Actionable comments posted: 1
🧹 Nitpick comments (6)
docs/datasets/settings.md (1)
74-74: Documentation update matches changelog

The added documentation clearly explains the behavior when deleting datasets with referenced layers, providing necessary context for users.
There's a minor grammatical issue with the comma placement.
```diff
-When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in, so that it can still be accessed.
+When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in so that it can still be accessed.
```

🧰 Tools
🪛 LanguageTool
[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...
(SO_THAT_UNNECESSARY_COMMA)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (5)
42-59: Robust trash-move method.
Method gracefully handles non-existing paths and relocates existing datasets to the trash location. Consider concurrency checks if multiple deletions target the same dataset simultaneously.
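If concurrent deletions become a concern, a simple guard could serialize them per dataset. A minimal sketch (the lock map and key format are hypothetical, not part of this PR):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical per-dataset lock: concurrent deletions of the same dataset
// take the same monitor, so the trash-move runs at most once at a time.
private val deletionLocks = new ConcurrentHashMap[String, Object]()

private def withDeletionLock[T](organizationId: String, datasetName: String)(block: => T): T = {
  val lock = deletionLocks.computeIfAbsent(s"$organizationId/$datasetName", _ => new Object)
  lock.synchronized(block)
}
```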
72-92: Symlink handling logic in `moveSymlinks`.
Overall approach to fetch and handle symlinks is sound. Verify concurrency if multiple calls run in parallel.
105-109: Relativizing symlink paths.
Implementation is standard. Ensure cross-filesystem edge cases are documented or handled.
118-149: `updateDatasourceProperties` modifies local JSON.
Deletion-then-write approach can risk partial updates if an exception occurs between these operations. Consider fail-safe or transactional writes if feasible.
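One fail-safe option is write-to-temp-then-atomic-rename, so the properties file is never observed half-written. A sketch using standard `java.nio` APIs (the helper name and its placement are assumptions, not from the PR):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Sketch: write the new JSON next to the target, then atomically rename it over
// the original. On POSIX filesystems the rename either fully succeeds or leaves
// the old file intact; no reader ever sees a partially written properties file.
private def writePropertiesAtomically(target: Path, json: String): Unit = {
  val tmp = target.resolveSibling(target.getFileName.toString + ".tmp")
  Files.write(tmp, json.getBytes(StandardCharsets.UTF_8))
  Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
}
```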
230-276: `handleLayerSymlinks` fallback logic.
Falls back to moving individual mags if no fully linked layer is detected. Code is large; a dedicated helper class or modular approach may increase readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- CHANGELOG.unreleased.md (1 hunks)
- app/controllers/DatasetController.scala (0 hunks)
- app/controllers/WKRemoteDataStoreController.scala (4 hunks)
- app/models/dataset/Dataset.scala (4 hunks)
- app/models/dataset/DatasetService.scala (2 hunks)
- conf/webknossos.latest.routes (1 hunks)
- docs/datasets/settings.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1 hunks)
💤 Files with no reviewable changes (1)
- app/controllers/DatasetController.scala
🧰 Additional context used
🧬 Code Definitions (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): DSRemoteWebknossosClient (56-195)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1): DataSourceRepository (15-60)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1): DataSourceService (30-338)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1): remoteWKClient (152-152)
app/controllers/WKRemoteDataStoreController.scala (1)
- app/models/dataset/Dataset.scala (3): findOne (200-207), findOneByDirectoryNameAndOrganization (408-421), findAllForDataset (897-907)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1)
- app/models/dataset/Dataset.scala (1): dataSourceId (89-89)
🪛 LanguageTool
docs/datasets/settings.md
[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...
(SO_THAT_UNNECESSARY_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (33)
CHANGELOG.unreleased.md (1)
21-21: Concise and descriptive changelog entry

The changelog entry clearly explains the improved behavior for deleting datasets/layers, which aligns well with the PR objectives.
conf/webknossos.latest.routes (1)
112-112: ✅ Verification successful

New route for accessing directory paths
This new endpoint appears to support the layer reference management functionality, allowing retrieval of paths for specific organization/directory combinations in a datastore.
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's a matching controller implementation for this new route
rg -A 3 -B 3 "def getPaths" --glob "*.scala"
```

Length of output: 1233
Verified new endpoint for directory path access
I confirmed that the route in `conf/webknossos.latest.routes` now correctly maps to the `getPaths` method in `app/controllers/WKRemoteDataStoreController.scala`. The shell script output verifies that the `getPaths` controller method (accepting `name`, `key`, `organizationId`, and `directoryName` as parameters) exists and is properly set up. This endpoint supports the intended layer reference management functionality for retrieving directory paths by organization and directory.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)
72-72: ✅ Verification successful

Updated BinaryDataService constructor with additional parameter

The constructor call has been updated to include an additional `None` parameter, likely corresponding to a new `remoteWKClient` parameter in the `BinaryDataService` class. This change supports the enhanced functionality for handling referenced layers.
🏁 Script executed:
```bash
#!/bin/bash
# Check the signature of the BinaryDataService class to confirm parameter alignment
rg -A 10 "class BinaryDataService" --glob "*.scala"
```

Length of output: 3750
Verified Update: BinaryDataService Constructor Parameter

The instantiation in `VolumeTracingService.scala` now correctly includes an additional `None` value that matches the new `remoteWKClient: Option[DSRemoteWebknossosClient]` parameter in `BinaryDataService`. This change is aligned with the updated constructor signature (as verified in `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala`) and supports the enhanced functionality for handling referenced layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
123-127: New method for fetching layer paths looks good

This new `fetchPaths` method allows retrieving linked layer path information from WebKnossos, which is essential for the PR's goal of handling referenced data during layer deletion.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1)
106-106: Updated constructor call to accommodate new parameter

The `BinaryDataService` constructor now receives an additional `None` parameter, which likely represents the optional `remoteWKClient` needed for the new symlink resolution functionality.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
20-24: Added import for DSRemoteWebknossosClient

Good improvement to explicitly import the DSRemoteWebknossosClient class.
113-114: Added remoteWebknossosClient as dependency

The UploadService now receives the DSRemoteWebknossosClient as a constructor parameter, enabling it to interact with the remote WebKnossos server during upload operations.
152-152: Implemented remoteWKClient method

This override correctly provides access to the injected remoteWebknossosClient, enabling the DatasetDeleter trait to access it when needed for resolving symlinks during layer deletions.
app/models/dataset/DatasetService.scala (3)
7-7: Updated import to include DataSourceMagInfo

The import now correctly references DataSourceMagInfo, which is used in the updated getPathsForDataLayer method.
360-361: Changed return type to use DataSourceMagInfo

The method signature has been updated to return `DataSourceMagInfo` instead of `DatasetMagInfo`, which likely provides more comprehensive information needed for the proper handling of symlinked layers.
369-370: Improved filtering logic for linked paths

The filtering logic now excludes paths from the same data source rather than exact path matches. This is a key improvement for the PR's goal of properly handling referenced data during layer deletion, as it ensures that only references to data in other datasets are considered for preservation.
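In sketch form, the described filter amounts to something like this (names are illustrative, not the actual implementation):

```scala
// Illustrative only: keep a linked mag when it belongs to a different data source
// than the one being deleted, even if its resolved path happens to match.
def externalLinks(linkedMags: List[DataSourceMagInfo], deleted: DataSourceId): List[DataSourceMagInfo] =
  linkedMags.filter(_.dataSourceId != deleted)
```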
app/controllers/WKRemoteDataStoreController.scala (3)
8-8: Good addition of necessary import.

The imported `LayerMagLinkInfo` and `MagLinkInfo` types are required for the new `getPaths` method implementation.
51-51: Appropriate dependency injection.

Adding the `datasetLayerDAO` parameter is necessary to fetch layer information in the new `getPaths` method.
240-255: Well-structured implementation of the `getPaths` endpoint.

The method correctly:
- Validates datastore access
- Retrieves organization and dataset
- Fetches all layers for the dataset
- Gets paths for each layer
- Returns structured data in JSON format
This endpoint will be essential for handling layer symlinks when deleting datasets, as described in the PR objectives.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)
25-26: Good addition of remote client parameter.

Adding the `remoteWKClient` parameter provides the `BinaryDataService` with the capability to communicate with remote WebKnossos instances, which will be necessary for managing symlinked data across datasets when deleting layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2)
22-26: Properly updated dependency injection.

The constructor parameters have been neatly reformatted, and the new `remoteWebknossosClient` parameter has been correctly added to support the functionality required for layer deletion with symlink preservation.
49-50: Correct instantiation with new parameter.

The `binaryDataService` instantiation is properly updated to include the new `remoteWebknossosClient` parameter, ensuring the service has all required dependencies.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (3)
7-16: Well-designed data model for source information.

The `DataSourceMagInfo` class effectively encapsulates all necessary information about a data source, including:
- Data source identifier
- Layer name
- Magnification vector
- Path information
- Local data status
The JSON format implementation will facilitate serialization for API responses.
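From the fields listed above, the case class plausibly looks roughly like the following reconstruction (field names, types, and the JSON derivation are inferred from the review, not copied from the PR):

```scala
import com.scalableminds.util.geometry.Vec3Int
import com.scalableminds.webknossos.datastore.models.datasource.DataSourceId
import play.api.libs.json.{Json, OFormat}

// Reconstruction for readers; exact field names/types in the PR may differ.
case class DataSourceMagInfo(dataSourceId: DataSourceId, // data source identifier
                             dataLayerName: String, // layer name
                             mag: Vec3Int, // magnification vector
                             path: Option[String], // configured path, if any
                             realPath: Option[String], // fully resolved path
                             hasLocalData: Boolean) // local data status

object DataSourceMagInfo {
  implicit val jsonFormat: OFormat[DataSourceMagInfo] = Json.format[DataSourceMagInfo]
}
```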
18-22: Clear representation of mag links.

The `MagLinkInfo` class provides a clean structure to represent relationships between a primary mag and its linked mags, which is essential for tracking symlinks between layers.
24-28: Effective layer organization structure.

The `LayerMagLinkInfo` class appropriately groups multiple `MagLinkInfo` objects by layer name, providing a hierarchical structure that will be useful for the `getPaths` endpoint response.

app/models/dataset/Dataset.scala (6)
8-8: Import usage approved.

This new import for `DataSourceMagInfo` is consistent with the added logic in the file.
805-814: Comprehensive transformation logic.

`rowsToMagInfos` neatly converts DB rows into typed domain objects. No issues.
815-823: Query alignment for `findPathsForDatasetAndDatalayer`.

The joined queries and data-mapping approach seem correct. Confirm `$columns` includes the columns consumed by `DataSourceMagRow`.
825-832: Similar query approach for `findAllByRealPath`.

Implementation matches the pattern in `findPathsForDatasetAndDatalayer`. Looks good.
35-35: ✅ Verification successful

New slick import for custom row parsing.

Ensure all raw queries align column order with the `DataSourceMagRow` fields to avoid runtime mismatches.

🏁 Script executed:

```bash
#!/bin/bash
# Verify all SELECT column orders match DataSourceMagRow constructor usage
rg -A5 'SELECT \$columns,'
```

Length of output: 1130
Slick Import Verified & Column Order Confirmed

The new `slick.jdbc.GetResult` import is correctly implemented for custom row parsing. Verification confirms that the raw SELECT queries (e.g., those using `$columns, _organization, directoryName`) align with the `DataSourceMagRow` fields. Please ensure that any future modifications maintain this column order to avoid runtime mismatches.
793-804: ✅ Verification successful

Check column ordering in `GetResultDataSourceMagRow`.

Verify that the selected columns strictly match the constructor parameters for `DataSourceMagRow`. Otherwise, misaligned columns may lead to runtime errors.

🏁 Script executed:

```bash
#!/bin/bash
# Verify the SQL queries that populate DataSourceMagRow and confirm they match the constructor's field order
rg -A8 'DataSourceMagRow('
```

Length of output: 113

Attention: Additional Verification Needed for GetResultDataSourceMagRow Column Alignment

The initial verification command failed due to an unescaped parenthesis in the regex. To properly verify that the SQL queries and the constructor parameters for `DataSourceMagRow` are aligned, please run the additional shell script below. It uses an escaped parenthesis (`\(`) to search for instances where `DataSourceMagRow` is invoked, and it also locates the class definition to verify the constructor's parameter order. Once you review the outputs, confirm that the column order in the `GetResultDataSourceMagRow` implementation strictly matches the order of parameters in the `DataSourceMagRow` constructor.

🏁 Script executed:

```bash
#!/bin/bash
echo "Checking instantiations in app/models/dataset/Dataset.scala:"
rg -A10 "DataSourceMagRow\(" app/models/dataset/Dataset.scala
echo "---------"
echo "Looking for the definition of DataSourceMagRow (its constructor signature):"
rg -n "case class DataSourceMagRow" .
```

Length of output: 1511
Column ordering in `GetResultDataSourceMagRow` verified.

The verification confirms that the order of the parameters provided in the `DataSourceMagRow` instantiation exactly matches the constructor signature declared in the case class (i.e., `_dataset`, `dataLayerName`, `mag`, `path`, `realPath`, `hasLocalData`, `_organization`, and `directoryName`). No changes are required.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)
2-9: New imports introduced.

All newly imported classes and traits appear relevant for symlink handling, JSON parsing, and remote data source interaction.
11-12: Liftweb imports for Box usage.

Using `tryo` and `Full` maintains consistency with the existing code style. No concerns here.
64-68: Symlink move precedes trash deletion.

If `moveSymlinks` fails, the dataset is not deleted. Verify this ordering is intentional.
70-70: Remote client accessor.

The optional `remoteWKClient` is straightforward. No immediate issues.
93-103: `getFullyLinkedLayers`.

Nicely checks for all local mags and an intersection of linked layers. Straightforward logic.
111-117: `getMagPath` location builder.

Clear path construction for data source references. Looks fine.
150-165: `updateMagSymlinks` single-step logic.

Deletes any existing link before recreating it. This is functional, but be aware of concurrency edge cases.
Wow, quite the complex logic 🙈 But it does seem justified, I don’t see easy ways to simplify.
I’m a little fearful of exceptions, could you have a look which of the used file functions can throw exceptions and add tryos there?
Also, maybe it makes sense to first check for write access so that a readable error message can be given. And what happens if, for permission reasons, this resymlinking does not happen? Can the deletion still go forward? What is shown to the user?
I also added a few small comments on the code.
Did no testing yet, will do that in the next round.
app/models/dataset/Dataset.scala
```scala
path: Option[String],
realPath: Option[String],
hasLocalData: Boolean)
case class DataSourceMagRow(_dataset: String,
```
```diff
-case class DataSourceMagRow(_dataset: String,
+case class DataSourceMagRow(_dataset: ObjectId,
```

does that work?
```scala
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
  s" because ${reason.getOrElse("")}"
else "..."}")
```
```diff
-logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
-  s" because ${reason.getOrElse("")}"
-else "..."}")
+logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")
```
I don't think I wrote this but happy to change 😄
```diff
@@ -31,23 +39,244 @@ trait DatasetDeleter extends LazyLogging with DirectoryConstants {
       case e: Exception => Fox.failure(s"Deleting dataset failed: ${e.toString}", Full(e))
     }
 
+  def moveToTrash(organizationId: String,
```
we should probably make everything in here either private or protected, unless it’s really necessary for it to be public.
this is not private because it is used in the deleteOnDisk method; apart from deleteOnDisk, there are no non-private methods here.
```diff
@@ -22,7 +22,8 @@ class BinaryDataService(val dataBaseDir: Path,
                         val agglomerateServiceOpt: Option[AgglomerateService],
                         remoteSourceDescriptorServiceOpt: Option[RemoteSourceDescriptorService],
                         sharedChunkContentsCache: Option[AlfuCache[String, MultiArray]],
-                        datasetErrorLoggingService: Option[DatasetErrorLoggingService])(implicit ec: ExecutionContext)
+                        datasetErrorLoggingService: Option[DatasetErrorLoggingService],
+                        val remoteWKClient: Option[DSRemoteWebknossosClient])(implicit ec: ExecutionContext)
```
is it really necessary for the binarydataservice to extend the datasetdeleter? I get the feeling it shouldn’t be involved in deletions. If not, this can also be removed again, and the client could be made non-optional in the deleter?

Edit: looks like the only usage is in `def deleteOnDisk` in DataSourceController. That could be moved to the `dataSourceService` instead, which could extend the deleter without any Option shenanigans. The DataSourceService even has a `remoteWebknossosClient` injected already.
Good idea!
```scala
def getPathsForDataLayer(datasetId: ObjectId,
                         layerName: String): Fox[List[(DataSourceMagInfo, List[DataSourceMagInfo])]] =
```
Could you add a comment explaining the signature of this function? I think it’s not immediately obvious why the return type is this nested list.
```scala
case Some(layersAndLinkedMags) =>
  layersAndLinkedMags.map(layerMagLinkInfo =>
    handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
case None => Seq(tryo {})
```
what does the `tryo {}` return? maybe we could make this more explicit?
Removed because remoteWKClient is no longer optional
```scala
case Full(dataSource) =>
  val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
    case dl: DataLayerWithMagLocators =>
      if (dl.mags.forall(_.path.exists(_.startsWith("file://")))) {
```
```diff
-if (dl.mags.forall(_.path.exists(_.startsWith("file://")))) {
+if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
```

That way this will be listed as a usage of this file scheme.
```scala
      case dl => dl
    })
    // Write properties back
    Files.delete(propertiesPath)
```
might throw an exception, maybe we should tryo this too.

I added the suggested tryo for the updateDatasourceProperties deletion; everything else should be covered by the big tryo in handleLayerSymlinks.

Added an exception before deleting stuff.
Actionable comments posted: 0
🧹 Nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
112-144: Robust datasource properties update with error handling.

The method properly handles updating datasource properties, including checks for file existence and JSON validation. Error handling with `tryo` is used for the deletion operation. Consider wrapping the JSON write operation in a `tryo` as well:

```diff
-case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
+case Full(_) => tryo(JsonHelper.jsonToFile(propertiesPath, updatedDataSource))
```
146-166: Comprehensive mag symlink update with permission checking.

The method handles various cases for updating mag symlinks, including permission checking, existence verification, and special handling for locally explored datasets. Consider wrapping the `Files.createSymbolicLink` operations in `tryo` blocks to handle potential failures:

```diff
-Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))) match {
+  case Full(_) => // success
+  case e => logger.warn(s"Failed to create symlink at $linkedMagPath: $e")
+}
```
168-240: Complex but logically sound layer migration implementation.

The `moveLayer` method correctly handles moving layers and updating all related symlinks with appropriate permission checks, deletion of existing symlinks, and recreation of symlinks to the new location. The comment on line 177 raises a good question about selection criteria. Consider adding a more definitive explanation or implementing a more deterministic selection process for the target layer.
176-178: Target selection strategy could be improved.

Currently, an arbitrary target is selected from the fully linked layers. Consider implementing a more deterministic selection strategy based on specific criteria (e.g., most recent, most referenced, etc.).

```diff
-// Selection of the first one is arbitrary, is there anything to distinguish between them?
-val target = fullLayerLinks.head
+// Select target based on criteria like most recent access or layer with most references
+val target = selectTargetLayer(fullLayerLinks)
```

A `selectTargetLayer` method could implement logic to choose the most appropriate target based on metadata or other criteria.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.unreleased.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.unreleased.md
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (9)
2-13: Well-organized imports with necessary additions.

The imports have been properly organized to support the new functionality for handling symlinks and JSON processing.
43-58: New `moveToTrash` method handles dataset deletion correctly.

The method properly handles moving datasets to the trash directory, with good error handling and logging. It checks if the dataset exists before attempting deletion and appropriately handles both cases.
64-67: Refactored deletion process with symlink handling.

The `deleteOnDisk` method now uses a clean for-comprehension to combine `moveSymlinks` and `moveToTrash` operations, ensuring that symlinks are handled before the dataset is moved to trash.
70-70: Abstract method for remote client access.

This abstract method declaration ensures implementing classes will provide access to the remote WebKnossos client, allowing for proper dependency injection.
87-97: Clear logic for identifying fully linked layers.

The method effectively determines which layers are completely linked through all magnifications, with good comments explaining the logic.
99-103: Efficient path relativization for symlinks.

The method properly converts absolute paths to relative paths for symlink creation, ensuring symlinks remain valid regardless of the absolute path to the data directory.
105-111: Clear mag path construction.

The method constructs paths to magnification directories in a clear, readable manner using appropriate path resolution.
242-297: Comprehensive layer symlink handling with excellent error handling.

The method orchestrates the handling of symlinks for layers during deletion, with proper error handling using `tryo`. The comments explaining different cases for magnifications without local data are very helpful.
74-85: ❓ Verification inconclusive

Comprehensive symlink migration coordination.

The `moveSymlinks` method effectively coordinates retrieving paths, handling layer symlinks, and updating datasource properties. Verify all symlink operations are handled within error boundaries:

🏁 Script executed:

```bash
#!/bin/bash
# Check if the handling of symlink operations is properly wrapped in error handling
rg -n "tryo" --context 5 DatasetDeleter.scala
```

Length of output: 106
Update: Verify Correct File Path & Error Handling in Symlink Migration

It appears the previous verification script referenced the wrong file path (i.e. using "DatasetDeleter.scala" directly). Please ensure that the error boundary handling (using constructs such as the custom operator `?~>`) in the `moveSymlinks` method is correctly applied in the file located at `webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala`. For clarity, please verify the following:

- The file exists at the expected location.
- Error handling for symlink operations (e.g. via `Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"`) is in place.
- Confirm that any internal wrappers (if intended) properly catch and relay errors during the symlink migration.

Once you’ve validated these points, preferably by running the search on the correct file path or manually checking the file, please update the verification accordingly.
Actionable comments posted: 1
🧹 Nitpick comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)
53-54: Missing space in log message.

There should be a space between the path and the reason in the log message for better readability.

```diff
-s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")
+s"Deleting dataset by moving it from $dataSourcePath to $targetPath ${reason.map(r => s"because $r").getOrElse("...")}")
```
75-86: Consider adding comprehensive logging for symlink operations.

The symlink handling is complex and critical to data integrity. Consider adding more detailed logging to trace each step of the process, especially for debugging purposes.

```diff
 private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
   for {
     dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
+    _ = logger.info(s"Starting symlink handling for dataset: $datasetName in organization: $organizationId")
     layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
+    _ = logger.info(s"Found ${layersAndLinkedMags.size} layers with linked mags to process")
     exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
       handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
     _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
     affectedDataSources = layersAndLinkedMags
       .flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
       .flatten
+    _ = logger.info(s"Updating properties for ${affectedDataSources.size} affected data sources")
     _ <- updateDatasourceProperties(affectedDataSources)
+    _ = logger.info(s"Successfully completed symlink handling for dataset: $datasetName")
   } yield ()
```
88-98: Add documentation to clarify the purpose of getFullyLinkedLayers.

This method performs a complex operation to find layers that fully link to all mags. Adding documentation will help future developers understand its purpose and logic.

```diff
+/**
+ * Identifies layers that fully link to all mags in the list.
+ * A fully linked layer is one where:
+ * 1. All mags have local data
+ * 2. There's at least one dataset/layer combination that links to every mag
+ *
+ * @param linkedMags List of mag link information
+ * @return Option containing sequences of (DataSourceId, layerName) pairs that fully link to all mags, or None if no such layers exist
+ */
 private def getFullyLinkedLayers(linkedMags: List[MagLinkInfo]): Option[Seq[(DataSourceId, String)]] = {
   val allMagsLocal = linkedMags.forall(_.mag.hasLocalData)
   val allLinkedDatasetLayers = linkedMags.map(_.linkedMags.map(lm => (lm.dataSourceId, lm.dataLayerName)))
   // Get combinations of datasourceId, layerName that link to EVERY mag
   val linkedToByAllMags = allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
   if (allMagsLocal && linkedToByAllMags.nonEmpty) {
     Some(linkedToByAllMags)
   } else {
     None
   }
 }
```
100-104: Simplify the relativizeSymlinkPath method.

The method can be simplified by removing the intermediate variable and directly returning the result.

```diff
 private def relativizeSymlinkPath(targetPath: Path, originPath: Path): Path = {
   val absoluteTargetPath = targetPath.toAbsolutePath
-  val relativeTargetPath = originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
-  relativeTargetPath
+  originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
 }
```
169-246: Add high-level documentation for the complex moveLayer method.

This method handles multiple complex scenarios and would benefit from clear documentation that explains its purpose and the different cases it handles.

```diff
+/**
+ * Moves a layer from one dataset to another when it's fully linked by other datasets.
+ * This operation involves:
+ * 1. Identifying a target dataset/layer to move the source layer to
+ * 2. Moving the physical layer directory to the target location
+ * 3. Updating all symlinks that were pointing to the original layer
+ * 4. Handling individual mag symlinks for partially linked layers
+ *
+ * @param sourceDataSource The data source ID of the layer being moved
+ * @param sourceLayer The name of the layer being moved
+ * @param fullLayerLinks List of dataset/layer pairs that fully link to this layer
+ * @param layerMags Information about mags in this layer and their links
+ */
 private def moveLayer(sourceDataSource: DataSourceId,
                       sourceLayer: String,
                       fullLayerLinks: Seq[(DataSourceId, String)],
                       layerMags: List[MagLinkInfo]): Unit = {
```
191-192: Improve log message clarity.

The log message could be more clear about the purpose of the operation being performed.

```diff
-s"Found complete symlinks to layer; Moving layer $sourceLayer from $sourceDataSource to $moveToDataSource/$moveToDataLayer")
+s"Found complete symlinks to layer. Moving layer $sourceLayer from dataset $sourceDataSource to dataset $moveToDataSource layer $moveToDataLayer to preserve data")
```
248-303: Consider breaking down handleLayerSymlinks into smaller functions.

This method is quite complex and handles multiple scenarios. Breaking it down into smaller, focused functions would improve readability and maintainability. Consider refactoring this large method into smaller helper methods:
- A method for the fully linked layer case
- A method for handling individual mag symlinks
- A method specifically for the local data movement
This would make the code more maintainable and easier to test; a possible decomposition is sketched below.
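As an illustration of that refactor, a skeleton decomposition could look like this (helper names and signatures are hypothetical, not from the PR; `MagLinkInfo` and `DataSourceId` are the existing webknossos types):

```scala
import com.scalableminds.webknossos.datastore.helpers.MagLinkInfo
import com.scalableminds.webknossos.datastore.models.datasource.DataSourceId
import net.liftweb.common.Box
import net.liftweb.util.Helpers.tryo

// Hypothetical decomposition sketch: dispatch to one helper per scenario.
private def handleLayerSymlinks(dataSourceId: DataSourceId,
                                layerName: String,
                                magLinks: List[MagLinkInfo]): Box[Unit] =
  tryo {
    getFullyLinkedLayers(magLinks) match {
      case Some(fullLinks) => moveFullyLinkedLayer(dataSourceId, layerName, fullLinks, magLinks)
      case None            => moveIndividualMags(dataSourceId, layerName, magLinks)
    }
  }

// Case 1: some dataset links to every mag; relocate the whole layer directory there.
private def moveFullyLinkedLayer(source: DataSourceId,
                                 layerName: String,
                                 fullLinks: Seq[(DataSourceId, String)],
                                 magLinks: List[MagLinkInfo]): Unit = ???

// Case 2: only individual mags are linked; relocate mag data one by one.
private def moveIndividualMags(source: DataSourceId,
                               layerName: String,
                               magLinks: List[MagLinkInfo]): Unit = ???
```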
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1): failure (67-71)
- util/src/main/scala/com/scalableminds/util/time/Instant.scala (1): toString (15-15)
🔇 Additional comments (6)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)
95-99: LGTM: Well-structured new method for exception formatting.

The addition of the `firstException` method provides a clean way to extract and format the exception message from a `Failure` instance. This follows the established pattern of helper methods in this trait and will improve error reporting by displaying exception information clearly.
117-118: Enhanced error reporting with improved message structure.

The change to include the first exception at the beginning of the formatted message provides better context when debugging failures. This will be particularly useful when handling errors during the layer deletion process mentioned in the PR objectives, making it easier to diagnose issues with symlink operations and data movement.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
65-68: LGTM: Good error handling in the deletion process.

The for-comprehension with appropriate error messages ensures that symlink handling failures are properly reported before attempting to move the dataset to trash.
147-167: LGTM: Good permission checks before symlink operations.

The method correctly checks write permissions before attempting to modify symlinks, which will help prevent partial operations and provide clear error messages to users.
2-10: LGTM: Good import organization.

The imports are well organized, clearly separating the different types of dependencies needed for the symlink handling functionality.
75-145: ❓ Verification inconclusive

Verify proper error recovery in symlink handling.

The symlink handling involves multiple filesystem operations. Ensure that the system can recover properly if some operations fail in the middle of the process, particularly in cases of permission issues or network failures when dealing with remote resources.

🏁 Script executed:

```bash
#!/bin/bash
# Check for exception handling patterns in the codebase
# Look for similar error handling patterns in other parts of the codebase
echo "Checking for tryo usage patterns in other deletion operations:"
rg "tryo.*\{.*delete" --type scala
echo "Checking for recovery mechanisms in similar operations:"
rg "recover|fallback|onFailure" --type scala -A 2 -B 2
```

Length of output: 65883
Action: Confirm and Document Error Recovery in Symlink Handling

The implementation in `DatasetDeleter.scala` leverages the Fox monad (with constructs like `?~>` and `tryo`) consistent with error handling patterns across the codebase. However, please verify that:

- Partial Operation Rollback: In the `moveSymlinks` method, if some filesystem or remote operations (e.g., moving symlinks or deleting the dataset properties file) succeed while subsequent ones fail, the system does not end up in an inconsistent state. If not already handled, consider adding a rollback mechanism or documenting that partial changes are acceptable; a possible pattern is sketched below.
- Robust Error Wrapping: Ensure that all critical filesystem operations (such as `Files.delete`) and remote resource calls are adequately wrapped to capture exceptions related to permission issues or network failures.

If the current error recovery is by design, please add documentation clarifying that these failure paths have been considered. Otherwise, adjustments may be needed to manage partial failures more gracefully.
```scala
JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
  case Full(dataSource) =>
    val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
      case dl: DataLayerWithMagLocators =>
        if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
          // Setting path to None means using resolution of layer/mag directories to access data
          dl.mapped(magMapping = _.copy(path = None))
        } else {
          dl
        }
      case dl => dl
    })
    // Write properties back
    tryo(Files.delete(propertiesPath)) match {
      case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
      case e => e
    }
```
💡 Verification agent
❓ Verification inconclusive
Add error handling for JSON parsing failures.
The method handles errors when deleting files but doesn't provide detailed error logging for JSON parsing failures. Consider adding more specific error handling for this case.
🏁 Script executed:
```bash
#!/bin/bash
# Check if there are any JSON related failures in the logs
# Look for JSON parsing errors in the logs
rg -i "json.*error|failed.*json|invalid.*json" --type log
```
Length of output: 208
Action Required: Enhance JSON Parsing Error Handling
- In `DatasetDeleter.scala` (lines 123–139), the code only handles file deletion errors but does not account for JSON parsing failures from the call to `JsonHelper.validatedJsonFromFile`.

- Please add a dedicated case in the match expression (e.g., a `Failure` branch) that logs detailed error messages when JSON parsing fails. For example:

```scala
JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
  case Full(dataSource) =>
    // … existing processing …
  case Failure(msg, exception, chain) =>
    logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
    // Consider handling the failure further, e.g., returning an error response or safe fallback
  case _ =>
    logger.error(s"Unexpected error during JSON parsing for file $propertiesPath.")
}
```
- Note: Our initial log search using `rg` did not yield any JSON parsing errors (possibly due to filtering issues), so please verify manually that any JSON parsing failures in your runtime or production environment are captured and logged appropriately.
Place this script into sample_organization to generate test datasets for the scenarios (locally explored layers are not tested). Pass as argument a dataset directory that has a color layer and mags 1, 2-2-1, 4-4-1 (I have used l4_sample with only the color layer and reduced mags to make copying faster).

```bash
#!/bin/bash
set -e # Exit on error
if [ "$#" -ne 1 ]; then
echo "Usage: $0 <dataset_directory>"
exit 1
fi
SOURCE_DIR="$1"
BASE_DIR="$(dirname "$SOURCE_DIR")"
# Function to copy dataset while maintaining symlinks
copy_test_dataset() {
local src="$1"
local dest="$2"
mkdir -p "$dest"
cp -a "$src"/. "$dest" # Preserve symlinks and structure
cp "$src/datasource-properties.json" "$dest/" # Copy properties file
}
# Test Case 1: DS1 has color with local data, DS2 has color symlinked to DS1
TC1_DS1="$BASE_DIR/TC1_DS1"
TC1_DS2="$BASE_DIR/TC1_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC1_DS1"
mkdir -p "$TC1_DS2" # Ensure parent directory exists
ln -s "../TC1_DS1/color" "$TC1_DS2/color"
cp "$TC1_DS1/datasource-properties.json" "$TC1_DS2/" # Copy properties file
# Test Case 2: TC2_DS1 has color with local data, TC2_DS3 has color symlinked to TC2_DS1
TC2_DS1="$BASE_DIR/TC2_DS1"
TC2_DS2="$BASE_DIR/TC2_DS2"
TC2_DS3="$BASE_DIR/TC2_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC2_DS1"
mkdir -p "$TC2_DS3" # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS3/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS3/" # Copy properties file
mkdir -p "$TC2_DS2" # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS2/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS2/" # Copy properties file
# Test Case 3: TC3_DS1 has color with local data, TC3_DS2 has color with mags symlinked to TC3_DS1
TC3_DS1="$BASE_DIR/TC3_DS1"
TC3_DS2="$BASE_DIR/TC3_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC3_DS1"
mkdir -p "$TC3_DS2/color"
for mag in "$TC3_DS1/color/"*; do
ln -s "../../$mag" "$TC3_DS2/color/$(basename "$mag")"
done
cp "$TC3_DS1/datasource-properties.json" "$TC3_DS2/" # Copy properties file
# Test Case 4: TC4_DS1 has color with local data, TC4_DS2 has one mag symlinked to TC4_DS1
TC4_DS1="$BASE_DIR/TC4_DS1"
TC4_DS2="$BASE_DIR/TC4_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC4_DS1"
mkdir -p "$TC4_DS2/color"
ln -s "../../$TC4_DS1/color/1" "$TC4_DS2/color/1"
cp -r "$TC4_DS1/color/2-2-1" "$TC4_DS2/color/2-2-1"
cp -r "$TC4_DS1/color/4-4-1" "$TC4_DS2/color/4-4-1"
cp "$TC4_DS1/datasource-properties.json" "$TC4_DS2/" # Copy properties file
# Test Case 5: TC5_DS1 has color with local data, TC5_DS2 and TC5_DS3 both symlink the same mag
TC5_DS1="$BASE_DIR/TC5_DS1"
TC5_DS2="$BASE_DIR/TC5_DS2"
TC5_DS3="$BASE_DIR/TC5_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC5_DS1"
mkdir -p "$TC5_DS2/color" "$TC5_DS3/color"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS2/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS2/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS2/color/4-4-1"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS3/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS3/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS3/color/4-4-1"
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS2/" # Copy properties file
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS3/" # Copy properties file
echo "Test datasets created in $BASE_DIR"
echo "To test deletion scenarios, remove the DS1 datasets"
```
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/controllers/WKRemoteDataStoreController.scala (1)
240-255: Consider handling potential error cases more explicitly.

While using the Fox monad provides error handling, consider adding explicit error messaging for the layer retrieval operation to improve debugging and user feedback.

```diff
 Action.async { implicit request =>
   dataStoreService.validateAccess(name, key) { _ =>
     for {
       organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
       dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
         GlobalAccessContext)
-      layers <- datasetLayerDAO.findAllForDataset(dataset._id)
+      layers <- datasetLayerDAO.findAllForDataset(dataset._id) ?~>
+        s"Failed to retrieve layers for dataset ${dataset.name} (${dataset._id})"
       magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
       magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
       layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
         case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
       }
     } yield Ok(Json.toJson(layersAndMagLinkInfos))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- CHANGELOG.unreleased.md (1 hunks)
- app/controllers/WKRemoteDataStoreController.scala (4 hunks)
- conf/webknossos.latest.routes (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- conf/webknossos.latest.routes
- CHANGELOG.unreleased.md
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (3)
app/controllers/WKRemoteDataStoreController.scala (3)
8-8: LGTM: Required import for new functionality.

The import for `LayerMagLinkInfo` and `MagLinkInfo` is necessary to support the return type of the new `getPaths` method.
38-38: Good addition of explicit duration import.

The `import scala.concurrent.duration.DurationInt` enables the `30 seconds` syntax used on line 203, making the code more readable and ensuring the implicit conversion is explicitly imported.
51-51: LGTM: Required dependency for new functionality.

Adding the `datasetLayerDAO` dependency to the controller is necessary for retrieving layer information in the new `getPaths` method, maintaining the dependency injection pattern used throughout the controller.
```scala
def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
  Action.async { implicit request =>
    dataStoreService.validateAccess(name, key) { _ =>
      for {
        organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
        dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
          GlobalAccessContext)
        layers <- datasetLayerDAO.findAllForDataset(dataset._id)
        magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
        magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
        layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
          case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
        }
      } yield Ok(Json.toJson(layersAndMagLinkInfos))
    }
  }
```
💡 Verification agent
❓ Verification inconclusive
Ensure proper test coverage for new endpoint.
This new endpoint is critical for managing layer symlinks during deletion operations. Ensure comprehensive test coverage exists to verify its behavior with various dataset and layer configurations.
🏁 Script executed:
```bash
#!/bin/bash
# Check if tests exist for this new endpoint
rg -A 3 -B 3 "getPaths.*Action" --type scala test/
rg -A 3 -B 3 "LayerMagLinkInfo" --type scala test/
```
Length of output: 103
Action Required: Add Comprehensive Tests for the `getPaths` Endpoint

It appears that no tests referencing the new `getPaths` endpoint or its usage of `LayerMagLinkInfo` were found in the test suite. Since this endpoint is critical for managing layer symlinks during deletion operations, please ensure that comprehensive tests are added to verify the behavior with various dataset and layer configurations.

- File: `app/controllers/WKRemoteDataStoreController.scala` (lines 240-255)
- Areas to Cover:
  - Validate that the access control (`validateAccess`) correctly gates the endpoint.
  - Ensure that retrievals from `organizationDAO`, `datasetDAO`, and `datasetLayerDAO` behave as expected.
  - Test the complete mapping logic that produces `LayerMagLinkInfo` with varying dataset and layer scenarios.
💡 Verification agent
🧩 Analysis chain
Add documentation and consider performance optimizations.

The implementation of `getPaths` is well-structured and follows the existing patterns in the controller. However, there are a few improvements to consider:

- Add method documentation explaining its purpose and usage context
- The `Fox.serialCombined` performs operations sequentially, which could be a performance bottleneck with many layers; consider if these operations could be parallelized
- Ensure comprehensive error handling for all database operations
```diff
- def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
+ /**
+ * Retrieves paths for all layers in a dataset.
+ * Used to support layer symlink management when deleting layers.
+ *
+ * @param name Datastore name
+ * @param key Datastore access key
+ * @param organizationId ID of the organization owning the dataset
+ * @param directoryName Directory name of the dataset
+ * @return JSON response with LayerMagLinkInfo objects containing layer paths
+ */
+ def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
Action.async { implicit request =>
dataStoreService.validateAccess(name, key) { _ =>
for {
organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
GlobalAccessContext)
layers <- datasetLayerDAO.findAllForDataset(dataset._id)
- magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
+ // Perform path lookups in parallel for better performance with many layers
+ magsAndLinkedMags <- Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
}
} yield Ok(Json.toJson(layersAndMagLinkInfos))
}
}
```
Consider verifying if parallel execution using `Fox.combined` instead of `Fox.serialCombined` is appropriate for this case:
🏁 Script executed:
```bash
#!/bin/bash
# Check implementation of Fox.serialCombined vs Fox.combined to understand differences
rg -A 5 "def serialCombined" --type scala
rg -A 5 "def combined" --type scala
# Look for similar patterns in the codebase
rg -B 2 -A 2 "Fox\.combined" --type scala
```
Length of output: 23365
Attention: Required changes on getPaths method for parallel execution
The updated method documentation is a good improvement, and parallelizing the data layer lookups can boost performance. However, based on our investigation:
- Parallel Execution: The current diff replaces the sequential call `Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))` with `Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))`. However, the definition of `Fox.combined` in our codebase expects a sequence of `Fox[T]` values (i.e. a single-argument list). To safely execute these operations in parallel, please refactor the call by mapping the function over the layers before combining the results. For example, update the call to:

  ```scala
  magsAndLinkedMags <- Fox.combined(layers.map(l => datasetService.getPathsForDataLayer(dataset._id, l.name)))
  ```

  This change ensures that each lookup is executed concurrently according to `Fox.combined`'s signature.

- Error Handling: While the core logic remains sound, please verify that comprehensive error handling is applied to all database operations to avoid unhandled failures.
URL of deployed dev instance (used for testing):
Steps to test:
Notes
When a layer A symlinks to another layer B that links to a layer C, and you remove layer B, the symlink at layer A is broken. That is because all symlinks are completely resolved for storage in the DB, so A's real path is C and B's real path is C; when B is removed, there is no knowledge of the link from A. I hope this works for now. Solving this would probably require a new table only for paths, because link chains of every length would need to be resolved.
Another problem currently is locally explored datasets (with file://). Because we can't rewrite symlinks there, I think we need to rewrite the datasource properties in this case. Done.

Right now everything assumes the usual structure with layers and mags named as usual. Anything else will cause trouble.
TODOs:
All these cases need to be handled
Issues:
(Please delete unneeded items, merge only when none are left open)