When deleting layers, move referenced data #8437

Open · wants to merge 16 commits into base: master

Conversation

frcroth
Member

@frcroth frcroth commented Mar 10, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • For all scenarios create datasets and make sure they have been scanned and reported to WK.
  • Layers (all of this should work for fully linked layers, i.e. it should not matter whether the layer itself is a symlink or all mags in the layer are symlinks; note that for locally explored layers you need local non-WKW data)
    • DS 1 has Layer A with local data, DS 2 has Layer B with symlink to A. Delete DS 1. Layer B now has local data.
    • DS 1 has Layer A with local data, DS 2 has Layer B with symlink to A, DS 3 has Layer C with symlink to A. Delete DS 1. Layer B (without loss of generality) has local data, Layer C has a symlink to B.
    • DS 1 has Layer A with local data, DS 2 has Layer B locally explored (file:// protocol) with link to A. Delete DS 1. Layer B now exists physically with local data, there is no remote link anymore.
    • DS 1 has Layer A with local data, DS 2 has Layer B locally explored with link to A, DS 3 has Layer C with symlink to A. Delete DS 1. It is currently arbitrary which dataset is selected to store the local data: if DS 2 is selected, it is now a regular dataset and C is a symlink to B; otherwise DS 2 now symlinks to C. In either case, DS 2 is no longer a remote dataset with the file protocol.
    • DS 1 has Layer A with local data, DS 2 has Layer B, which exists physically, but every mag in layer B is a symlink to the mag in Layer A. Delete DS 1. Layer B now has local data.
  • Mags
    • DS 1 has Layer A with local data. DS 2 has Layer B where one mag is a symlink to a mag of A. Delete DS 1. This mag has now been moved to B.
    • DS 1 has Layer A with local data. DS 2 has Layer B where one mag is a symlink to a mag M of A. DS 3 has layer C where one mag is a symlink to mag M. Delete DS 1. Mag M has been moved to B or C, the other has a symlink to that location.

Notes

When a layer A symlinks to another layer B that itself links to a layer C, and you remove layer B, the symlink at layer A is broken. This is because all symlinks are fully resolved before being stored in the DB, so the real path of both A and B is C; when B is removed, there is no record of the link from A to B. I hope this works for now. Solving this properly would probably require a dedicated table for paths, because link chains of arbitrary length would need to be resolved.
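
For illustration, a minimal sketch (hypothetical dataset names, plain java.nio, not the actual WEBKNOSSOS code) of why the A → B link is lost once paths are fully resolved:

import java.nio.file.Files

object SymlinkChainSketch {
  def main(args: Array[String]): Unit = {
    val base = Files.createTempDirectory("symlink-chain")
    val layerC = Files.createDirectory(base.resolve("datasetC_layer"))            // real data
    val layerB = Files.createSymbolicLink(base.resolve("datasetB_layer"), layerC) // B -> C
    val layerA = Files.createSymbolicLink(base.resolve("datasetA_layer"), layerB) // A -> B

    // readSymbolicLink resolves only one hop and still knows that A points at B ...
    println(Files.readSymbolicLink(layerA)) // .../datasetB_layer
    // ... while fully resolving (as done for the DB) collapses the chain,
    // so A and B both report C as their real path and the A -> B edge is gone.
    println(layerA.toRealPath()) // .../datasetC_layer
    println(layerB.toRealPath()) // .../datasetC_layer
  }
}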

Another current problem is locally explored datasets (with file://). Because we can't rewrite symlinks there, I think we need to rewrite the datasource-properties.json in this case. Done

Right now everything assumes the standard directory structure, with layers and mags named in the usual way. Anything else will cause trouble.

TODOs:

All these cases need to be handled

  • Layer with symlinks to it
    • One layer is a symlink -> move layer here
    • Multiple layers with symlinks -> move layer to one, relink everything else
    • (Multiple) layers with symlinks, but also a layer where one mag symlinks
  • Mag with symlinks to it
  • Locally explored

Issues:


(Please delete unneeded items, merge only when none are left open)

Contributor

coderabbitai bot commented Mar 10, 2025

📝 Walkthrough

Walkthrough

This pull request introduces changes to ensure that when a dataset or layer is deleted, any referenced layers are preserved by relocating them instead of being permanently removed. The documentation and routing have been updated accordingly. Unused case classes were removed from controllers, and new methods were added for path retrieval and symlink management. In the models and services layers, data types and method signatures have been updated to support the new relocation behavior. Additionally, significant enhancements were made to the dataset deletion process in the datastore, including robust symlink handling and client integrations.

Changes

File(s) Change Summary
CHANGELOG.unreleased.md, docs/datasets/settings.md, conf/...latest.routes Updated deletion documentation and added a new GET route for retrieving dataset paths.
app/controllers/DatasetController.scala, app/controllers/WKRemoteDataStoreController.scala Removed the unused MagLinkInfo; added the new getPaths method and updated controller dependencies.
app/models/dataset/Dataset.scala, app/models/dataset/DatasetService.scala Replaced DatasetMagInfo with DataSourceMagRow/DataSourceMagInfo, updating method signatures and filtering logic for dataset paths.
webknossos-datastore/app/com/scalableminds/.../DatasetDeleter.scala, .../MagLinkInfo.scala, .../BinaryDataService.scala, .../DSRemoteWebknossosClient.scala, .../UploadService.scala, .../DataSourceController.scala, .../DataSourceService.scala Enhanced deletion process to relocate referenced layers safely, added robust symlink management through new helper methods, and updated client integrations by modifying trait usage and parameter visibility.

Assessment against linked issues

Objective Addressed Explanation
Delete layers that are referenced by other datasets (#8279)

Possibly related PRs

Suggested labels

enhancement, bug, backend

Suggested reviewers

  • MichaelBuessemeyer

Poem

I hopped through code with eyes so bright,
Fixing symlinks deep into the night.
Layers now saved, securely aligned,
No dataset lost, all neatly confined.
With every hop, the bugs take flight –
A jolly rabbit sings in pure delight!
🐇



@frcroth frcroth changed the title Handle symlinks when deleting datasets/layers When deleting layers, move referenced data Mar 17, 2025
@frcroth frcroth marked this pull request as ready for review March 19, 2025 10:47
@frcroth frcroth requested a review from fm3 March 19, 2025 10:48
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
docs/datasets/settings.md (1)

74-74: Documentation update matches changelog

The added documentation clearly explains the behavior when deleting datasets with referenced layers, providing necessary context for users.

There's a minor grammatical issue with the comma placement.

-When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in, so that it can still be accessed.
+When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in so that it can still be accessed.
🧰 Tools
🪛 LanguageTool

[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...

(SO_THAT_UNNECESSARY_COMMA)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (5)

42-59: Robust trash-move method.
Method gracefully handles non-existing paths and relocates existing datasets to the trash location. Consider concurrency checks if multiple deletions target the same dataset simultaneously.


72-92: Symlink handling logic in moveSymlinks.
Overall approach to fetch and handle symlinks is sound. Verify concurrency if multiple calls run in parallel.


105-109: Relativizing symlink paths.
Implementation is standard. Ensure cross-filesystem edge cases are documented or handled.
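
For reference, a minimal sketch of what this relativization amounts to (hypothetical paths, plain java.nio rather than the PR's helper); note that Path.relativize cannot produce a result across different roots (e.g. different Windows drives), which is one of the cross-filesystem cases worth documenting:

import java.nio.file.{Path, Paths}

object RelativizeSketch {
  // Express the symlink target relative to the directory that will contain the link,
  // so the link stays valid if the whole data directory is mounted at a different absolute path.
  def relativizeSymlinkTarget(target: Path, linkLocation: Path): Path =
    linkLocation.getParent.toAbsolutePath.relativize(target.toAbsolutePath)

  def main(args: Array[String]): Unit = {
    val target = Paths.get("/data/binaryData/org/TC1_DS1/color/1")
    val link   = Paths.get("/data/binaryData/org/TC1_DS2/color/1")
    println(relativizeSymlinkTarget(target, link)) // prints ../../TC1_DS1/color/1
  }
}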


118-149: updateDatasourceProperties modifies local JSON.
Deletion-then-write approach can risk partial updates if an exception occurs between these operations. Consider fail-safe or transactional writes if feasible.
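
One conventional fail-safe pattern (a sketch under the assumption that writing the serialized JSON as a string is acceptable; not the PR's actual code) is to write to a sibling temp file and atomically move it over the original, so there is never a moment where datasource-properties.json is missing:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object AtomicJsonWriteSketch {
  // Replace `target` with `content` without a delete-then-write gap:
  // write a temp file in the same directory, then atomically rename it into place.
  def writeAtomically(target: Path, content: String): Unit = {
    val tmp = Files.createTempFile(target.getParent, target.getFileName.toString, ".tmp")
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING)
  }
}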


230-276: handleLayerSymlinks fallback logic.
Falls back to moving individual mags if no fully linked layer is detected. Code is large; a dedicated helper class or modular approach may increase readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9f63bf and 41ba8ff.

📒 Files selected for processing (15)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (0 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (4 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • docs/datasets/settings.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • app/controllers/DatasetController.scala
🧰 Additional context used
🧬 Code Definitions (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1) (1)
  • DSRemoteWebknossosClient (56-195)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1) (1)
  • DataSourceRepository (15-60)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1) (1)
  • DataSourceService (30-338)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1) (1)
  • remoteWKClient (152-152)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/Dataset.scala (3) (3)
  • findOne (200-207)
  • findOneByDirectoryNameAndOrganization (408-421)
  • findAllForDataset (897-907)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1)
app/models/dataset/Dataset.scala (1) (1)
  • dataSourceId (89-89)
🪛 LanguageTool
docs/datasets/settings.md

[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...

(SO_THAT_UNNECESSARY_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (33)
CHANGELOG.unreleased.md (1)

21-21: Concise and descriptive changelog entry

The changelog entry clearly explains the improved behavior for deleting datasets/layers, which aligns well with the PR objectives.

conf/webknossos.latest.routes (1)

112-112:

✅ Verification successful

New route for accessing directory paths

This new endpoint appears to support the layer reference management functionality, allowing retrieval of paths for specific organization/directory combinations in a datastore.


🏁 Script executed:

#!/bin/bash
# Check if there's a matching controller implementation for this new route
rg -A 3 -B 3 "def getPaths" --glob "*.scala"

Length of output: 1233


Verified new endpoint for directory path access

I confirmed that the route in conf/webknossos.latest.routes now correctly maps to the getPaths method in app/controllers/WKRemoteDataStoreController.scala. The shell script output verifies that the getPaths controller method (accepting name, key, organizationId, and directoryName as parameters) exists and is properly set up. This endpoint supports the intended layer reference management functionality for retrieving directory paths by organization and directory.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)

72-72:

✅ Verification successful

Updated BinaryDataService constructor with additional parameter

The constructor call has been updated to include an additional None parameter, likely corresponding to a new remoteWKClient parameter in the BinaryDataService class. This change supports the enhanced functionality for handling referenced layers.


🏁 Script executed:

#!/bin/bash
# Check the signature of the BinaryDataService class to confirm parameter alignment
rg -A 10 "class BinaryDataService" --glob "*.scala"

Length of output: 3750


Verified Update: BinaryDataService Constructor Parameter

The instantiation in VolumeTracingService.scala now correctly includes an additional None value that matches the new remoteWKClient: Option[DSRemoteWebknossosClient] parameter in BinaryDataService. This change is aligned with the updated constructor signature (as verified in webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala) and supports the enhanced functionality for handling referenced layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

123-127: New method for fetching layer paths looks good

This new fetchPaths method allows retrieving linked layer path information from WebKnossos, which is essential for the PR's goal of handling referenced data during layer deletion.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1)

106-106: Updated constructor call to accommodate new parameter

The BinaryDataService constructor now receives an additional None parameter, which likely represents the optional remoteWKClient needed for the new symlink resolution functionality.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

20-24: Added import for DSRemoteWebknossosClient

Good improvement to explicitly import the DSRemoteWebknossosClient class.


113-114: Added remoteWebknossosClient as dependency

The UploadService now receives the DSRemoteWebknossosClient as a constructor parameter, enabling it to interact with the remote WebKnossos server during upload operations.


152-152: Implemented remoteWKClient method

This override correctly provides access to the injected remoteWebknossosClient, enabling the DatasetDeleter trait to access it when needed for resolving symlinks during layer deletions.

app/models/dataset/DatasetService.scala (3)

7-7: Updated import to include DataSourceMagInfo

The import now correctly references DataSourceMagInfo, which is used in the updated getPathsForDataLayer method.


360-361: Changed return type to use DataSourceMagInfo

The method signature has been updated to return DataSourceMagInfo instead of DatasetMagInfo, which likely provides more comprehensive information needed for the proper handling of symlinked layers.


369-370: Improved filtering logic for linked paths

The filtering logic now excludes paths from the same data source rather than exact path matches. This is a key improvement for the PR's goal of properly handling referenced data during layer deletion, as it ensures that only references to data in other datasets are considered for preservation.

app/controllers/WKRemoteDataStoreController.scala (3)

8-8: Good addition of necessary import.

The imported LayerMagLinkInfo and MagLinkInfo types are required for the new getPaths method implementation.


51-51: Appropriate dependency injection.

Adding the datasetLayerDAO parameter is necessary to fetch layer information in the new getPaths method.


240-255: Well-structured implementation of the getPaths endpoint.

The method correctly:

  1. Validates datastore access
  2. Retrieves organization and dataset
  3. Fetches all layers for the dataset
  4. Gets paths for each layer
  5. Returns structured data in JSON format

This endpoint will be essential for handling layer symlinks when deleting datasets, as described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)

25-26: Good addition of remote client parameter.

Adding the remoteWKClient parameter provides the BinaryDataService with the capability to communicate with remote WebKnossos instances, which will be necessary for managing symlinked data across datasets when deleting layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2)

22-26: Properly updated dependency injection.

The constructor parameters have been neatly reformatted, and the new remoteWebknossosClient parameter has been correctly added to support the functionality required for layer deletion with symlink preservation.


49-50: Correct instantiation with new parameter.

The binaryDataService instantiation is properly updated to include the new remoteWebknossosClient parameter, ensuring the service has all required dependencies.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (3)

7-16: Well-designed data model for source information.

The DataSourceMagInfo class effectively encapsulates all necessary information about a data source, including:

  • Data source identifier
  • Layer name
  • Magnification vector
  • Path information
  • Local data status

The JSON format implementation will facilitate serialization for API responses.
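
For orientation, a rough sketch of what such a case class with a Play JSON format can look like; all names and field types here are stand-ins inferred from the bullet points above and the review discussion, not the actual contents of MagLinkInfo.scala:

import play.api.libs.json.{Json, OFormat}

// Hypothetical stand-ins, not the PR's real classes
case class Vec3Int(x: Int, y: Int, z: Int)
object Vec3Int { implicit val jsonFormat: OFormat[Vec3Int] = Json.format[Vec3Int] }

case class DataSourceId(directoryName: String, organizationId: String)
object DataSourceId { implicit val jsonFormat: OFormat[DataSourceId] = Json.format[DataSourceId] }

case class DataSourceMagInfo(dataSourceId: DataSourceId,
                             dataLayerName: String,
                             mag: Vec3Int,
                             path: Option[String],
                             realPath: Option[String],
                             hasLocalData: Boolean)
object DataSourceMagInfo {
  implicit val jsonFormat: OFormat[DataSourceMagInfo] = Json.format[DataSourceMagInfo]
}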


18-22: Clear representation of mag links.

The MagLinkInfo class provides a clean structure to represent relationships between a primary mag and its linked mags, which is essential for tracking symlinks between layers.


24-28: Effective layer organization structure.

The LayerMagLinkInfo class appropriately groups multiple MagLinkInfo objects by layer name, providing a hierarchical structure that will be useful for the getPaths endpoint response.

app/models/dataset/Dataset.scala (6)

8-8: Import usage approved.
This new import for DataSourceMagInfo is consistent with the added logic in the file.


805-814: Comprehensive transformation logic.
rowsToMagInfos neatly converts DB rows into typed domain objects. No issues.


815-823: Query alignment for findPathsForDatasetAndDatalayer.
The joined queries and data-mapping approach seem correct. Confirm $columns includes the columns consumed by DataSourceMagRow.


825-832: Similar query approach for findAllByRealPath.
Implementation matches the pattern in findPathsForDatasetAndDatalayer. Looks good.


35-35:

✅ Verification successful

New slick import for custom row parsing.
Ensure all raw queries align column order with the DataSourceMagRow fields to avoid runtime mismatches.


🏁 Script executed:

#!/bin/bash
# Verify all SELECT column orders match DataSourceMagRow constructor usage
rg -A5 'SELECT \$columns,' 

Length of output: 1130


Slick Import Verified & Column Order Confirmed

The new slick.jdbc.GetResult import is correctly implemented for custom row parsing. Verification confirms that the raw SELECT queries (e.g., those using $columns, _organization, directoryName) align with the DataSourceMagRow fields. Please ensure that any future modifications maintain this column order to avoid runtime mismatches.


793-804:

✅ Verification successful

Check column ordering in GetResultDataSourceMagRow.
Verify that the selected columns strictly match the constructor parameters for DataSourceMagRow. Otherwise, misaligned columns may lead to runtime errors.


🏁 Script executed:

#!/bin/bash
# Verify the SQL queries that populate DataSourceMagRow and confirm they match the constructor's field order
rg -A8 'DataSourceMagRow('

Length of output: 113


Attention: Additional Verification Needed for GetResultDataSourceMagRow Column Alignment

The initial verification command failed due to an unescaped parenthesis in the regex. To properly verify that the SQL queries and the constructor parameters for DataSourceMagRow are aligned, please run the following additional shell script:

This script uses an escaped parenthesis (\() to search for instances where DataSourceMagRow is invoked, and it also locates the class definition to verify the constructor’s parameter order. Once you review the outputs, confirm that the column order in the GetResultDataSourceMagRow implementation strictly matches the order of parameters in the DataSourceMagRow constructor.


🏁 Script executed:

#!/bin/bash
echo "Checking instantiations in app/models/dataset/Dataset.scala:"
rg -A10 "DataSourceMagRow\(" app/models/dataset/Dataset.scala

echo "---------"
echo "Looking for the definition of DataSourceMagRow (its constructor signature):"
rg -n "case class DataSourceMagRow" .

Length of output: 1511


Column ordering in GetResultDataSourceMagRow verified.
The verification confirms that the order of the parameters provided in the DataSourceMagRow instantiation exactly matches the constructor signature declared in the case class (i.e., _dataset, dataLayerName, mag, path, realPath, hasLocalData, _organization, and directoryName). No changes are required.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)

2-9: New imports introduced.
All newly imported classes and traits appear relevant for symlink handling, JSON parsing, and remote data source interaction.


11-12: Liftweb imports for Box usage.
Using tryo and Full maintains consistency with the existing code style. No concerns here.


64-68: Symlink move precedes trash deletion.
If moveSymlinks fails, the dataset is not deleted. Verify this ordering is intentional.


70-70: Remote client accessor.
The optional remoteWKClient is straightforward. No immediate issues.


93-103: getFullyLinkedLayers synergy.
Nicely checks for all local mags and an intersection of linked layers. Straightforward logic.


111-117: getMagPath location builder.
Clear path construction for data source references. Looks fine.


150-165: updateMagSymlinks single-step logic.
Deletes any existing link before recreating it. This is functional, but be aware of concurrency edge cases.
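
Roughly, the delete-then-recreate step looks like the following sketch (hypothetical helper, plain java.nio, not the PR's updateMagSymlinks); the short window between the two calls is exactly the concurrency edge case mentioned:

import java.nio.file.{Files, Path}

object RecreateSymlinkSketch {
  // Point `linkPath` at `newTarget`, replacing whatever link was there before.
  // Not atomic: between deleteIfExists and createSymbolicLink the link briefly does not exist.
  def recreateSymlink(linkPath: Path, newTarget: Path): Unit = {
    Files.deleteIfExists(linkPath)
    val relativeTarget = linkPath.getParent.toAbsolutePath.relativize(newTarget.toAbsolutePath)
    Files.createSymbolicLink(linkPath, relativeTarget)
  }
}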

Member

@fm3 fm3 left a comment

Wow, quite the complex logic 🙈 But it does seem justified, I don’t see easy ways to simplify.

I’m a little fearful of exceptions, could you have a look which of the used file functions can throw exceptions and add tryos there?
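
For context, the kind of wrapping meant here, sketched with scala.util.Try as a stand-in for lift's tryo (exact import differs in the codebase) and a hypothetical path:

import java.nio.file.{Files, Path, Paths}
import scala.util.{Failure, Success, Try}

object WrapFileOpsSketch {
  // Files.delete throws (NoSuchFileException, AccessDeniedException, ...);
  // wrapping it turns a crash into a value the caller can inspect and report.
  def deleteSafely(path: Path): Try[Unit] = Try(Files.delete(path))

  def main(args: Array[String]): Unit =
    deleteSafely(Paths.get("/tmp/does-not-exist")) match {
      case Success(_) => println("deleted")
      case Failure(e) => println(s"deletion failed: ${e.getMessage}")
    }
}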

Also, maybe it makes sense to first check for write access so that a readable error message can be given. And what happens if this re-symlinking does not happen for permission reasons? Can the deletion still go forward? What is shown to the user?

I also added a few small comments on the code.

Did no testing yet, will do that in the next round.

path: Option[String],
realPath: Option[String],
hasLocalData: Boolean)
case class DataSourceMagRow(_dataset: String,
Member

Suggested change
case class DataSourceMagRow(_dataset: String,
case class DataSourceMagRow(_dataset: ObjectId,

does that work?

Comment on lines 51 to 53
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
s" because ${reason.getOrElse("")}"
else "..."}")
Member

Suggested change
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
s" because ${reason.getOrElse("")}"
else "..."}")
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")

Member Author

I don't think I wrote this but happy to change 😄

@@ -31,23 +39,244 @@ trait DatasetDeleter extends LazyLogging with DirectoryConstants {
case e: Exception => Fox.failure(s"Deleting dataset failed: ${e.toString}", Full(e))
}

def moveToTrash(organizationId: String,
Member

we should probably make everything in here either private or protected, unless it’s really necessary for it to be public.

Member Author

This is not private because it is used in the deleteOnDisk method; apart from deleteOnDisk there are no non-private methods here.

@@ -22,7 +22,8 @@ class BinaryDataService(val dataBaseDir: Path,
val agglomerateServiceOpt: Option[AgglomerateService],
remoteSourceDescriptorServiceOpt: Option[RemoteSourceDescriptorService],
sharedChunkContentsCache: Option[AlfuCache[String, MultiArray]],
datasetErrorLoggingService: Option[DatasetErrorLoggingService])(implicit ec: ExecutionContext)
datasetErrorLoggingService: Option[DatasetErrorLoggingService],
val remoteWKClient: Option[DSRemoteWebknossosClient])(implicit ec: ExecutionContext)
Member

@fm3 fm3 Mar 19, 2025

is it really necessary for the binarydataservice to extend the datasetdeleter? I get the feeling it shouldn’t be involved in deletions. If not, this can also be removed again, and the client could be made non-optional in the deleter?

Edit: looks like the only usage is in def deleteOnDisk in DataSourceController. That could be moved to the dataSourceService instead, which could extend the deleter without any Option shenanigans. The DataSourceService even has a remoteWebknossosClient injected already.

Member Author

Good idea!

Comment on lines +360 to +361
def getPathsForDataLayer(datasetId: ObjectId,
layerName: String): Fox[List[(DataSourceMagInfo, List[DataSourceMagInfo])]] =
Member

Could you add a comment explaining the signature of this function? I think it’s not immediately obvious why the return type is this nested list.

case Some(layersAndLinkedMags) =>
layersAndLinkedMags.map(layerMagLinkInfo =>
handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
case None => Seq(tryo {})
Member

what does the tryo {} return? maybe we could make this more explicit?

Member Author

Removed because remoteWKClient is no longer optional

case Full(dataSource) =>
val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
case dl: DataLayerWithMagLocators =>
if (dl.mags.forall(_.path.exists(_.startsWith("file://")))) {
Member

Suggested change
if (dl.mags.forall(_.path.exists(_.startsWith("file://")))) {
if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {

That way this will be listed as a usage of this file scheme.

case dl => dl
})
// Write properties back
Files.delete(propertiesPath)
Member

might throw an exception, maybe we should tryo this too.

@frcroth
Member Author

frcroth commented Mar 24, 2025

Wow, quite the complex logic 🙈 But it does seem justified, I don’t see easy ways to simplify.

I’m a little fearful of exceptions, could you have a look which of the used file functions can throw exceptions and add tryos there?

I added the suggested tryo for the deletion in updateDatasourceProperties; everything else should be covered by the big tryo in handleLayerSymlinks.

Also, maybe it makes sense to first check for write access so that a readable error message can be given. Also, what happens if for permission reasons this resymlinking does not happen. Can the deletion still go forward? What is shown to the user?

Added an exception before deleting stuff.

@frcroth frcroth requested a review from fm3 March 24, 2025 16:27
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)

112-144: Robust datasource properties update with error handling.

The method properly handles updating datasource properties, including checks for file existence and JSON validation. Error handling with tryo is used for the deletion operation.

Consider wrapping the JSON write operation in a tryo as well:

-              case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
+              case Full(_) => tryo(JsonHelper.jsonToFile(propertiesPath, updatedDataSource))

146-166: Comprehensive mag symlink update with permission checking.

The method handles various cases for updating mag symlinks, including permission checking, existence verification, and special handling for locally explored datasets.

Consider wrapping the Files.createSymbolicLink operations in tryo blocks to handle potential failures:

-      Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+      tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))) match {
+        case Full(_) => // success
+        case e => logger.warn(s"Failed to create symlink at $linkedMagPath: $e")
+      }

168-240: Complex but logically sound layer migration implementation.

The moveLayer method correctly handles moving layers and updating all related symlinks with appropriate permission checks, deletion of existing symlinks, and recreation of symlinks to the new location.

The comment on line 177 raises a good question about selection criteria. Consider adding a more definitive explanation or implementing a more deterministic selection process for the target layer.


176-178: Target selection strategy could be improved.

Currently, an arbitrary target is selected from the fully linked layers. Consider implementing a more deterministic selection strategy based on specific criteria (e.g., most recent, most referenced, etc.).

-    // Selection of the first one is arbitrary, is there anything to distinguish between them?
-    val target = fullLayerLinks.head
+    // Select target based on criteria like most recent access or layer with most references
+    val target = selectTargetLayer(fullLayerLinks)

A selectTargetLayer method could implement logic to choose the most appropriate target based on metadata or other criteria.
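
For illustration only, one way such a selectTargetLayer helper could be made deterministic without any extra metadata (hypothetical types; the PR itself keeps the arbitrary head selection):

object TargetSelectionSketch {
  // Hypothetical stand-in for the (dataSourceId, layerName) pairs used in the PR
  final case class DataSourceId(directoryName: String, organizationId: String)

  // Deterministic but metadata-free: always pick the lexicographically smallest candidate,
  // so repeated runs agree on the same target. Assumes candidates is non-empty,
  // mirroring the case where fullLayerLinks is non-empty.
  def selectTargetLayer(candidates: Seq[(DataSourceId, String)]): (DataSourceId, String) =
    candidates.minBy { case (ds, layerName) => (ds.organizationId, ds.directoryName, layerName) }
}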

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab0ab84 and 6572e32.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.unreleased.md
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (9)

2-13: Well-organized imports with necessary additions.

The imports have been properly organized to support the new functionality for handling symlinks and JSON processing.


43-58: New moveToTrash method handles dataset deletion correctly.

The method properly handles moving datasets to the trash directory, with good error handling and logging. It checks if the dataset exists before attempting deletion and appropriately handles both cases.


64-67: Refactored deletion process with symlink handling.

The deleteOnDisk method now uses a clean for-comprehension to combine moveSymlinks and moveToTrash operations, ensuring that symlinks are handled before the dataset is moved to trash.


70-70: Abstract method for remote client access.

This abstract method declaration ensures implementing classes will provide access to the remote WebKnossos client, allowing for proper dependency injection.


87-97: Clear logic for identifying fully linked layers.

The method effectively determines which layers are completely linked through all magnifications, with good comments explaining the logic.


99-103: Efficient path relativization for symlinks.

The method properly converts absolute paths to relative paths for symlink creation, ensuring symlinks remain valid regardless of the absolute path to the data directory.


105-111: Clear mag path construction.

The method constructs paths to magnification directories in a clear, readable manner using appropriate path resolution.


242-297: Comprehensive layer symlink handling with excellent error handling.

The method orchestrates the handling of symlinks for layers during deletion, with proper error handling using tryo. The comments explaining different cases for magnifications without local data are very helpful.


74-85:

❓ Verification inconclusive

Comprehensive symlink migration coordination.

The moveSymlinks method effectively coordinates retrieving paths, handling layer symlinks, and updating datasource properties.

Verify all symlink operations are handled within error boundaries:


🏁 Script executed:

#!/bin/bash
# Check if the handling of symlink operations is properly wrapped in error handling
rg -n "tryo" --context 5 DatasetDeleter.scala

Length of output: 106


Update: Verify Correct File Path & Error Handling in Symlink Migration

It appears the previous verification script referenced the wrong file path (i.e. using "DatasetDeleter.scala" directly). Please ensure that the error boundary handling (using constructs such as the custom operator ?~>) in the moveSymlinks method is correctly applied in the file located at webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala. For clarity, please verify the following:

  • The file exists at the expected location.
  • Error handling for symlink operations (e.g. via Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks") is in place.
  • Confirm that any internal wrappers (if intended) properly catch and relay errors during the symlink migration.

Once you’ve validated these points—preferably by running the search on the correct file path or manually checking the file—please update the verification accordingly.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)

53-54: Missing space in log message.

There should be a space between the path and the reason in the log message for better readability.

-          s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")
+          s"Deleting dataset by moving it from $dataSourcePath to $targetPath ${reason.map(r => s"because $r").getOrElse("...")}")

75-86: Consider adding comprehensive logging for symlink operations.

The symlink handling is complex and critical to data integrity. Consider adding more detailed logging to trace each step of the process, especially for debugging purposes.

 private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
   for {
     dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
+    _ = logger.info(s"Starting symlink handling for dataset: $datasetName in organization: $organizationId")
     layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
+    _ = logger.info(s"Found ${layersAndLinkedMags.size} layers with linked mags to process")
     exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
       handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
     _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
     affectedDataSources = layersAndLinkedMags
       .flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
       .flatten
+    _ = logger.info(s"Updating properties for ${affectedDataSources.size} affected data sources")
     _ <- updateDatasourceProperties(affectedDataSources)
+    _ = logger.info(s"Successfully completed symlink handling for dataset: $datasetName")
   } yield ()

88-98: Add documentation to clarify the purpose of getFullyLinkedLayers.

This method performs a complex operation to find layers that fully link to all mags. Adding documentation will help future developers understand its purpose and logic.

+  /**
+   * Identifies layers that fully link to all mags in the list.
+   * A fully linked layer is one where:
+   * 1. All mags have local data
+   * 2. There's at least one dataset/layer combination that links to every mag
+   *
+   * @param linkedMags List of mag link information
+   * @return Option containing sequences of (DataSourceId, layerName) pairs that fully link to all mags, or None if no such layers exist
+   */
   private def getFullyLinkedLayers(linkedMags: List[MagLinkInfo]): Option[Seq[(DataSourceId, String)]] = {
     val allMagsLocal = linkedMags.forall(_.mag.hasLocalData)
     val allLinkedDatasetLayers = linkedMags.map(_.linkedMags.map(lm => (lm.dataSourceId, lm.dataLayerName)))
     // Get combinations of datasourceId, layerName that link to EVERY mag
     val linkedToByAllMags = allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
     if (allMagsLocal && linkedToByAllMags.nonEmpty) {
       Some(linkedToByAllMags)
     } else {
       None
     }
   }

100-104: Simplify the relativizeSymlinkPath method.

The method can be simplified by removing the intermediate variable and directly returning the result.

 private def relativizeSymlinkPath(targetPath: Path, originPath: Path): Path = {
   val absoluteTargetPath = targetPath.toAbsolutePath
-  val relativeTargetPath = originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
-  relativeTargetPath
+  originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
 }

169-246: Add high-level documentation for the complex moveLayer method.

This method handles multiple complex scenarios and would benefit from clear documentation that explains its purpose and the different cases it handles.

+  /**
+   * Moves a layer from one dataset to another when it's fully linked by other datasets.
+   * This operation involves:
+   * 1. Identifying a target dataset/layer to move the source layer to
+   * 2. Moving the physical layer directory to the target location
+   * 3. Updating all symlinks that were pointing to the original layer
+   * 4. Handling individual mag symlinks for partially linked layers
+   *
+   * @param sourceDataSource The data source ID of the layer being moved
+   * @param sourceLayer The name of the layer being moved
+   * @param fullLayerLinks List of dataset/layer pairs that fully link to this layer
+   * @param layerMags Information about mags in this layer and their links
+   */
   private def moveLayer(sourceDataSource: DataSourceId,
                         sourceLayer: String,
                         fullLayerLinks: Seq[(DataSourceId, String)],
                         layerMags: List[MagLinkInfo]): Unit = {

191-192: Improve log message clarity.

The log message could be more clear about the purpose of the operation being performed.

-      s"Found complete symlinks to layer; Moving layer $sourceLayer from $sourceDataSource to $moveToDataSource/$moveToDataLayer")
+      s"Found complete symlinks to layer. Moving layer $sourceLayer from dataset $sourceDataSource to dataset $moveToDataSource layer $moveToDataLayer to preserve data")

248-303: Consider breaking down handleLayerSymlinks into smaller functions.

This method is quite complex and handles multiple scenarios. Breaking it down into smaller, focused functions would improve readability and maintainability.

Consider refactoring this large method into smaller helper methods:

  1. A method for the fully linked layer case
  2. A method for handling individual mag symlinks
  3. A method specifically for the local data movement

This would make the code more maintainable and easier to test.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6572e32 and 8d14be3.

📒 Files selected for processing (2)
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • failure (67-71)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (1)
  • toString (15-15)
🔇 Additional comments (6)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)

95-99: LGTM: Well-structured new method for exception formatting.

The addition of the firstException method provides a clean way to extract and format the exception message from a Failure instance. This follows the established pattern of helper methods in this trait and will improve error reporting by displaying exception information clearly.


117-118: Enhanced error reporting with improved message structure.

The change to include the first exception at the beginning of the formatted message provides better context when debugging failures. This will be particularly useful when handling errors during the layer deletion process mentioned in the PR objectives, making it easier to diagnose issues with symlink operations and data movement.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)

65-68: LGTM: Good error handling in the deletion process.

The for-comprehension with appropriate error messages ensures that symlink handling failures are properly reported before attempting to move the dataset to trash.


147-167: LGTM: Good permission checks before symlink operations.

The method correctly checks write permissions before attempting to modify symlinks, which will help prevent partial operations and provide clear error messages to users.


2-10: LGTM: Good import organization.

The imports are well organized, clearly separating the different types of dependencies needed for the symlink handling functionality.


75-145:

❓ Verification inconclusive

Verify proper error recovery in symlink handling.

The symlink handling involves multiple filesystem operations. Ensure that the system can recover properly if some operations fail in the middle of the process, particularly in cases of permission issues or network failures when dealing with remote resources.


🏁 Script executed:

#!/bin/bash
# Check for exception handling patterns in the codebase

# Look for similar error handling patterns in other parts of the codebase
echo "Checking for tryo usage patterns in other deletion operations:"
rg "tryo.*\{.*delete" --type scala

echo "Checking for recovery mechanisms in similar operations:"
rg "recover|fallback|onFailure" --type scala -A 2 -B 2

Length of output: 65883


Action: Confirm and Document Error Recovery in Symlink Handling

The implementation in DatasetDeleter.scala leverages the Fox monad (with constructs like ?~> and tryo) consistent with error handling patterns across the codebase. However, please verify that:

  • Partial Operation Rollback: In the moveSymlinks method, if some filesystem or remote operations (e.g., moving symlinks or deleting the dataset properties file) succeed while subsequent ones fail, the system does not end up in an inconsistent state. If not already handled, consider adding a rollback mechanism or documenting that partial changes are acceptable.

  • Robust Error Wrapping: Ensure that all critical filesystem operations (such as Files.delete) and remote resource calls are adequately wrapped to capture exceptions related to permission issues or network failures.

If the current error recovery is by design, please add documentation clarifying that these failure paths have been considered. Otherwise, adjustments may be needed to manage partial failures more gracefully.

Comment on lines +123 to +139
JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
case Full(dataSource) =>
val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
case dl: DataLayerWithMagLocators =>
if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
// Setting path to None means using resolution of layer/mag directories to access data
dl.mapped(magMapping = _.copy(path = None))
} else {
dl
}
case dl => dl
})
// Write properties back
tryo(Files.delete(propertiesPath)) match {
case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
case e => e
}
Contributor

💡 Verification agent

❓ Verification inconclusive

Add error handling for JSON parsing failures.

The method handles errors when deleting files but doesn't provide detailed error logging for JSON parsing failures. Consider adding more specific error handling for this case.


🏁 Script executed:

#!/bin/bash
# Check if there are any JSON related failures in the logs

# Look for JSON parsing errors in the logs
rg -i "json.*error|failed.*json|invalid.*json" --type log

Length of output: 208


Action Required: Enhance JSON Parsing Error Handling

  • In DatasetDeleter.scala (lines 123–139), the code only handles file deletion errors but does not account for JSON parsing failures from the call to JsonHelper.validatedJsonFromFile.

  • Please add a dedicated case in the match expression (e.g., a Failure branch) that logs detailed error messages when JSON parsing fails. For example:

    JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
      case Full(dataSource) =>
        // … existing processing …
      case Failure(msg, exception, chain) =>
        logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
        // Consider handling the failure further, e.g., returning an error response or safe fallback
      case _ =>
        logger.error(s"Unexpected error during JSON parsing for file $propertiesPath.")
    }
  • Note: Our initial log search using rg did not yield any JSON parsing errors (possibly due to filtering issues), so please verify manually that any JSON parsing failures in your runtime or production environment are captured and logged appropriately.

@frcroth
Member Author

frcroth commented Mar 31, 2025

Place this script into sample_organization to generate test datasets for the scenarios above (locally explored layers are not covered). Pass as argument a dataset that has a color layer and mags 1, 2-2-1, and 4-4-1 (I have used l4_sample with only the color layer and reduced mags to make copying faster).
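
For example (the script filename is arbitrary), saved as create_test_datasets.sh inside sample_organization it can be run as ./create_test_datasets.sh l4_sample; the TC*_DS1 datasets can then be deleted to exercise each scenario.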

#!/bin/bash

set -e  # Exit on error

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <dataset_directory>"
    exit 1
fi

SOURCE_DIR="$1"
BASE_DIR="$(dirname "$SOURCE_DIR")"
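# Note: the "../../$..." symlink targets below concatenate "../.." with $BASE_DIR-prefixed
# paths, so they only resolve correctly when the dataset is passed as a bare directory name
# from inside the organization directory (i.e. BASE_DIR is ".").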

# Function to copy dataset while maintaining symlinks
copy_test_dataset() {
    local src="$1"
    local dest="$2"
    mkdir -p "$dest"
    cp -a "$src"/. "$dest"  # Preserve symlinks and structure
    cp "$src/datasource-properties.json" "$dest/"  # Copy properties file
}

# Test Case 1: DS1 has color with local data, DS2 has color symlinked to DS1
TC1_DS1="$BASE_DIR/TC1_DS1"
TC1_DS2="$BASE_DIR/TC1_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC1_DS1"
mkdir -p "$TC1_DS2"  # Ensure parent directory exists
ln -s "../TC1_DS1/color" "$TC1_DS2/color"
cp "$TC1_DS1/datasource-properties.json" "$TC1_DS2/"  # Copy properties file

# Test Case 2: TC2_DS1 has color with local data, TC2_DS2 and TC2_DS3 both have color symlinked to TC2_DS1
TC2_DS1="$BASE_DIR/TC2_DS1"
TC2_DS2="$BASE_DIR/TC2_DS2"
TC2_DS3="$BASE_DIR/TC2_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC2_DS1"
mkdir -p "$TC2_DS3"  # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS3/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS3/"  # Copy properties file
mkdir -p "$TC2_DS2"  # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS2/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS2/"  # Copy properties file

# Test Case 3: TC3_DS1 has color with local data, TC3_DS2 has color with mags symlinked to TC3_DS1
TC3_DS1="$BASE_DIR/TC3_DS1"
TC3_DS2="$BASE_DIR/TC3_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC3_DS1"
mkdir -p "$TC3_DS2/color"
for mag in "$TC3_DS1/color/"*; do
    ln -s "../../$mag" "$TC3_DS2/color/$(basename "$mag")"
done
cp "$TC3_DS1/datasource-properties.json" "$TC3_DS2/"  # Copy properties file

# Test Case 4: TC4_DS1 has color with local data, TC4_DS2 has one mag symlinked to TC4_DS1
TC4_DS1="$BASE_DIR/TC4_DS1"
TC4_DS2="$BASE_DIR/TC4_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC4_DS1"
mkdir -p "$TC4_DS2/color"
ln -s "../../$TC4_DS1/color/1" "$TC4_DS2/color/1"
cp -r "$TC4_DS1/color/2-2-1" "$TC4_DS2/color/2-2-1"
cp -r "$TC4_DS1/color/4-4-1" "$TC4_DS2/color/4-4-1"
cp "$TC4_DS1/datasource-properties.json" "$TC4_DS2/"  # Copy properties file

# Test Case 5: TC5_DS1 has color with local data, TC5_DS2 and TC5_DS3 both symlink the same mag
TC5_DS1="$BASE_DIR/TC5_DS1"
TC5_DS2="$BASE_DIR/TC5_DS2"
TC5_DS3="$BASE_DIR/TC5_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC5_DS1"
mkdir -p "$TC5_DS2/color" "$TC5_DS3/color"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS2/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS2/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS2/color/4-4-1"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS3/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS3/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS3/color/4-4-1"
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS2/"  # Copy properties file
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS3/"  # Copy properties file

echo "Test datasets created in $BASE_DIR"
echo "To test deletion scenarios, remove the DS1 datasets"


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
app/controllers/WKRemoteDataStoreController.scala (1)

240-255: Consider handling potential error cases more explicitly.

While using the Fox monad provides error handling, consider adding explicit error messaging for the layer retrieval operation to improve debugging and user feedback.

     Action.async { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
           dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
             GlobalAccessContext)
-          layers <- datasetLayerDAO.findAllForDataset(dataset._id)
+          layers <- datasetLayerDAO.findAllForDataset(dataset._id) ?~> 
+            s"Failed to retrieve layers for dataset ${dataset.name} (${dataset._id})"
           magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
           magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
           layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
             case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
           }
         } yield Ok(Json.toJson(layersAndMagLinkInfos))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d026b65 and cbb431d.

📒 Files selected for processing (5)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (4 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • conf/webknossos.latest.routes
  • CHANGELOG.unreleased.md
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (3)
app/controllers/WKRemoteDataStoreController.scala (3)

8-8: LGTM: Required import for new functionality.

The import for LayerMagLinkInfo and MagLinkInfo is necessary to support the return type of the new getPaths method.


38-38: Good addition of explicit duration import.

The import scala.concurrent.duration.DurationInt enables the 30 seconds syntax used on line 203, making the code more readable and ensuring the implicit conversion is explicitly imported.


51-51: LGTM: Required dependency for new functionality.

Adding the datasetLayerDAO dependency to the controller is necessary for retrieving layer information in the new getPaths method, maintaining the dependency injection pattern used throughout the controller.

Comment on lines +240 to +255
  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
    Action.async { implicit request =>
      dataStoreService.validateAccess(name, key) { _ =>
        for {
          organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
          dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
            GlobalAccessContext)
          layers <- datasetLayerDAO.findAllForDataset(dataset._id)
          magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
          magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
          layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
            case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
          }
        } yield Ok(Json.toJson(layersAndMagLinkInfos))
      }
    }

💡 Verification agent

❓ Verification inconclusive

Ensure proper test coverage for new endpoint.

This new endpoint is critical for managing layer symlinks during deletion operations. Ensure comprehensive test coverage exists to verify its behavior with various dataset and layer configurations.


🏁 Script executed:

#!/bin/bash
# Check if tests exist for this new endpoint
rg -A 3 -B 3 "getPaths.*Action" --type scala test/
rg -A 3 -B 3 "LayerMagLinkInfo" --type scala test/

Length of output: 103


Action Required: Add Comprehensive Tests for the getPaths Endpoint

It appears that no tests referencing the new getPaths endpoint or its usage of LayerMagLinkInfo were found in the test suite. Since this endpoint is critical for managing layer symlinks during deletion operations, please ensure that comprehensive tests are added to verify the behavior with various dataset and layer configurations.

  • File: app/controllers/WKRemoteDataStoreController.scala (lines 240-255)
  • Areas to Cover:
    • Validate that the access control (validateAccess) correctly gates the endpoint.
    • Ensure that retrievals from organizationDAO, datasetDAO, and datasetLayerDAO behave as expected.
    • Test the complete mapping logic that produces LayerMagLinkInfo with varying dataset and layer scenarios.

💡 Verification agent

🧩 Analysis chain

Add documentation and consider performance optimizations.

The implementation of getPaths is well-structured and follows the existing patterns in the controller. However, there are a few improvements to consider:

  1. Add method documentation explaining its purpose and usage context
  2. Fox.serialCombined performs the lookups sequentially, which could become a performance bottleneck with many layers; consider whether these operations could be parallelized
  3. Ensure comprehensive error handling for all database operations
-  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
+  /**
+   * Retrieves paths for all layers in a dataset.
+   * Used to support layer symlink management when deleting layers.
+   *
+   * @param name Datastore name
+   * @param key Datastore access key
+   * @param organizationId ID of the organization owning the dataset
+   * @param directoryName Directory name of the dataset
+   * @return JSON response with LayerMagLinkInfo objects containing layer paths
+   */
+  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
     Action.async { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
           dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
             GlobalAccessContext)
           layers <- datasetLayerDAO.findAllForDataset(dataset._id)
-          magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
+          // Perform path lookups in parallel for better performance with many layers
+          magsAndLinkedMags <- Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
           magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
           layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
             case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
           }
         } yield Ok(Json.toJson(layersAndMagLinkInfos))
       }
     }

Consider verifying if parallel execution using Fox.combined instead of Fox.serialCombined is appropriate for this case:


🏁 Script executed:

#!/bin/bash
# Check implementation of Fox.serialCombined vs Fox.combined to understand differences
rg -A 5 "def serialCombined" --type scala
rg -A 5 "def combined" --type scala

# Look for similar patterns in the codebase
rg -B 2 -A 2 "Fox\.combined" --type scala

Length of output: 23365


Attention: Required changes on getPaths method for parallel execution

The updated method documentation is a good improvement, and parallelizing the data layer lookups can boost performance. However, based on our investigation:

  • Parallel Execution: The current diff replaces the sequential call

    Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))

    with

    Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))

    However, the definition of Fox.combined in our codebase expects a sequence of Fox[T] values (i.e. a single-argument list). To safely execute these operations in parallel, please refactor the call by mapping the function over the layers before combining the results. For example, update the call to:

    magsAndLinkedMags <- Fox.combined(layers.map(l => datasetService.getPathsForDataLayer(dataset._id, l.name)))

    This change ensures that each lookup is executed concurrently according to Fox.combined’s signature.

  • Error Handling: While the core logic remains sound, please verify that comprehensive error handling is applied to all database operations to avoid unhandled failures.

Please incorporate these changes to resolve the concerns.
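
For readers who have not worked with the Fox helpers: Fox is webknossos' own error-handling wrapper, so the snippet below is only an analogue using plain standard-library Futures (an assumption for illustration, not the project's actual API). It shows the execution-order difference the suggestion is pointing at: chaining the lookups one after another versus starting them all and then collecting the results.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CombineDemo extends App {
  // Stand-in for datasetService.getPathsForDataLayer: an arbitrary slow lookup.
  def lookup(layer: String): Future[String] =
    Future { Thread.sleep(100); s"paths-of-$layer" }

  val layers = List("color", "segmentation")

  // Sequential: each lookup starts only after the previous one has finished,
  // roughly what Fox.serialCombined does.
  val sequential: Future[List[String]] =
    layers.foldLeft(Future.successful(List.empty[String])) { (acc, l) =>
      acc.flatMap(results => lookup(l).map(results :+ _))
    }

  // Parallel: all lookups are started eagerly, then the results are collected,
  // roughly what Fox.combined over an already-mapped sequence does.
  val parallel: Future[List[String]] =
    Future.sequence(layers.map(lookup))

  println(Await.result(sequential, 5.seconds))
  println(Await.result(parallel, 5.seconds))
}
```

Whether the parallel variant is actually the better choice here depends on how much concurrent load the underlying database lookups can tolerate, which is presumably why the suggestion asks for verification rather than mandating the change.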

Successfully merging this pull request may close these issues.

Delete layers that are referenced by other datasets