doc: clean up and expand on traversal pkg docs
rvagg committed Mar 28, 2023
1 parent 4888b08 commit d013fc0
Showing 2 changed files with 147 additions and 63 deletions.
137 changes: 91 additions & 46 deletions traversal/doc.go
@@ -1,48 +1,94 @@
-// This package provides functional utilities for traversing and transforming
-// IPLD nodes.
-//
-// The traversal.Path type provides a description of how to perform
-// several steps across a Node tree. These are dual purpose:
-// Paths can be used as instructions to do some traversal, and
-// Paths are accumulated during traversals as a log of progress.
-//
-// "Focus" functions provide syntactic sugar for using ipld.Path to jump
-// to a Node deep in a tree of other Nodes.
-//
-// "FocusedTransform" functions can do the same such deep jumps, and support
-// mutation as well!
-// (Of course, since ipld.Node is an immutable interface, more precisely
-// speaking, "transformations" are implemented rebuilding trees of nodes to
-// emulate mutation in a copy-on-write way.)
-//
-// "Walk" functions perform a walk of a Node graph, and apply visitor
-// functions multiple Nodes. The more advanced Walk functions can be guided
-// by Selectors, which provide a declarative mechanism for guiding the
-// traversal and filtering which Nodes are of interest.
-// (See the selector sub-package for more detail.)
-//
-// "WalkTransforming" is similar to Traverse, but with support for mutations.
-// Like "FocusTransform", "WalkTransforming" operates in a copy-on-write way.
-//
-// All of these functions -- the "Focus*" and "Walk*" family alike --
-// work via callbacks: they do the traversal, and call a user-provided function
-// with a handle to the reached Node. Further "Focus" and "Walk" can be used
-// recursively within this callback.
-//
-// All of these functions -- the "Focus*" and "Walk*" family alike --
-// include support for automatic resolution and loading of new Node trees
-// whenever IPLD Links are encountered. This can be configured freely
-// by providing LinkLoader interfaces to the traversal.Config.
-//
-// Some notes on the limits of usage:
-//
-// The "*Transform" family of methods is most appropriate for patterns of usage
-// which resemble point mutations.
-// More general transformations -- zygohylohistomorphisms, etc -- will be best
-// implemented by composing the read-only systems (e.g. Focus, Traverse) and
-// handling the accumulation in the visitor functions.
-//
-// (Why? The "point mutation" use-case gets core library support because
+// Package traversal provides functional utilities for traversing and
+// transforming IPLD graphs.
+//
+// Two primary types of traversal are implemented in this package: "Focus" and
+// "Walk". Both types have a "Transforming" variant, which supports mutation
+// through emulated copy-on-write tree rebuilding.
+//
+// Traversal operations use the Progress type for configuration and state
+// tracking. Helper functions such as Focus and Walk exist to avoid manual setup
+// of a Progress struct, but they cannot cross link boundaries without a
+// LinkSystem, which needs to be configured on the Progress struct.
+//
+// A typical traversal operation involves creating a Progress struct, setting up
+// the LinkSystem, and calling one of the Focus or Walk functions on the
+// Progress object. Various other configuration options are available when
+// traversing this way.
+//
+// # Focus
+//
+// "Focus" and "Get" functions provide syntactic sugar for using ipld.Path to
+// access Nodes deep within a graph.
+//
+// "FocusedTransform" resembles "Focus" but supports user-defined mutation using
+// its TransformFn.
+//
+// # Walk
+//
+// "Walk" functions perform a recursive walk of a Node graph, applying visitor
+// functions to matched parts of the graph.
+//
+// The selector sub-package offers a declarative mechanism for guiding
+// traversals and filtering relevant Nodes.
+// (Refer to the selector sub-package for more details.)
+//
+// "WalkLocal" is a special case of Walk that doesn't require a selector. It
+// walks a local graph, not crossing link boundaries, and calls its VisitFn for
+// each encountered Node.
+//
+// "WalkMatching" traverses according to a selector, calling the VisitFn for
+// each match based on the selector's matching rules.
+//
+// "WalkAdv" performs the same traversal as WalkMatching, but calls its
+// AdvVisitFn on every Node, regardless of whether it matches the selector.
+//
+// "WalkTransforming" resembles "WalkMatching" but supports user-defined
+// mutation using its TransformFn.
+//
+// # Usage Notes
+//
+// These functions work via callbacks, performing traversal and calling a
+// user-provided function with a handle to the reached Node(s). Further "Focus"
+// and "Walk" operations can be performed recursively within this callback if
+// desired.
+//
+// All traversal functions except "WalkLocal" operate on a Progress object,
+// which can be configured with a LinkSystem for automatic resolution and
+// loading of new Node trees when IPLD Links are encountered.
+//
+// The "*Transform" methods are best suited for point-mutation patterns. For
+// more general transformations, use the read-only systems (e.g., Focus,
+// Walk) and handle accumulation in the visitor functions.
+//
+// A common use case for walking traversal is running a selector over a graph
+// and noting all the blocks it uses. This is achieved by configuring a
+// LinkSystem that can handle and observe block loads. Be aware that a selector
+// might visit the same block multiple times during a traversal, as IPLD graphs
+// often form "diamond patterns" with the same block referenced from multiple
+// locations.
+//
+// The LinkVisitOnlyOnce option can be used to avoid duplicate loads, but it
+// must be used carefully with non-trivial selectors, where repeat visits of
+// the same block may be essential for traversal or visit callbacks.
+//
+// A Budget can be set at the beginning of a traversal to limit the number of
+// Nodes and/or Links encountered before failing the traversal (with the
+// ErrBudgetExceeded error).
+//
+// The "Preloader" option provides a way to parallelize block loading in
+// environments where block loading is a high-latency operation (such as
+// fetching over the network).
+// The traversal operation itself is not parallel and will proceed strictly
+// according to path or selector order. However, a Preloader can be used to load
+// blocks asynchronously and to prime the LinkSystem that the traversal is using
+// with already-loaded blocks.
+//
+// A Preloader and a Budget option can be used on the same traversal, BUT the
+// Preloader may not receive the same links that the traversal wants to load
+// from the LinkSystem. Use with care; see the documentation on Progress.
+package traversal
+
+// Why only "point-mutation"? This use-case gets core library support because
 // it's both high utility and highly clear how to implement it.
 // More advanced transformations are nontrivial to provide generalized support
 // for, for three reasons: efficiency is hard; not all existing research into
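To make the Focus versus FocusedTransform distinction in the doc.go text concrete, here is a toy sketch in plain Go. It does not import the traversal package or any IPLD library. The node type and the focus and focusedTransform functions are hypothetical stand-ins, not the library's API. focus follows path segments down to a node. focusedTransform replaces the node at a path by rebuilding only the maps along that path. The original tree is left untouched and unrelated siblings are shared, which is the copy-on-write emulation of a "point mutation" that the docs describe.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical immutable tree node: a leaf string or a map of children.
type node struct {
	leaf     string
	children map[string]*node
}

// focus follows path segments from root and returns the node reached.
func focus(root *node, path []string) (*node, error) {
	n := root
	for i, seg := range path {
		child, ok := n.children[seg]
		if !ok {
			return nil, fmt.Errorf("no segment %q at depth %d", seg, i)
		}
		n = child
	}
	return n, nil
}

// focusedTransform returns a new root in which the node at path is replaced by fn(old).
// Only the maps along the path are copied; untouched siblings are shared with the original.
func focusedTransform(root *node, path []string, fn func(*node) (*node, error)) (*node, error) {
	if len(path) == 0 {
		return fn(root)
	}
	child, ok := root.children[path[0]]
	if !ok {
		return nil, errors.New("path not found: " + path[0])
	}
	newChild, err := focusedTransform(child, path[1:], fn)
	if err != nil {
		return nil, err
	}
	copied := make(map[string]*node, len(root.children))
	for k, v := range root.children {
		copied[k] = v
	}
	copied[path[0]] = newChild
	return &node{children: copied}, nil
}

func main() {
	leaf := func(s string) *node { return &node{leaf: s} }
	root := &node{children: map[string]*node{
		"a": {children: map[string]*node{"b": leaf("old")}},
		"c": leaf("untouched"),
	}}
	updated, err := focusedTransform(root, []string{"a", "b"}, func(*node) (*node, error) {
		return leaf("new"), nil
	})
	if err != nil {
		panic(err)
	}
	before, _ := focus(root, []string{"a", "b"})
	after, _ := focus(updated, []string{"a", "b"})
	fmt.Println(before.leaf, after.leaf)                     // old new
	fmt.Println(root.children["c"] == updated.children["c"]) // true: sibling is shared
}
```

In this toy the cost of a transform is proportional to the path length, not the tree size. That is one reason point mutations are the easy case to support generically.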
@@ -53,4 +99,3 @@
 // Therefore, attempts at generalization are not included here; handling these
 // issues in concrete cases is easy, so we call it an application logic concern.
 // However, exploring categorical recursion schemes as a library is encouraged!)
-package traversal
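The "diamond pattern" caveat and the LinkVisitOnlyOnce trade-off from the doc.go usage notes can be shown with a small standard-library simulation. block, walkConfig, walk and countLoads are hypothetical names. Blocks are modeled as a map from a made-up link name to its outgoing links, and visitOnlyOnce plays roughly the role that Config.LinkVisitOnlyOnce plays in the real package. Without deduplication, the shared block is loaded once per path that reaches it. With deduplication, it is loaded once.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a loaded IPLD block: just its outgoing links.
type block struct {
	links []string
}

// walkConfig loosely mirrors Config.LinkVisitOnlyOnce; it is not the library type.
type walkConfig struct {
	visitOnlyOnce bool
}

// walk loads blocks depth-first from cid, calling load for every link it follows.
func walk(store map[string]block, cid string, cfg walkConfig, seen map[string]struct{}, load func(string)) {
	if cfg.visitOnlyOnce {
		if _, ok := seen[cid]; ok {
			return // already visited: skip the repeat load
		}
		seen[cid] = struct{}{}
	}
	load(cid)
	for _, next := range store[cid].links {
		walk(store, next, cfg, seen, load)
	}
}

// countLoads walks from root and reports how many times each block was loaded.
func countLoads(store map[string]block, root string, cfg walkConfig) map[string]int {
	counts := map[string]int{}
	walk(store, root, cfg, map[string]struct{}{}, func(cid string) { counts[cid]++ })
	return counts
}

func main() {
	// A diamond: root -> {left, right}, and both left and right -> shared.
	store := map[string]block{
		"root":   {links: []string{"left", "right"}},
		"left":   {links: []string{"shared"}},
		"right":  {links: []string{"shared"}},
		"shared": {},
	}
	fmt.Println(countLoads(store, "root", walkConfig{})["shared"])                    // 2
	fmt.Println(countLoads(store, "root", walkConfig{visitOnlyOnce: true})["shared"]) // 1
}
```

This toy has no selector, so deduplication is always safe here. As the docs warn, a real selector may legitimately need to revisit a block that it reaches through a different path.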
73 changes: 56 additions & 17 deletions traversal/fns.go
@@ -26,9 +26,12 @@ type AdvVisitFn func(Progress, datamodel.Node, VisitReason) error
 type VisitReason byte

 const (
-	VisitReason_SelectionMatch VisitReason = 'm' // Tells AdvVisitFn that this node was explicitly selected. (This is the set of nodes that VisitFn is called for.)
-	VisitReason_SelectionParent VisitReason = 'p' // Tells AdvVisitFn that this node is a parent of one that will be explicitly selected. (These calls only happen if the feature is enabled -- enabling parent detection requires a different algorithm and adds some overhead.)
-	VisitReason_SelectionCandidate VisitReason = 'x' // Tells AdvVisitFn that this node was visited while searching for selection matches. It is not necessarily implied that any explicit match will be a child of this node; only that we had to consider it. (Merkle-proofs generally need to include any node in this group.)
+	// VisitReason_SelectionMatch tells AdvVisitFn that this node was explicitly selected. (This is the set of nodes that VisitFn is called for.)
+	VisitReason_SelectionMatch VisitReason = 'm'
+	// VisitReason_SelectionParent tells AdvVisitFn that this node is a parent of one that will be explicitly selected. (These calls only happen if the feature is enabled -- enabling parent detection requires a different algorithm and adds some overhead.)
+	VisitReason_SelectionParent VisitReason = 'p'
+	// VisitReason_SelectionCandidate tells AdvVisitFn that this node was visited while searching for selection matches. It is not necessarily implied that any explicit match will be a child of this node; only that we had to consider it. (Merkle-proofs generally need to include any node in this group.)
+	VisitReason_SelectionCandidate VisitReason = 'x'
 )

 // Progress tracks a traversal as it proceeds. It is used initially to begin a traversal, and it is then passed to the visit function as the traversal proceeds.
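The VisitReason constants above tell an AdvVisitFn why it is being called. The toy walker below illustrates the two visit styles described in doc.go. It is hypothetical: it has no selectors, and a plain predicate stands in for selector matching. walkAdv reports every visited node together with a reason. walkMatching is layered on top of it so that only 'm' visits reach its callback. The byte values mirror the constants in the diff. The types and functions are illustrative only.

```go
package main

import "fmt"

// visitReason mirrors the VisitReason byte values in the diff; the walker is a toy.
type visitReason byte

const (
	reasonMatch     visitReason = 'm' // node satisfied the (toy) match predicate
	reasonCandidate visitReason = 'x' // node was visited while searching for matches
)

type tree struct {
	name     string
	children []*tree
}

// walkAdv visits every node in pre-order, tagging each with why it was visited.
func walkAdv(t *tree, matches func(*tree) bool, fn func(*tree, visitReason)) {
	reason := reasonCandidate
	if matches(t) {
		reason = reasonMatch
	}
	fn(t, reason)
	for _, c := range t.children {
		walkAdv(c, matches, fn)
	}
}

// walkMatching calls fn only for matched nodes, by filtering walkAdv's visits.
func walkMatching(t *tree, matches func(*tree) bool, fn func(*tree)) {
	walkAdv(t, matches, func(n *tree, r visitReason) {
		if r == reasonMatch {
			fn(n)
		}
	})
}

func main() {
	root := &tree{name: "root", children: []*tree{{name: "leaf1"}, {name: "leaf2"}}}
	isLeaf := func(x *tree) bool { return len(x.children) == 0 }
	walkAdv(root, isLeaf, func(n *tree, r visitReason) { fmt.Printf("%s:%c ", n.name, r) })
	fmt.Println() // root:x leaf1:m leaf2:m
	walkMatching(root, isLeaf, func(n *tree) { fmt.Print(n.name, " ") })
	fmt.Println() // leaf1 leaf2
}
```

Building the matching walk as a filter over the advanced walk is a design choice made for this sketch. The 'p' (parent) reason is omitted because, per the constant's comment, detecting parents requires a different algorithm.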
@@ -46,25 +49,56 @@ const (
 // Currently a best-guess approach is used to try and have the preloader adhere to the budget, but with typical real-world graphs, this is likely to be inaccurate.
 // In the case of inaccuracies, the budget will be properly applied to the traversal-proper, but the preloader may receive a different set of links than the traversal-proper will.
 type Progress struct {
-	Cfg *Config
-	Path datamodel.Path // Path is how we reached the current point in the traversal.
-	LastBlock struct { // LastBlock stores the Path and Link of the last block edge we had to load. (It will always be zero in traversals with no linkloader.)
+	// Cfg is the configuration for the traversal, set by user.
+	Cfg *Config
+
+	// Budget, if present, tracks "budgets" for how many more steps we're willing to take before we should halt.
+	// Budget is initially set by user, but is then updated as the traversal proceeds.
+	Budget *Budget
+
+	// Path is how we reached the current point in the traversal.
+	Path datamodel.Path
+
+	// LastBlock stores the Path and Link of the last block edge we had to load. (It will always be zero in traversals with no linkloader.)
+	LastBlock struct {
 		Path datamodel.Path
 		Link datamodel.Link
 	}
-	PastStartAtPath bool // Indicates whether the traversal has progressed passed the StartAtPath in the config -- use to avoid path checks when inside a sub portion of a DAG that is entirely inside the "not-skipped" portion of a traversal
-	Budget *Budget // If present, tracks "budgets" for how many more steps we're willing to take before we should halt.
-	SeenLinks map[datamodel.Link]struct{} // Set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
+
+	// PastStartAtPath indicates whether the traversal has progressed past the StartAtPath in the config -- use to avoid path checks when inside a sub portion of a DAG that is entirely inside the "not-skipped" portion of a traversal.
+	PastStartAtPath bool
+
+	// SeenLinks is a set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
+	SeenLinks map[datamodel.Link]struct{}
 }

 // Config is a set of options for a traversal. Set a Config on a Progress to customize the traversal.
 type Config struct {
-	Ctx context.Context // Context carried through a traversal. Optional; use it if you need cancellation.
-	LinkSystem linking.LinkSystem // LinkSystem used for automatic link loading, and also any storing if mutation features (e.g. traversal.Transform) are used.
-	LinkTargetNodePrototypeChooser LinkTargetNodePrototypeChooser // Chooser for Node implementations to produce during automatic link traversal.
-	LinkVisitOnlyOnce bool // By default, we visit across links wherever we see them again, even if we've visited them before, because the reason for visiting might be different than it was before since we got to it via a different path. If set to true, track links we've seen before in Progress.SeenLinks and do not visit them again. Note that sufficiently complex selectors may require valid revisiting of some links, so setting this to true can change behavior noticably and should be done with care.
-	StartAtPath datamodel.Path // If set, causes a traversal to skip forward until passing this path, and only then begins calling visit functions. Block loads will also be skipped wherever possible.
-	Preloader preload.Loader // Receives a list of links within each block prior to traversal-proper. This can be used to asynchronously load blocks that will be required at a later phase of the retrieval, or even to load blocks in a different order than the traversal would otherwise do. Preload calls are not de-duplicated, it is up to the receiver to do so if desired. Beware of using both Budget and Preloader! See the documentation on Progress for more information.
+	// Ctx is the context carried through a traversal.
+	// Optional; use it if you need cancellation.
+	Ctx context.Context
+
+	// LinkSystem is used for automatic link loading, and also any storing if mutation features (e.g. traversal.Transform) are used.
+	LinkSystem linking.LinkSystem
+
+	// LinkTargetNodePrototypeChooser is a chooser for Node implementations to produce during automatic link traversal.
+	LinkTargetNodePrototypeChooser LinkTargetNodePrototypeChooser
+
+	// LinkVisitOnlyOnce controls repeat-link visitation.
+	// By default, we visit across links wherever we see them again, even if we've visited them before, because the reason for visiting might be different than it was before since we got to it via a different path.
+	// If set to true, track links we've seen before in Progress.SeenLinks and do not visit them again.
+	// Note that sufficiently complex selectors may require valid revisiting of some links, so setting this to true can change behavior noticeably and should be done with care.
+	LinkVisitOnlyOnce bool
+
+	// StartAtPath, if set, causes a traversal to skip forward until passing this path, and only then begins calling visit functions.
+	// Block loads will also be skipped wherever possible.
+	StartAtPath datamodel.Path
+
+	// Preloader receives links within each block prior to traversal-proper by performing a lateral scan of a block without descending into links themselves before backing up and doing a traversal-proper.
+	// This can be used to asynchronously load blocks that will be required at a later phase of the retrieval, or even to load blocks in a different order than the traversal would otherwise do.
+	// Preload calls are not de-duplicated; it is up to the receiver to do so if desired.
+	// Beware of using both Budget and Preloader! See the documentation on Progress for more information on this usage and the likely surprising effects.
+	Preloader preload.Loader
 }

 // Budget is a set of monotonically-decrementing "budgets" for how many more steps we're willing to take before we should halt.
@@ -75,9 +109,14 @@ type Config struct {
 // If you set any budgets (by having a non-nil Progress.Budget field), you must set some value for all of them.
 // Traversal halts when _any_ of the budgets reaches zero.
 // The max value of an int (math.MaxInt64) is acceptable for any budget you don't care about.
+//
+// Beware of using both Budget and Preloader! See the documentation on Progress for more information on this usage and the likely surprising effects.
 type Budget struct {
-	NodeBudget int64 // A monotonically-decrementing "budget" for how many more nodes we're willing to visit before halting.
-	LinkBudget int64 // A monotonically-decrementing "budget" for how many more links we're willing to load before halting. (This is not aware of any caching; it's purely in terms of links encountered and traversed.)
+	// NodeBudget is a monotonically-decrementing "budget" for how many more nodes we're willing to visit before halting.
+	NodeBudget int64
+	// LinkBudget is a monotonically-decrementing "budget" for how many more links we're willing to load before halting.
+	// (This is not aware of any caching; it's purely in terms of links encountered and traversed.)
+	LinkBudget int64
 }

 // Clone returns a copy of the budget.