Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

StridedArray should be an abstract class, not a union #2345

Closed
stevengj opened this issue Feb 17, 2013 · 32 comments
Closed

StridedArray should be an abstract class, not a union #2345

stevengj opened this issue Feb 17, 2013 · 32 comments
Labels
needs decision A decision on this change is needed
Milestone

Comments

@stevengj
Copy link
Member

Currently, Julia defines

typealias StridedArray{T,N,A<:Array}  Union(Array{T,N}, SubArray{T,N,A})

It seems like it would be better to define

abstract StridedArray{T,N} <: AbstractArray{T,N}

and have Array and SubArray both inherit from StridedArray.

The reason is that I would like to define a new StridedArray type (wrapping a NumPy ndarray). There is no reason, as far as I can tell, that I can't provide all the functionality of a SubArray and be used everywhere the latter is usable, but currently it does not seem possible to do this. Changing to abstract StridedArray would fix this.

@JeffBezanson
Copy link
Member

See also #987 .

@stevengj
Copy link
Member Author

It would be great to have a whole hierarchy, but we don't need to have it all at once. We can start with StridedArray since this is used everywhere now (just as a Union) and add more subtypes later. This will also make a good laboratory to see if any warts appear when the change to an abstract type is made.

@ViralBShah
Copy link
Member

See also #1276

@ViralBShah
Copy link
Member

One difficulty here to even experiment is that AbstractArray and Array are defined in C.

Once jb/immutable is merged, perhaps we can have these definitions in abstractarray.jl and array.jl. Also, if AbstractArray can contain fields, we may be able to do away with StridedArray altogether and just have Array and SubArray.

@JeffBezanson
Copy link
Member

No, there might be other StridedArray types like NumPyArray. We might also need a StridedSubArray that only views StridedArrays, and only with index types that preserve stridedness. There can be a more general SubArray that allows any indexes.

@ViralBShah
Copy link
Member

Ok. How about getting rid of Array in favour of StridedArray as the default storage type? We could then have SubArray <: StridedArray <: AbstractArray.

@pao
Copy link
Member

pao commented Mar 11, 2013

Renaming Array will break a lot of code and seems likely to confuse new users; also in the hierarchy you show that would make StridedArray abstract, wouldn't it?

@ViralBShah
Copy link
Member

Just thinking if we can reduce one of these things. Array could be made to include the functionality of StridedArray, for example - in order to avoid changing the name.

@stevengj
Copy link
Member Author

You need at least three types (regardless of what you call them):

  • An AbstractArray type for any data structure that can be accessed as a[i1,i2,...], regardless of how it is stored (or whether it is stored at all, in the case of Range).
  • An abstract StridedArray <: AbstractArray type, representing any data structure where a[i1,i2,...] is accessed as unsafe_ref(ptr, 1 + (i1-1)*stride1+(i2-1)*stride2+...) for some pointer and strides. This needs to be abstract so that different concrete types (e.g. wrappers for NumPy arrays) can be bolted on underneath.
  • A concrete Array <: StridedArray type, which is an implementation of StridedArray with contiguous column-major data.

As much as possible, our routines should work with the most abstract possible type. For StridedArray types, we should have special-case code for many operations when the strides correspond to contiguous column-major order or contiguous row-major order (e.g. things like copying, scaling, etcetera, can be more efficient).

The LAPACK routines should work with StridedArray types where possible. Note that many of the LAPACK routines require the stride of the first dimension to be 1, but the stride of the second dimension can be arbitrary. Alternatively, they can also work if the stride of the second dimension is 1, as long as you interpret the data as transposed, possibly transposing the output if necessary. The routines could make an Array copy if the input is not in one of these two cases.

In order to avoid the need for explicit transposes in many cases, it might be nice to have a concrete TransposedArray <: StridedArray type that represents an Array with transposed (i.e. row-major/C-order, i.e. reversed dimensions) storage. But this is a second-order priority compared to making StridedArray abstract. (Once StridedArray is abstract and most operations work on StridedArray arguments, adding new StridedArray subclasses will be a lot less work.)

JeffBezanson added a commit that referenced this issue Jan 6, 2014
ref #987, #2345

this seems to be the less-controversial core of what we need.
this commit just adds the types and doesn't do anything else with them yet.
tknopp pushed a commit to tknopp/julia that referenced this issue Jan 13, 2014
add StoredArray and DenseArray to emacs mode
@timholy
Copy link
Member

timholy commented Sep 4, 2014

Thoughts triggered by #8235 (comment) reminded me that because StridedArray is a Union, it's not extensible. Perhaps we can fix this.

Example:

  • an Image is a wrapper for an AbstractArray,
type Image{T,N,A<:AbstractArray} <: AbstractArrary{T,N}
    data::A
    properties::Dict
end
  • I have real-world cases (most of the data collected in my lab) where there is reason not to restrict A to be a DenseArray, hence an Image can't be a DenseArray. For this reason, an Image is not a StridedArray.
  • However, for most people, most of the time, an Image will in fact wrap a DenseArray. It seems to be a shame not to get "credit" for that, especially if one thinks about passing image data to C functions.

One can fix it this way: rewrite

function somefunction(A::StridedArray, ...)
    ...
end

as

somefunction(A::AbstractArray, ...) = _somefunction(A, ..., isstrided(A))
function _somefunction(A, ..., ::Type{TypeConst{true}})
    ...
end

using the definitions

isstrided(A::DenseArray) = TypeConst{true}
isstrided{T,N,A<:DenseArray}(A::View{T,N,A}) = TypeConst{true}
isstrided{T,N,A<:DenseArray}(img::Image{T,N,A}) = TypeConst{true}
isstrided(A) = TypeConst{false}

It's a little ugly, but perhaps with a @stridefunction macro it wouldn't be too bad, at least for functions that have all-strided arguments. (An alternative is #8154.)

Thoughts?

@stevengj
Copy link
Member Author

stevengj commented Sep 4, 2014

@timholy, could you expand upon why your images cannot be DenseArrays? Note that DenseArray does not mean contiguous, it means "arrays that are laid out at regular offsets in memory" (i.e. strided).

@timholy
Copy link
Member

timholy commented Sep 4, 2014

Briefly, our raw data files represent 4d images (3d over time), but they require some computation to be performed upon them before you get out what you'd call an "image". So there's no physical memory address corresponding to a single voxel at a particular time.

@stevengj
Copy link
Member Author

stevengj commented Sep 5, 2014

Why would this be a type of StridedArray, then (if StridedArray were extensible)? Sounds like a poster child for AbstractArray...

@timholy
Copy link
Member

timholy commented Sep 5, 2014

It's not. But other things that are Images are StridedArrays. The point is that Images as a whole can't be, but certain Images are.

@nalimilan
Copy link
Member

@timholy Maybe you could create two different types of Images depending on whether they're StridedArrays or not, deciding which one to create transparently in a common constructor?

@timholy
Copy link
Member

timholy commented Sep 5, 2014

And then how do I share code among the different types of Images?

@timholy
Copy link
Member

timholy commented Sep 5, 2014

Maybe it will be easier to appreciate this if I say it this way: this proposal is a traits-based alternative to multiple inheritance, something people have been wanting in Julia for a long time. In some ways it's nicer, because I don't have to create specialized types that live at the intersection of parents, I can just declare that, based on their parameters, some objects have all the properties of two different types of objects.

But this does make me think that a simpler way of achieving this is needed. I wonder if #8154 is worth a second thought?

@nalimilan
Copy link
Member

And then how do I share code among the different types of Images?

By writing generic code for the Image type, which would work for both StridedImage and NonStridedImage? I don't know whether that's reasonable, that was just a guess.

Your traits idea is interesting, but it would be good to see what other use cases require it. Is it a common pattern?

@timholy
Copy link
Member

timholy commented Sep 5, 2014

The advantages of a traits-based approach is that it scales better. Let's say I have code in Base, package A, and package B that are all useful for my package C. For a "conceptual catagory" I could create 4 types:

CBase <: AbstractTypeDefinedInBase
CA <: AbstractTypeDefinedInA
CB <: AbstractTypeDefinedInB
CC <: AbstractTypeThatDoesn'tFitAnyOfThese

typealias CType Union(CBase,CA,CB,CC)

but this doesn't scale very nicely, and foists similar headaches on anyone else who wants to create a package D that interacts with my C, and has needs I didn't anticipate.

@timholy
Copy link
Member

timholy commented Sep 5, 2014

@nalimilan, see issue #5. The only single-digit issue left open. Can I close it? Oh, please, let me!!

@JeffBezanson
Copy link
Member

@timholy that is really clever. I closed #8154 not because we won't do it, but because I think we will get it in any case from other planned work.

Dispatch on values might not even be necessary for this. For example

f(a, storage::Contiguous) = ...
f(a, storage::Strided) = ...
f(a) = f(a, getstorage(a))

...

getstorage(::MyType) = Contiguous()

@timholy
Copy link
Member

timholy commented Sep 5, 2014

Right, I don't actually care whether the user writes functions that return values or types, and I like your syntax. In cases where it's most likely to be a true/false decision, there might be some value in auto-generating the variant of f with the extra parameter. In that case, f(a::isstrided(a)) wouldn't actually have to be something in the language, just something that the parser expanded to

f(a) = ##12345_f(a, TypeConst{isstrided(a)})
##12345_f(a, ::Type{TypeConst{true}})

EDIT: although I guess you have to worry about what happens with the false branch!

Another bit of "collateral damage" from this suggestion is that triangular dispatch just drops out, if one wants to allow such crazy syntax:

f(a::(typeof(a) <: ColorValue && eltype(a) <: FloatingPoint))

That said, I 100% agree with you that we also need a way to write triangular types. So if we implement your proposal, this doesn't actually count as an attraction of this one.

@timholy
Copy link
Member

timholy commented Sep 5, 2014

But yes, I'm excited to realize we've had Multiple Inheritance all along, and just didn't recognize it.

@JeffBezanson
Copy link
Member

There seems to be a lot of potential for nifty syntactic sugar here. For example

f(A::(AbstractArray, storage=Contiguous))

could expand to

f(A::AbstractArray) = f((A,storage(A)))
function f(tmp::(AbstractArray,Contiguous))
    A = tmp[1]
end

@timholy
Copy link
Member

timholy commented Sep 5, 2014

Ooh, I like that a lot. In generating the expanded version, it clearly denotes what part of the typing should be retained and what part results in extra arguments. Presumably my ColorValue example could turn into

f(a::(ColorValue, eltype<:FloatingPoint))

which is much nicer in terms of the signature of the expanded function.

@JeffBezanson
Copy link
Member

This trick is really pretty amazing.

@mauro3
Copy link
Contributor

mauro3 commented Sep 16, 2014

I was intrigued by Tim's trick and implemented a way to do traits/interfaces with it:
https://gist.github.com/mauro3/f3431a24d2737cd932ac
I checked the LLVM IR and the trait-ed functions seem to produce the same IR as normal ones, which is encouraging.

I think this does indeed implement traits but maybe I missunderstood. @timholy, could you comment?

@timholy
Copy link
Member

timholy commented Sep 17, 2014

I'm not sure I qualify as the arbiter of what counts as "traits" and what does not. Your gist definitely shows off some relevant points, but to the extent that I understand your intentions it seems to have a slightly different thrust at least than how I was thinking of it. My thinking was that traits would generally be defined by a set of functions, and that one application would be to "steer" dispatch to appropriate implementations.

There are examples of this idea in the discussion above. To throw another one out there, let's say you want to be able to dispatch on whether an AbstractArray has efficient linear indexing. The "classic" Julia approach might be

typealias HasFastLinear Union(Array, BitArray, ContiguousView)

function myfunction(A::HasFastLinear)
     # implementation for AbstractArrays with fast linear indexing
end

function myfunction(A::AbstractArray)
    # fallback version
end

You already seem to understand that the big problem with this is that it's not extensible---if I define a new type of AbstractArray that also has fast linear indexing, I can't use the first implementation unless I copy it.

The way I think about traits, they are just functions (I'd classify things like typemax, widen, sizeof as trait-functions, too). You could define the relevant trait like this:

type TypeConst{T} end  # could be used for many traits besides has_fast_linear

has_fast_linear(::AbstractArray) = TypeConst{false}()
has_fast_linear(::Array) = TypeConst{true}()
has_fast_linear(::BitArray) = TypeConst{true}()
has_fast_linear(::ContiguousView) = TypeConst{true}()

and then write these variants of your function

myfunction(A::AbstractArray) = _myfunction(A, has_fast_linear(A))
function _myfunction(A::AbstractArray, ::TypeConst{true})
    # version optimized for an object that has fast linear indexing
end

function _myfunction(A::AbstractArray, ::TypeConst{false})
    # version for when it doesn't have fast linear indexing
end

In contrast to the Union-based approach, this can be extended to a new type simply by defining has_fast_linear(::MyNewAA) = TypeConst{true}(). In this view, an interface is simply a way of designing an API where you leverage a specified set of trait functions---it doesn't necessarily require any new types.

Jeff's syntax is much more clever than this because he groups the object and its traits together into a tuple. That solves the problem of keeping track of which extra arguments serve as traits for which objects.

I don't know if this really answers your question; you may be thinking about a whole new direction.

@timholy
Copy link
Member

timholy commented Sep 25, 2014

@JeffBezanson, out of curiosity, what are your thoughts on the desirability, & timing, of adding sugar for the "traits trick"? I'm asking because I'm mulling over a design issue: in Images, I implemented loading of most grayscale images as an Array{Gray{T},N}, where T is some numeric type (likely a Ufixed). Gray is just a wrapper, by analogy with RGB from Color, that allows me to dispatch based on colorspace. We also still handle raw numeric images, but in that case colorspace gets looked up in a dictionary of properties---Gray lets me skip that for known-grayscale images. I'd call this advantage "modest" (the pixelwise operations dominate execution time), but it seems like a consistent extension of the design of Color.

The problem I'm seeing right now is that while Gray <: ColorValue, which is nice, we don't also have Gray <: Number, which would also be nice. In particular, there are quite a large number of math functions, and their vectorized versions, that I'd need to provide implementations for. One alternative is to use this traits/multiple inheritance trick to endow Gray with the properties of Number. (Another is to go back to encoding grayscale images as raw numeric arrays.)

Neither path is exactly easy, of course, but that's not the main issue: I'm just wanting to put my effort in the right place. Is this something you're sufficiently interested in that it might happen during 0.4, or is this 0.5, or "never," material? I'd also understand if you're unsure, in which case I'll make up my own mind.

@JeffBezanson
Copy link
Member

So far I'm pretty sure I like this better than actual multiple inheritance, and it is much easier to implement as well. It might take a while to get widespread buy-in though. I wouldn't depend on this going in 0.4.

@timholy
Copy link
Member

timholy commented Sep 30, 2014

Sounds reasonable; basically I agree on all counts. Even though it's "trivial," in practice I think it's a pretty big change to how one thinks about types and dispatch in Julia---we might end up re-architecting quite a lot of stuff. Sounds like 0.5/0.6 material to me.

@mauro3
Copy link
Contributor

mauro3 commented Oct 21, 2014

@timholy, thanks for the comments. I got it all wrong in above gist: https://gist.github.com/mauro3/f3431a24d2737cd932ac

But after some more thought I think I got the gist of the trait trick: #6975 (comment)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

7 participants