Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Clarify how grids should be flattened #118

Open
aaraney opened this issue Feb 17, 2023 · 11 comments
Open

Clarify how grids should be flattened #118

aaraney opened this issue Feb 17, 2023 · 11 comments

Comments

@aaraney
Copy link

aaraney commented Feb 17, 2023

In reading through the BMI documentation and best practices, I came across several places that helpfully note that arrays are always flattened in BMI, even if the model uses dimensional variables (one instance). The best practice documentation says the following about the matter:

BMI functions always use flattened, one-dimensional arrays. This avoids any issues stemming from row/column-major indexing when coupling models written in different languages. It’s the developer’s responsibility to ensure that array information is flattened/redimensionalized in the correct order.

However, I was not able to find documentation that says how arrays are flattened, meaning are they flattened using row-major (C) or column-major (Fortran) ordering? Without knowing how arrays are flattened, it is ambiguous as to how they should be unflattened. In practice, if coupled models use different flattening conventions their output will almost certainly be spurious.

For example how should the following 2x2 array be flattened and then unflattened?

[
  [1, 2],
  [3, 4],
]

# 1. row-major flattened
[1, 2, 3, 4]

# 2. column-major flattened
[1, 3, 2, 4]

# 1. unflattened in column-major
[
  [1, 3],
  [2, 4]
]

# 2. unflattened in row-major
[
  [1, 3],
  [2, 4]
]
@mcflugen
Copy link
Member

@aaraney This is an excellent question and one that is not addressed adequately (at all?) in the docs.

BMI doesn't really take a position on how one might flatten a multidimensional array, only that it is done in a consistent way. For example, if your BMI returns two quantities through a call to get_value,

get_value("val1", array1);
get_value("val2", array2);

the value at array1[n] corresponds to that at array2[n].

If your values are on a grid, the ordering of those values must also be consistent with the ordering of your grid elements. In the case of an unstructured grid, for example, the x-coordinates of the elements are given by get_grid_x,

get_grid_x(grid_id, x);
get_value("val1", array1);

In this case, the x-coordinate for the n-th element of array1 is x[n].

There is some ambiguity in the case of a uniform rectilinear grid (i.e. raster grid) since the BMI defines the topology, not through get_grid_x and get_grid_y, but through get_grid_shape and get_grid_spacing, etc. Thus far we have assumed that a tool that uses these functions to get the coordinates of a model's grid elements, does so row-by-row. This means that if a model stores its values internally column-by-column, then it will have to copy those values to transpose them when writing into a buffer provided by a call to get_value.

To get around this, one could potentially treat their raster grid as an unstructured mesh and then provide the grid elements column-by-column.

The BMI has prioritized hiding a model's implementation details (e.g. how values are stored in memory—row-major, column-major, non-unit stride, etc.) over performance (i.e. extra copies). These are certainly topics for discussion, though.

Did that help to answer you question? If not, or you have more questions, I would be happy to discuss them here or we could arrange a meeting sometime.

@mdpiper
Copy link
Member

mdpiper commented Feb 20, 2023

We could add language to the docs to the effect of

BMI doesn't specify how to flatten arrays. This is left to the implementation.

@aaraney If you have suggestions on how to amend (and improve) the docs, let us know!

@hellkite500
Copy link

BMI doesn't specify how to flatten arrays. This is left to the implementation.

The problem with this is that without an agreed upon definition, you cannot guarantee the interoperability of any any data between any two BMI models/components.

@mattw-nws
Copy link

Thus far we have assumed that a tool that uses these functions to get the coordinates of a model's grid elements, does so row-by-row. This means that if a model stores its values internally column-by-column, then it will have to copy those values to transpose them when writing into a buffer provided by a call to get_value.

This is basically what we are unsure of, or were expecting to be part of the interface specification... is that assumption documented anywhere, say for PyMT or anywhere else in CSDMS?

The BMI has prioritized hiding a model's implementation details (e.g. how values are stored in memory—row-major, column-major, non-unit stride, etc.) over performance (i.e. extra copies).

Bingo... we don't care how the model is managing its data internally, but to pass data between two models we have to either know how the data will come out of get_value (and how it expects things to be provided in set_value) or have a way for the model to communicate information about how the data is arranged for passing in and out. Inside it can be columns, interleaved, or a different endianness even I suppose, but we need a way of knowing what it should be when coming through the interface functions.

@aaraney
Copy link
Author

aaraney commented Feb 21, 2023

First off, thanks for the quick reply, @mcflugen and @mdpiper! Just so we are on the same page, you answered my question. However, in line with the comments my colleagues (@hellkite500 and @mattw-nws) made above it would be extremely helpful if the BMI took a stance on this issue or provided a convention through something like get_value for determining how a grid was flattened. As @hellkite500's noted above, you can't guarantee model interoperability without knowing a given model's flattening convention a priori.

To hopefully move the conversation forward, I see several ways work around this. Im sure i'll miss a few other obvious ones along the way, so please add other viable options you think of as well.

  • BMI takes a stance on how matrices / tensors are to be flattened. From the comments above, namely,

The BMI has prioritized hiding a model's implementation details (e.g. how values are stored in memory—row-major, column-major, non-unit stride, etc.) over performance (i.e. extra copies). These are certainly topics for discussion, though.

this option seems to break with some of the core goals of BMI and is likely not viable.

  • BMI introduces a new getter function that returns a models flattening convention. Something like int get_grid_flatten_order(in int grid, out string type) although it might not be desirable to include in grid in this context if BMI wants to implicitly enforce that a given model flattens all grids using the same order convention. This feels like the best option, however introducing a new function to the spec also feels like a heavy lift. I would hope this is something we can discuss and the BMI will consider in the future.

  • Lastly, as a compromise solution, BMI could introduce a conventional way to query for flattening order using get_value. This would mean that grid flattening order would become a model input variable that conventionally specifies the order of a flattened grid. It's not the most elegant solution, however I it would not require a major version change to BMI, so that is a plus.

@mcflugen
Copy link
Member

@hellkite500

BMI doesn't specify how to flatten arrays. This is left to the implementation.

The problem with this is that without an agreed upon definition, you cannot guarantee the interoperability of any any data between any two BMI models/components.

BMI doesn't specify how to flatten arrays if the model doesn't implement the grid functions. It is the get_grid functions that define the ordering of values and so how one would ravel/unravel data.

@mattw-nws

This is basically what we are unsure of, or were expecting to be part of the interface specification... is that assumption documented anywhere, say for PyMT or anywhere else in CSDMS?

This is just barely touched upon in the get_grid section of the docs but, now that I look it over, it is not at all clear how this ordering would affect the ordering of data passed through the get_value and set_value functions. When we talked about ij ordering, we had it in our minds that the ordering of the grid would be row-by-row but we didn't actually say that. The data ordering for both the set_value and get_value are defined through the get_grid functions.

For models that don't implement the get_grid functions, even if you knew the ordering, without a priori knowledge of the models you couldn't confidently exchange data between them. Even if the two models ordered their values is the same way, you wouldn't know that those values were located at the same points.

@aaraney

BMI takes a stance on how matrices / tensors are to be flattened

I think we've actually done this in the get_grid methods for structured grids. Although these functions don't say how a model should store its values internally they do specify the ordering for exchange through get_value and set_value.

Do you think that a better description of the get_grid methods and how they relate to the get_value and set_value functions would help clear up this confusion? That is, using the get_grid_shape and get_grid_spacing functions one can determine—by assuming row-by-row ordering—the locations of a model's elements and so how they would be flattened.

BMI introduces a new getter function that returns a models flattening convention.

If we were to do something like this, I think it would be a part of the get_grid functions. Using numpy as a guide, it could be get_grid_order, would only be implemented for structured grids, and the order could be something like "C" or "F".

To summarize:

  • The ordering of data are provided through the get_grid functions
  • For structured grids, it's not clear what the ordering of elements are. Two potential solutions are:
    • Introduce a new function (e.g. get_grid_order) that specifies the ordering.
    • Update the docs to specify the element ordering, which is currently row major.

Sorry for the confusion on this. Are we getting closer to a solution?

@hrajagers
Copy link
Collaborator

hrajagers commented Feb 21, 2023 via email

@aaraney
Copy link
Author

aaraney commented Feb 21, 2023

Just so we are all on the same page, I am both concerned with the visual ordering (think nested loop over a matrix) as well as the in-memory ordering.

It is the get_grid functions that define the ordering of values and so how one would ravel/unravel data.

I think this is where I am a little lost. In my review of the get_grid functions, it does seem possible to to infer if a matrix is in row major or column major order, by looking at the values of the x and y axis (have to look at both), however that assumes that the values of x or y are strictly monotonic. This implies that grid spacing is always non-negative or always negative. However, I am not seeing that that the get_grid functions define the ordering. Can you explain what im missing, @mcflugen?

... When we talked about ij ordering, we had it in our minds that the ordering of the grid would be row-by-row but we didn't actually say that. The data ordering for both the set_value and get_value are defined through the get_grid functions.

Ahhh, okay so the implicit assumption is that matrices are in row-major order.

Do you think that a better description of the get_grid methods and how they relate to the get_value and set_value functions would help clear up this confusion? That is, using the get_grid_shape and get_grid_spacing functions one can determine—by assuming row-by-row ordering—the locations of a model's elements and so how they would be flattened.

Yes! And it sounds like, from what you stated, that the BMI expects that arrays are flattened and exchanged using row-major order (that's what im taking row-by-row to mean). If that is the case, I think adding that to the docs would be tremendously helpful.

If we were to do something like this, I think it would be a part of the get_grid functions. Using numpy as a guide, it could be get_grid_order, would only be implemented for structured grids, and the order could be something like "C" or "F".

👍🏻 that's in line with what I was thinking!

  • The ordering of data are provided through the get_grid functions

Im still not sold on this, but I asked for clarification above. Just marking it, more or less.

@aaraney
Copy link
Author

aaraney commented Feb 21, 2023

I always struggle with the word row major or column major. This suggests to me that the data would be ordered differently in C and Fortran, but that's only the case if you require that the size should be visually the same in the two languages.

I prefer to look at it from the other side. If you transfer data from Fortran to C, the values remain stored in the same order ... that's the most efficient for data exchange. To be even more pragmatic, let's just assume that we pass the c_loc of (the start of) a Fortran array, and hence nothing can change the data order.

Thanks for weighing in, @hrajagers! To your point, row-major and column-major are confusing terms. Im not sure that you meant this in you comment, but just so we are on the same page (and for the sake of other future readers), the in-memory representation of array's with rank greater than 1 in C and Fortran differ. Fortran uses column-major order for contiguous representation of a matrix and C uses row-major order.

# C repr
int a[2][2] = {
    {0, 1},
    {2, 3}
  }

# C in-memory
# 0: 0x7ff7bddebf10
# 1: 0x7ff7bddebf14
# 2: 0x7ff7bddebf18
# 3: 0x7ff7bddebf1c

# Fortran repr
integer, dimension(2,2) :: a
a(1,1) = 0
a(1,2) = 1
a(2,1) = 2
a(2,2) = 3

# Fortran in-memory
# 0:  1859615136
# 1:  1859615144
# 2:  1859615140 # notice 2 is stored after 0, not 1
# 3:  1859615148

Since the BMI only exchanges 1D arrays (get_value and set_value), it is ambiguous how exchanged arrays should be represented in memory (how they should be flattened). Consequently, without knowing how the array was flattened, you don't know how to reshape / index correctly into the exchanged data.

This will work fine if the shape information of the array is adjusted: the size [Imax,Jmax,Kmax] becomes [Kmax,Jmax,Imax] when crossing the language bridge.

Right, but that also means you have to know the origin language or convention of the model to operate on its data. That kind of defeats a major part of having the BMI in the first place, right?

A Fortran/MATLAB component doesn't have to
flip the data to be BMI compliant, but it should flip the size array when
it exposes the shape of the grid via the BMI API.

Just so we are on the same page, I am assuming you mean, get_grid_shape. If this is the case, how would you couple a C and Fortran model that run on the same grid without knowing that information a priori? That is, that they run on the same grid although their exposed grid shapes differ?

@hrajagers
Copy link
Collaborator

I didn't want to cause confusion @aaraney
Thank you for the example. Yes, if you define a [nx,ny]-sized array in FORTRAN and C then the quickest dimension in FORTRAN is the loop over nx, and in C it would be the loop over ny. So the memory storage would indeed be different if the size expression is visually the same. However, neither FORTRAN or C language, nor computer memory, has a formal definition of rows and columns. This terminology only gets its meaning if you associate the [nx,ny]-sized array with an nx x ny matrix or table. In our structured grid code some parts of the code have been written in FORTRAN and some parts in C/C++, so sometimes we're indexing n,m and sometimes m,n ... and sometimes just linearly using nm. In all cases I'm thinking of the same matrix, but I'm just changing the order of indexing.

If we want to keep the order of the values in the memory the same as in model component then the question how to flatten the array transforms into the question: how to properly describe the size of the array. The FORTRAN [nx,ny] array becomes a C [ny,nx] array. Yes, I agree with you that for the shape specification in get_grid_shape you need to either define the convention: order fastest to slowest dimension (like FORTRAN) or order slowest to fastest dimension (like C). The implicit convention seems to be now that the latter (C) convention should always be used independent of the programming language used for coding. If we all adhere to that the components should be interoperable. If some components deviate from this convention, the interoperability is broken.

There seems, however, to be a fundamental restriction in BMI in the sense that it assumes that for the uniform rectilinear grid (described using get_grid_shape, get_grid_spacing and get_grid_origin) and the rectilinear grid (described using get_grid_shape, get_grid_x, get_grid_y and get_grid_z) the fastest loop is always the loop over the cells in x direction, then in y direction and then in z direction. So, if your model has the z values closest together (which we're currently actually considering for our structured curvilinear code) then you are stuck and forced to reorder the data when communicating via BMI; luckily this restriction doesn't seem to hold for the structure quadrilateral grid type which covers the general structured curvilinear grids that we need.

@aaraney
Copy link
Author

aaraney commented Feb 24, 2023

Thanks for the dialog, @hrajagers! I think your response better frames the issue I was trying to describe originally. Namely, the BMI does not provide a way to describe the stride of an array. In your response, for clarity, you used the word size, but I think stride is slightly more accurate. However, I think we are talking about the same thing. Please correct me if i'm wrong :).

Earlier this week, in an offline discussion with my colleague, @mattw-nws, proposed describing the strides of an array with a new get_grid_strides BMI function. I think this is the direction you were headed @hrajagers?

@mattw-nws wrote:

I wonder if something like get_grid_strides would make sense, with an array of integers which are either the number of multiples of get_var_itemsize (preferable) or actual number of bytes to advance for one element in that direction. E.g., for a 3D structured grid that has x,y,z dimensions 3,4,5 get_grid_strides might return [1, 4, 20] or [20, 4, 1] depending on the ordering. You could even allow negative numbers to imply u,v convention for the Y direction, for example.

A get_grid_strides function would also address your observation, @hrajagers, that:

There seems, however, to be a fundamental restriction in BMI in the sense that it assumes that for the uniform rectilinear grid (described using get_grid_shape, get_grid_spacing and get_grid_origin) and the rectilinear grid (described using get_grid_shape, get_grid_x, get_grid_y and get_grid_z) the fastest loop is always the loop over the cells in x direction, then in y direction and then in z direction. So, if your model has the z values closest together (which we're currently actually considering for our structured curvilinear code) then you are stuck and forced to reorder the data when communicating via BMI;

Admittedly, initially I was a little skeptical of get_grid_strides. I had some reservations about the possibility of a compiler optimizing the layout of an array, such that, the strides were unexpectedly changed. However, I have settled those reservations and I think get_grid_strides is a really promising solution to this problem. It solves the problem of describing the layout of an array via BMI and makes it nearly unambiguous how to index into that collection. I say nearly because there may be situations where an offset is also required to correctly index into the collection. An example where an offset might be required is providing a subset of a model's grid through BMI. Admittedly, this is a niche case but you could imagine valid use cases for doing this.

For concreteness, revisiting my original example and assuming a[y][x] indexing, strides would be defined like:

[
  [1, 2], # assume 4 byte int
  [3, 4],
]

# strides in C order / row-major / slowest to fastest
[8, 4]
# more generally
[len(dim x) * size_t, size_t]

# strides in Fortran order / column-major / fastest to slowest
[4, 8]
# more generally
[size_t, len(dim y) * size_t]

# indexing using strides
index = y_index * y_stride + x_index * x_stride 

While thinking about get_grid_strides, it appears another fundamental BMI assumption is x, y, z, and grid values are stored as a group of parallel arrays. If for example, you store your model's state as an array of structs with members (not saying this is a good idea) {x: u32, y: u32, z: u32, /* some data fields */}, a call to get_value_ptr would required copying all the variable's values into an array and providing a pointer to that array.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants