Implement a unique function returning only the unique values in a vector. #940

Open
loiseaujc opened this issue Feb 24, 2025 · 4 comments
Labels
idea Proposition of an idea and opening an issue to discuss it

Comments

loiseaujc commented Feb 24, 2025

Motivation

Recently, I ran into the problem of extracting the unique values of a vector (of any integer, real or complex type, or possibly even character). Consider for instance the vector x = [1, 2, 3, 3, 4]. What I need is a function taking x as input and returning the vector y = [1, 2, 3, 4] as output. The interface for a real-valued vector could be as simple as

pure function unique(x, sorted) result(y)
     real(dp), intent(in) :: x(:)
     !! Array whose unique values need to be extracted.
     logical(lk), optional, intent(in) :: sorted
     !! Whether the output vector needs to be sorted or not (default .false. ?)
     real(dp), allocatable :: y(:)
     !! Vector containing only the unique values from x.
end function

The output vector could be sorted or not, depending on the user's choice. I know that there is no Fortran intrinsic function for that purpose, but I am not sure whether something like that is already available in stdlib. If it is, could anyone point me to the correct function?
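
For discussion purposes, here is a minimal sketch of how the interface above could be implemented for real vectors. It is not taken from stdlib: the module name `unique_sketch` is hypothetical, a default `logical` is used instead of `logical(lk)` so that the example compiles without stdlib, values are compared with exact equality, and the quadratic scan plus insertion sort are only chosen for brevity. When `sorted` is absent or `.false.`, the sketch assumes the output should keep the order of first appearance.

```fortran
module unique_sketch
    ! Hypothetical module name, used only for this illustration.
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    private
    public :: dp, unique

contains

    pure function unique(x, sorted) result(y)
        real(dp), intent(in) :: x(:)
        !! Array whose unique values need to be extracted.
        logical, optional, intent(in) :: sorted
        !! Whether to sort the output (assumed default: .false.).
        real(dp), allocatable :: y(:)
        !! Vector containing only the unique values from x.

        real(dp) :: buffer(size(x)), tmp
        logical :: do_sort
        integer :: i, j, n

        do_sort = .false.
        if (present(sorted)) do_sort = sorted

        ! Keep the first occurrence of each value (exact comparison, O(n**2)).
        n = 0
        do i = 1, size(x)
            if (.not. any(buffer(1:n) == x(i))) then
                n = n + 1
                buffer(n) = x(i)
            end if
        end do

        ! Optional in-place insertion sort of the n unique values.
        if (do_sort) then
            do i = 2, n
                tmp = buffer(i)
                j = i - 1
                do while (j >= 1)
                    if (buffer(j) <= tmp) exit
                    buffer(j + 1) = buffer(j)
                    j = j - 1
                end do
                buffer(j + 1) = tmp
            end do
        end if

        y = buffer(1:n)
    end function unique

end module unique_sketch
```

With this sketch, `unique([3.0_dp, 1.0_dp, 3.0_dp, 2.0_dp])` returns `[3, 1, 2]`, and passing `sorted=.true.` returns `[1, 2, 3]`. A real implementation would more likely sort first (for example with the routines in stdlib_sorting) and then drop adjacent duplicates, which scales better for large arrays.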

Prior Art

  • In MATLAB, there is the unique function whose description is available here.
  • Python has the built-in set type, which can be constructed from a list and keeps only its unique elements (without preserving their order).
  • NumPy has np.unique whose description is available here.
  • @jacobwilliams provides an integer-based implementation on his blog (here).

Additional Information

Both MATLAB's and NumPy's implementations cover a relatively large set of cases (1D arrays, multidimensional arrays, different types, etc.) and return values (the unique elements, the corresponding indices, the indices needed to reconstruct the original array from this unique set, etc.).
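
To make the additional return values concrete, here is a rough sketch for integer vectors that also returns the index of the first occurrence of each unique value and the inverse map needed to rebuild the original array. The module and procedure names (`unique_indices_sketch`, `unique_with_indices`) are hypothetical and not part of stdlib. Unlike np.unique, which returns sorted values, this sketch assumes the unique values are kept in order of first appearance. It relies on the Fortran 2008 `findloc` intrinsic.

```fortran
module unique_indices_sketch
    ! Hypothetical module and procedure names, for illustration only.
    implicit none
    private
    public :: unique_with_indices

contains

    pure subroutine unique_with_indices(x, y, first, inverse)
        integer, intent(in) :: x(:)
        !! Input vector.
        integer, allocatable, intent(out) :: y(:)
        !! Unique values, in order of first appearance.
        integer, allocatable, intent(out) :: first(:)
        !! Index of first occurrence: x(first(k)) == y(k).
        integer, allocatable, intent(out) :: inverse(:)
        !! Inverse map: y(inverse(i)) == x(i).

        integer :: buf(size(x)), idx(size(x))
        integer :: i, k, n

        allocate(inverse(size(x)))
        n = 0
        do i = 1, size(x)
            ! findloc returns 0 when x(i) has not been seen yet.
            k = findloc(buf(1:n), x(i), dim=1)
            if (k == 0) then
                n = n + 1
                buf(n) = x(i)
                idx(n) = i
                k = n
            end if
            inverse(i) = k
        end do

        y = buf(1:n)
        first = idx(1:n)
    end subroutine unique_with_indices

end module unique_indices_sketch
```

For x = [3, 1, 3, 2, 1], this gives y = [3, 1, 2], first = [1, 2, 4] and inverse = [1, 2, 1, 3, 2], so that y(inverse) reproduces x.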

I don't know whether all of these cases need to be covered, at least as a starting point. I would recommend starting with the simplest one (a 1D input vector and an output vector containing the unique elements), as this is probably the most common situation where a unique function is needed. That would include integer, real, complex and character 1D arrays.

@loiseaujc loiseaujc added the idea Proposition of an idea and opening an issue to discuss it label Feb 24, 2025
@loiseaujc
Author

I'm also not sure which module this utility function should go into. Maybe stdlib_sorting?

@perazz
Member

perazz commented Feb 24, 2025

Good idea @loiseaujc. Please note there is an open discussion at #670; should we merge this issue with that one?

@loiseaujc
Author

Oh sure! I completely overlooked this issue.

@demoncoder-crypto

Is this issue open for contributions? I would love to contribute.
