-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Unique values in a 1-D array and their locations #670
Comments
I wonder why you say that this code is inefficient. I tried to think of
different implementations, but the code as written seems to be O(N**2),
with sorted data you might get down to O(N*ln(N)), but that would presume
the data can be sorted (in that case, you would need more functionality
than just an equality operation). Just wondering if you have a particular
algorithm in mind.
Op za 30 jul. 2022 om 16:57 schreef Beliavsky ***@***.***>:
… Given a 1-D array of character variables, integers, or a user-defined type
such as dates, if there are repeated values one may want to create factors
<https://www.stat.berkeley.edu/~spector/s133/factors.html> (the R term)
that are integer variables corresponding to the unique values. Below is a
small code that does this inefficiently for an array of characters. A
library subroutine that works for any data type for which an equality
operator is defined would be useful.
module factors_mod
implicit none
private
public :: compress
contains
pure subroutine compress(words,factors,values)
character (len=*) , intent(in) :: words(:) ! (n)
integer , intent(out) :: factors(:) ! (n) integer values corresponding to words(:)
character (len=len(words)), allocatable, intent(out) :: values(:) ! unique values of words(:)
integer :: i,n,nfac,imatch
n = size(words)
if (size(factors) /= n) error stop "in compress, need size(words) == size(factors)"
if (n < 1) then
allocate (values(0))
return
end if
allocate (values(n))
values(1) = words(1)
factors(1) = 1
nfac = 1
do i=2,n
imatch = findloc(values(:nfac),words(i),dim=1)
if (imatch == 0) then
nfac = nfac + 1
factors(i) = nfac
values(nfac) = words(i)
else
factors(i) = imatch
end if
end do
values = values(:nfac)
end subroutine compress
end module factors_mod
program xfactors
use factors_mod, only: compress
implicit none
integer, parameter :: n = 5, nlen = 1
character (len=nlen) :: words(n)
character (len=nlen), allocatable :: values(:)
integer :: factors(n)
words = ["a","c","a","b","c"]
call compress(words,factors,values)
print "(a,*(1x,a))", "data:",words
print "(a,*(1x,i0))", "factors:",factors
print "(a,*(1x,a))", "values:",values
end program xfactors
! output:
! data: a c a b c
! factors: 1 2 1 3 2
! values: a c b
—
Reply to this email directly, view it on GitHub
<#670>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAN6YRY7BRKCUS5X7S4HYZLVWU7GTANCNFSM55DLAWRQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Thank you @Beliavsky for this proposition.
This can be easiliy done with My current main questions concern the module (name) in which such a procedure should go, and the API of this procedure |
If stdlib gets a function to return the unique values in an array, it would be nice to have some set operations. For arrays |
Given a 1-D array of character variables, integers, or a user-defined type such as dates, if there are repeated values one may want to create factors (the R term) that are integer variables corresponding to the unique values. Below is a small code that does this inefficiently for an array of character variables. A library subroutine that works for any data type for which an equality operator is defined would be useful.
The text was updated successfully, but these errors were encountered: