Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

mergeForEach wrongly calls innerCallback when rhsItem can't be compared with lhsItem #21

Open
Sumolari opened this issue Jul 4, 2019 · 1 comment
Labels
bug Something isn't working

Comments

@Sumolari
Copy link
Contributor

Sumolari commented Jul 4, 2019

Related internal issue » https://geoblink.atlassian.net/browse/CORE-7255

If rhsCollection has items which do not have a value for primary key and lhsCollection have primary key for all items, then the comparison between the element without primary key and any element of the other collection will return SORTING_ORDER.EQUAL because the else branch of this chunk will be executed:

      if (lhsValue < rhsValue) {
        return SORTING_ORDER.LHS_BEFORE_RHS
      } else if (lhsValue > rhsValue) {
        return SORTING_ORDER.LHS_AFTER_RHS
      } else {
        return SORTING_ORDER.EQUAL
      }

However, this comparison is false. It's not entirely clear if this example is breaking some precondition on the API but it can be fixed in the algorithm by adding a new value return value, null whenever values can't be compared.

      if (lhsValue < rhsValue) {
        return SORTING_ORDER.LHS_BEFORE_RHS
      } else if (lhsValue > rhsValue) {
        return SORTING_ORDER.LHS_AFTER_RHS
      } else if (lhsValue === rhsValue) {
        return SORTING_ORDER.EQUAL
      } else {
        return SORTING_ORDER.UNCOMPARABLE
      }

If a pair is not comparable then we should not caller innerCallback for it. At the end we'll end up calling leftCallback and rightCallback on each member of the pair, according to their actual position.

@Sumolari Sumolari added the bug Something isn't working label Jul 4, 2019
@Sumolari Sumolari self-assigned this Jul 4, 2019
@Sumolari
Copy link
Contributor Author

Sumolari commented Jul 23, 2019

Initially I planed to call either leftCallback or rightCallback for non-comparable items. However, turns out that this cannot be implemented in an efficient way with current API.

Let's assume there's an element in left collection which can't be compared to any element in right collection. The only way to detect such element would be comparing it to every single element in the other collection.

Since any collection could potentially be filled with non-comparable values, this means that the algorithm will run in O(n^2) time, which defeats the purpose of this method.

I considered different alternatives for this:

  • Checking all the items in the other collection and make the algorithm O(n^2) as explained before.
  • Adding leftComparable and rightComparable parameters to the API so user can tell the algorithm when a value is not comparable to any other, so it is directly passed to the proper (left/right)Callback. The problem with this approach is that if the comparator is to be modified then these new functions should be modified, too, and there's no way to do so without introducing breaking changes.
  • Introducing a new outerCallback parameter to be called when a non-comparable pair is found. This introduces more complexity: what should be done with a pair after being passed to outerCallback? If the pair contains a comparable element this one might be found in the other collection, too, so consuming it in outerCallback might be wrong. However, being able to tell if the pair contains a comparable item brings back the O(n^2) problem of the initial approach.
  • Forcing elements to be comparable by throwing an error if this precondition is not met. Performance-wise is the best solution. API-wise introduces no changes (if userland code was already using this function with no issues then it will continue this way). It actually complies with what documentation tells about the function:

Both collections must be sortable. They will be sorted ascendantly using value returned by the corresponding iteratee.

So finally I've decided to quickly implement the last one: throwing an error if precondition is not met.

Does it sound right to you?

@Sumolari Sumolari removed their assignment Jun 11, 2020
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant