Rethinking severity scales #3306

Open
jonfroehlich opened this issue Jul 18, 2023 · 6 comments

Comments

@jonfroehlich
Member

jonfroehlich commented Jul 18, 2023

We have long discussed updating the 5-point Likert rating scale. There are multiple high-level issues here:

  • Should accessibility features like curb ramps, pedestrian signals, and crosswalks have the same 5-point scale as accessibility barriers like surface problems and obstacles, or as missing accessibility features like missing curb ramps and missing sidewalks?
  • Is a 5-point scale the right granularity? Should it be about "passability" or some other metric?

Yochai wrote a proposal about reframing the 5-point scales as 3-point scales to make rating easier for labelers:

Purpose: to compare categorical vs. 5-level ranking in PS ratings of features
Currently Project Sidewalk uses a 1-5 scale from low to high passability, also reflecting low to high severity. An alternative proposed scoring is to sort the labeled features into three categories, such as:

For all negative features (obstacles, surface problems)

  • Not likely an access barrier (may not actually need this one, because if raters place a label then they think there is an issue)
  • Possibly an access barrier (some or minor tags - rater is unsure how big a sidewalk crack is or if there may be room to maneuver around an obstacle)
  • Definitely a barrier (many or major tags - rater feels confident that the obstacle or surface problem is a barrier to walking or wheeling on the sidewalk)

For missing features (missing curb ramp and no sidewalk)

  • Not so confident the feature is missing (somewhat obscured by cars or grass)
  • Somewhat confident the feature is missing (some evidence of missing feature but still not super confident)
  • Very confident the feature is missing (clear that curb ramp or sidewalk is missing)

For feature labels (curb ramps, crosswalks, pedestrian signals)

  • No barriers to using it (no added tags)
  • Possibly a barrier to using it (some or minor added tags)
  • Definitely a barrier to using it (many or major added tags)

He also supplied this handy chart to help us think through this change:
image

From my perspective, Yochai's proposal is quite interesting, but it is also oriented around capturing a different type of information, such as "self-confidence in a label", rather than straight-up severity assessments.

I do like simplifying and reorienting the ratings, and the idea of capturing different types of ratings depending on whether a label is an accessibility feature, an accessibility barrier, or a missing feature.

For the accessibility barrier (what Yochai calls "negative feature"), I wonder if we want

  • Not a major barrier
  • Somewhat of a barrier
  • Significant access barrier

So, that's like a three-point Likert scale.

Additionally, I could imagine collecting "self-confidence" scores for every label (not confident, somewhat confident, very confident); however, that's a lot of extra work. Just thinking out loud...

I also want to note that we see different severity distributions across cities and label types, so the current scale is definitely capturing information:

image

There are some related tickets where we talk about this, including:

We also talked about validation workflows to validate severity/tags part of labels:

@misaugstad
Member

We had some discussion on Slack about how we would want to collapse existing data to a 3-point scale. I offered these suggestions:

Have we already discussed how we are collapsing existing severity ratings to a 3-point scale? There's nothing in the GitHub issues about it... For example, we could do:

  1. 1-2, 3, 4-5 -- I imagine that this is the default if we don't feel strongly about doing something different
  2. 1, 2-4, 5 -- If we really think that most ratings belong in the middle; the more I think abt this one the less I like it
  3. 1, 2-3, 4-5 -- If we think that people tend to under-rate problems, and we don't want a lot of "minor" problems that aren't rly minor
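The three collapsing options above can be written as simple lookup tables. This is only a minimal sketch, assuming severities are stored as integers from 1 to 5 (or missing); the names `COLLAPSE_OPTIONS`, `collapse_severity`, and `collapsed_distribution` are hypothetical and are not taken from the Project Sidewalk codebase.

```python
from collections import Counter

# Each option maps an original 1-5 severity to a new 1-3 severity.
# Hypothetical names; not taken from the Project Sidewalk codebase.
COLLAPSE_OPTIONS = {
    1: {1: 1, 2: 1, 3: 2, 4: 3, 5: 3},  # option 1: 1-2, 3, 4-5
    2: {1: 1, 2: 2, 3: 2, 4: 2, 5: 3},  # option 2: 1, 2-4, 5
    3: {1: 1, 2: 2, 3: 2, 4: 3, 5: 3},  # option 3: 1, 2-3, 4-5
}


def collapse_severity(severity, option=1):
    """Return the 3-point severity for a 5-point one; None stays None."""
    if severity is None:
        return None
    return COLLAPSE_OPTIONS[option][severity]


def collapsed_distribution(severities, option=1):
    """Count how many ratings land in each 3-point bucket, skipping None."""
    return Counter(
        collapse_severity(s, option) for s in severities if s is not None
    )
```

For example, an original rating of 4 becomes 2 under option 2 but 3 under options 1 and 3. Running `collapsed_distribution` on a city's existing ratings with each option would show how much each choice shifts labels into the middle bucket.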

We could also potentially do something different for different label types. But I'm hoping that these discussions have already happened and I just don't remember 😁 Asking bc I need to visualize the new 3-point scale in the new Validate and want to know how to collapse them

@jonfroehlich said that he was thinking of going with option 1, and I think that that sounds good!


@jonfroehlich also brought up the need to not lose the old data that is more granular, either by saving the data in a separate table or otherwise. I suggested that if we are only going to use this for offline analysis, then saving database dumps for every city would be sufficient to have it saved for the future.
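As an illustration of the "don't lose the granular data" concern, here is a rough sketch of the alternative to database dumps: keeping the original 5-point value next to the collapsed one. The record shape and the names `MigratedSeverity` and `migrate_severity` are assumptions made for illustration, not the actual schema, and the mapping shown is option 1 from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Option 1 from the thread: 1-2 -> 1, 3 -> 2, 4-5 -> 3.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


@dataclass(frozen=True)
class MigratedSeverity:
    """Hypothetical record that keeps the old rating beside the new one."""
    label_id: int
    original_severity: Optional[int]  # untouched 1-5 value, or None
    severity: Optional[int]           # collapsed 1-3 value, or None


def migrate_severity(label_id: int, original: Optional[int]) -> MigratedSeverity:
    """Collapse a 5-point rating without discarding the original value."""
    collapsed = None if original is None else FIVE_TO_THREE[original]
    return MigratedSeverity(label_id, original, collapsed)
```

Because the original value travels with the record, the collapse could later be redone with a different mapping without restoring a dump. Whether that is worth the extra column or table compared to per-city dumps is the open question in this comment.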


Another thing I thought of while working on the new Validate page today, where we are visually showing a 3-point scale even though that hasn't been implemented anywhere else yet: some extra logic was added that needs to be removed when we switch to a 3-point scale, or our validations will save the wrong severity! There's a TODO in the code to help locate it, in Label.js and LabelContainer.js.

@misaugstad
Member

I am pasting screenshots from a thread we had on Slack about the new severity scales. Topics include 2- vs 3-point scales, how the scale might differ for Curb Ramp compared to other label types, and whether to have a severity rating for the Missing Curb Ramp label type.

Here is my summary of where we landed on some of these:

  1. We should remove severity for No Sidewalk
  2. Leaning towards removing severity for Missing Curb Ramp, but @jonfroehlich and I are still a little unsure about it.
  3. We generally agree that sticking to a 3-point scale (as opposed to 2-point) would be best. There is some disagreement over what the wording of the different levels of severity should be (low/med/high, minor/major/very major, something else?)
  4. For Curb Ramp: once we land on severity scale wording in general, then maybe we can talk about how Curb Ramp might look different.

Screenshot from 2024-06-10 17-00-37
Screenshot from 2024-06-10 17-01-06
Screenshot from 2024-06-10 17-01-21

@yeisenberg
Collaborator

yeisenberg commented Jun 14, 2024

Hi again, I thought it made sense to continue here. An idea I had yesterday was to use No Issues, Minor Issues, and Major Issues for both negative features (the surface problem and obstacle labels) and regular features (the curb ramp, crosswalk, and pedestrian signal labels), but with "No Issues" greyed out for negative features. This option keeps the scales consistent. It could also allow people to change the severity to "No Issues" during validation if they don't think something is an issue. I'm not sure about that piece, but as a whole I think this would be easy to train people on and would result in more reliable severity data. Another approach could be to test two different options with a small set of users?
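To make the idea concrete, here is a small sketch of a shared three-option scale where "No issues" is shown but disabled for negative features. The label type identifiers and the function name `severity_choices` are illustrative assumptions, not names confirmed by the Project Sidewalk code.

```python
# Shared wording for every label type that keeps a severity rating.
SEVERITY_OPTIONS = ("No issues", "Minor issues", "Major issues")

# Illustrative identifiers only; the real label type names may differ.
NEGATIVE_FEATURES = {"SurfaceProblem", "Obstacle"}
REGULAR_FEATURES = {"CurbRamp", "Crosswalk", "PedestrianSignal"}


def severity_choices(label_type):
    """Return (option, enabled) pairs for the given label type.

    Negative features see the same three options, but "No issues" is
    greyed out (enabled=False), which leaves them a 2-point scale.
    """
    if label_type not in NEGATIVE_FEATURES | REGULAR_FEATURES:
        raise ValueError(f"no severity scale defined for {label_type!r}")
    greyed_out = label_type in NEGATIVE_FEATURES
    return [
        (option, not (greyed_out and option == "No issues"))
        for option in SEVERITY_OPTIONS
    ]
```

With this shape, the wording stays identical across label types and only the enabled flag differs, which is the consistency argument made above.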

@misaugstad
Member

> I had yesterday was to use: No issues, Minor Issues, Major Issues for both negative features - the surface problem, obstacle, and regular features: curbramp, crosswalk and ped signal labels. But that for negative features, the 'no issues' is greyed out. This option keeps the consistency well

I think that if we want to go with a 2-point scale for SurfaceProblem/Obstacle, then this is a nice way to do that!

> could also maybe allow during validation for people to change severity to no issues if they don't think something is an issue...Not sure about that piece

I think for this we would just have people mark "disagree" instead of changing the severity to "no issues".

> Another approach could be to test out two different options with a small set of users?

I think that you and @jonfroehlich should find some time to sit down together and hash out what the appropriate (and feasible) approach should be! Hopefully before he leaves for vacation on the 22nd since it's at the front of our minds now!

@jonfroehlich
Member Author

Interesting idea @yeisenberg! Thanks for sharing it.

I need to think about it more... and unfortunately, with deadlines and planned travel, I'm not sure I'll get to this until mid-July (when Mikey will be traveling). So, perhaps we go with the three-point scale that the students mocked up for now and change it later?

@yeisenberg
Collaborator

Not sure which is easier? I just think low/med/high is still too subjective, but this can certainly wait until after deadlines and travel. I can ask others what they think. Thanks.


3 participants