Discovery: Script solution for Instance survey #62
Some questions:
Getting people to install something is a high barrier to entry for a survey. It might make sense to start with a Google Forms sort of approach for the first iteration of this, and to have a bundled app that operators could choose to opt into starting with the Nutmeg release.
That app could later be used for other really useful pulse-of-the-community things like determining which sites use which feature flags, and other things that would be useful to know for support and deprecation purposes. Some sites might not want to give up their enrollment numbers, but they might be at least willing to share which features they're using if it can be collected automatically.
Given that the nature of the project (quantifying current Open edX Instances) is fairly opaque to begin with, my initial thought is that we can err on the side of fair/reasonable accuracy. But I'm very curious for @e0d's thoughts. For a framework, I'd say strike a balance between reasonable accuracy and the realistic timeline for this project, which is to gather, analyze, and present the data at the April conference (sorry... that's very un-Agile-like, with such a hard deadline!)
For the purposes of the first survey, I think it's enough to be able to report the current number of learners, the current number of enrollments, and perhaps the number of certificates/credentials granted to date. Assuming we run the survey annually, next year we could put a time frame around it (i.e., "in CY 2022"). Again, curious for @e0d's thoughts.
At the moment, I think once a year is realistic in terms of what our goals are (an annual impact report), but I also hope this project expands with community involvement, and I can see scenarios where more frequent updates could be of interest to the Marketing WG, for example. So if it's not a burden to site operators, perhaps biannually or quarterly as a start?
Would the Google Form then be filled out manually for each Instance? I can see that also being a barrier to operators who are running many Instances. Even if we only got a ~10% install rate in the first go-round, that's still more data than we have now, and it would set the bar to raise next year. Maybe there's a hybrid approach where we give folks the option of either an install or the Google Form? And I like the bundled-app idea with Nutmeg as a long-term sustainable solution.
The general theme with the technical discovery is that we can get rough numbers in a relatively straightforward manner, but that true accuracy involves accounting for a number of edge cases that I don't think are worth it for the first pass at this problem.
The fastest and most reliable way to get this is a count on CourseOverview. There are a few caveats here: just because a course exists doesn't mean that anyone can see it or use it, and there are a few fields that can help guide us on that. Recommended approach: a simple count of CourseOverview rows, ignoring any subtleties about scheduling or enrollments.
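For illustration, a minimal sketch of that count, assuming the CourseOverview model from edx-platform and a Django shell on the LMS:

```python
# Sketch of the recommended course count; the import path is
# CourseOverview's location in edx-platform. Run from a Django shell
# on the LMS.
from openedx.core.djangoapps.content.course_overviews.models import CourseOverview

# Count every course the LMS knows about, deliberately ignoring
# scheduling/visibility subtleties as recommended above.
total_courses = CourseOverview.objects.count()
print(f"courses: {total_courses}")
```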
This would require a count on the User table. This can also be distorted by banned users (spam accounts), or by dummy users created for the purposes of an LTI launch where Open edX is an LTI provider. Banned users are an obscure edge case, though. Recommended approach: a simple count of the User model, minus a simple count of the LtiUser model.
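A sketch of that subtraction, assuming the LtiUser model from edx-platform's lti_provider app (the import path may differ across releases):

```python
# Sketch of the recommended learner count: all Django users minus the
# dummy users created for LTI launches. LtiUser's import path here is
# its location in recent edx-platform releases; adjust if yours differs.
from django.contrib.auth import get_user_model
from lms.djangoapps.lti_provider.models import LtiUser

User = get_user_model()
learners = User.objects.count() - LtiUser.objects.count()
print(f"learners: {learners}")
```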
@jmakowski1123: This could be a count of all currently active enrollments, or of all enrollments that were ever made. The latter would mean that we'd still count an enrollment if someone enrolled in a course and then unenrolled some time later. When counting all enrollments ever made, we wouldn't double-count re-enrollments; i.e., if someone enrolled in a course, unenrolled, and re-enrolled, that would still count as only one enrollment. Getting all enrollments that were ever made is slightly cheaper, but both are relatively straightforward to get; it's just a matter of whether we filter on the enrollment's active flag.
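Both counts, sketched against edx-platform's CourseEnrollment model (assuming its is_active flag is the filter in question):

```python
# Sketch of both enrollment counts described above. Re-enrollments reuse
# the same CourseEnrollment row, so neither count double-counts them.
from common.djangoapps.student.models import CourseEnrollment

# Every enrollment ever made, including learners who later unenrolled.
all_enrollments = CourseEnrollment.objects.count()

# Only enrollments that are active right now.
active_enrollments = CourseEnrollment.objects.filter(is_active=True).count()
```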
We can get this from the
We can get a count of courses by language, but this might be pretty messy and unreliable data. This can be queried using the
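If the field meant here is CourseOverview.language (an assumption on my part), the grouping could look like this:

```python
# Sketch of a per-language course count, assuming CourseOverview.language
# is the field in question. Values are often null or inconsistently set,
# which is the messiness noted above.
from django.db.models import Count
from openedx.core.djangoapps.content.course_overviews.models import CourseOverview

courses_by_language = (
    CourseOverview.objects.values("language")
    .annotate(num_courses=Count("id"))
    .order_by("-num_courses")
)
for row in courses_by_language:
    print(row["language"], row["num_courses"])
```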
Same approach and caveats as (5).
If we want to do this as a survey app in the Django Admin (accessible by site operators), we'd need the following:
Installation Options
There are two main ways I could see us going with this:
I actually prefer building this into edx-platform because it is so tightly coupled with that repository (at least for the data being collected here). It needs to directly query a number of edx-platform data models, and we'd want those tests to run during CI to make sure nothing breaks from release to release. It would also be really convenient if, whenever you're looking to deprecate a feature flag, you could add it to the list of things that the survey app scans for. However, doing so would put us in a situation where we wouldn't be getting results back until people started running Nutmeg in the middle of this year (and long after the conference). An alternative is to initially develop it as a plugin app (sketched below), but fold it into edx-platform in time for Nutmeg. I really don't think we're going to get many people to install it this way, though.
Options to consider
There can be at least two high-level goals for such a script:
I suspect that more people will be willing to give (2) than (1), so it might be worth giving an option to separate the two. I am assuming that this will be strictly opt-in.
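For reference, the plugin-app route would hang off Open edX's plugin entry points. A hypothetical setup.py (the package and app names are placeholders, not an existing project):

```python
# Hypothetical setup.py for the plugin-app option. The LMS discovers
# plugin Django apps through the "lms.djangoapp" entry point group;
# every name below is a placeholder for illustration.
from setuptools import find_packages, setup

setup(
    name="openedx-instance-survey",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "lms.djangoapp": [
            "instance_survey = instance_survey.apps:InstanceSurveyConfig",
        ],
    },
)
```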
Yes, it would be in this case. But so would the Admin option for sending the data. I suppose we could add a setting that says, "Just always send this information every X months if you haven't before," and default it to False? So most people wouldn't use it; only those that run a hundred sites and want to opt in would.
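One possible shape for that setting (both names are hypothetical):

```python
# Hypothetical settings for the opt-in auto-send behavior described
# above; defaulting to False keeps the app strictly opt-in.
SURVEY_AUTO_SEND = False               # off unless an operator enables it
SURVEY_AUTO_SEND_INTERVAL_MONTHS = 6   # "every X months" once enabled
```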
@jmakowski1123: FWIW, I think that we should send this year's survey out via a Google Form and have folks fill it in as before, and then target doing this in the Django Admin for Nutmeg. I really can't see folks installing this as a separate plugin in useful numbers; it's just going to be so much faster for them to fill in a form. My best guess is a couple of weeks of work if the UI is really bare-bones, not counting any analysis work we'd do on the other end. Most of the effort is in the admin interface and in making sure we don't bring sites down when running these large queries, though we should probably go through group estimation.
The draft form was built in FormAssembly.
Works for me. I default to Google Forms because that's the only thing I've used. Happy to defer to those who have used other products in this area.
Sure, I can give some queries for them to run. It'd be nice if edX could run them on their read replica to test early, but it's not absolutely required. That's probably only a couple of hours of actual work, with the caveats I put in the recommended queries above. It might take more calendar time if someone at edX is testing and we get weird results that we need to debug. @jmakowski1123: Assigning this to you to weigh in on. Please feel free to move it to "Done" if you're okay with the conclusions here, or assign it back to me if you have feedback, questions, or other areas you feel need further investigation. Thank you.
This makes sense to me. I suggest we prune the number and types of questions we ask in the form, in order to make this as easy and quick as possible. Maybe we even limit it to query-based questions for now. Then we can focus on a more well-rounded question set that aligns with the long-term Nutmeg install option.
Context
tCRIL is generating an Impact Report that quantifies the landscape of Open edX Instances globally. A large portion of this data will be elicited directly from Providers, within the boundaries of the standard Provider contract. Draft survey questions are here. To facilitate survey uptake, we'd like to automate the process of answering some or all of the questions. The results of the survey will be analyzed and summarized in aggregate, and the anonymized results shared publicly. We will present the results at the Open edX conference in April.
Acceptance Criteria:
The Provider is given a quick and seamless method by which to autogenerate data to answer the following questions for each of their Instances:
The end-result data is captured in .csv (or similar), with a clear connection between each Instance URL and its corresponding data listed above.
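As one illustration of that output, a hypothetical CSV export (column names and values are placeholders):

```python
# Hypothetical sketch of the acceptance-criteria export: one row per
# Instance URL with its autogenerated answers alongside. All column
# names and values below are placeholders.
import csv

rows = [
    {"instance_url": "https://lms.example.com", "courses": 120,
     "learners": 45000, "enrollments": 90000, "certificates": 8000},
]

with open("instance_survey.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```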
Approach:
The purpose of this ticket is to explore solutions for the method by which to autogenerate data, and to propose a recommended method/approach. Based on a brainstorming session during the January 5 Standup, one highly viable approach is to write a script that Providers can embed into each of their Instances.