Survey App in Django Admin #349

Closed
1 of 3 tasks
jmakowski1123 opened this issue Jul 1, 2022 · 5 comments

jmakowski1123 commented Jul 1, 2022

At-A-Glance

This survey tool will enable collection of aggregated, anonymized data about Open edX courses at scale, so that we can begin to track the growth and trends in Open edX usage over time, namely in the annual Open edX Impact Report. We wish to be able to answer questions such as:

How many unique courses are currently being offered across the Open edX ecosystem?
How many learners are currently registered on an Open edX Instance?
How many learners are currently using an Instance?
How many learners are currently enrolled in an Open edX course?
How many learners have completed an Open edX course?

More information

Product specs

Design files

Kanban board

Stakeholders

Primary Contributors

  • Axim Collaborative
  • eduNEXT

Community release milestones

  • Quince: Beta version released in Quince

  • Redwood:

How to contribute

  • Has a community backlog with work to pick up?

This project is being implemented by eduNEXT and tCRIL. The final product will enable collection of aggregated, anonymized data about Open edX courses at scale, so that we can begin to track the growth and trends in Open edX usage over time, namely in the annual Open edX Impact Report. We wish to be able to answer questions such as:

How many unique courses are currently being offered across the Open edX ecosystem?
How many learners are currently registered on an Open edX Instance?
How many learners are currently using an Instance?
How many learners are currently enrolled in an Open edX course?
How many learners have completed an Open edX course?

We aim to roll out v1 of this app with Olive.

The work consists of creating a new Django app in openedx/features, which will deliver against the following objectives:
● Aggregate data from other models in the platform
● Produce a record of a timestamped report with said data
● Publish the report to a URL managed by NP
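
As a rough illustration of the third objective, the sketch below wraps aggregated results in a timestamped report and prepares an HTTP POST for it. The URL is a placeholder (the issue does not name the real endpoint), and `build_publish_request` and the payload keys are hypothetical names chosen for this example, not part of any existing code.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Placeholder only: the real publishing URL is not specified in this issue.
REPORT_URL = "https://example.invalid/reports"


def build_publish_request(results: dict, url: str = REPORT_URL) -> urllib.request.Request:
    """Wrap aggregated results in a timestamped report and prepare a POST request."""
    report = {
        "created": datetime.now(timezone.utc).isoformat(),  # timestamp of the report
        "results": results,                                  # aggregated counts only
    }
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_publish_request({"course_count": 12})
# The request is only constructed here; urllib.request.urlopen(request) would send it.
```

Only aggregate numbers go into the payload, which matches the goal of collecting anonymized data.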

Related PRs:


Discovery Findings

Earlier this year, tCRIL did some discovery work around possible approaches to gathering impact data on Open edX Instances. We explored one manual approach (Google Forms) and one automated approach (an app administered through the Django Admin).

The data we seek to gather on a per-instance basis is:

  • Number of unique courses currently offered
  • Total number of learners currently using the site
  • Total number of learners ever registered
  • Total number of enrollments for all courses
  • Total number of course completions

At the time, we decided to pursue the manual Google Form approach, which was less than successful in terms of gathering data across the ecosystem.

Regarding the automated approach, Dave outlined a framework and some recommendations for building an automated survey app that could be administered through the Django Admin. This is work that we intend to pick up now and pursue.

This ticket serves to collate findings from Dave's initial technical discovery. It will inform an answer to the question: Do we have enough context and technical discovery already completed in order to move ahead with implementing the work to build the app? Or do we need to pursue further scoping and definition?

Collated notes in the comments below:

jmakowski1123 moved this to To Do - Backlog in Axim Engineering Tasks Jul 1, 2022
jmakowski1123 moved this from To Do - Backlog to To Do - Prioritized for Current Sprint in Axim Engineering Tasks Jul 1, 2022
@jmakowski1123 (Author)

Approach:

If we want to do this as a survey app in the Django Admin (accessible by site operators), we'd need the following:

  • An app to hold the model and logic for the survey itself. The model would likely be really simple, just capturing a timestamp, time it took to run, version of the codebase (i.e. what release), flags for any report-running options we offer, and a JSONField to hold the results.
  • A Django Admin interface to start the async task that would need to gather the data.
  • A Celery task to run the queries for the data above.
  • (Optional) Some kind of advertising header that can go into the top level Django admin as a message notice. We could probably implement this as a middleware that checks to see if people have filled out the survey (or told it to go away) and adds a message to the Django admin homepage using Django's message framework.
  • An endpoint to actually receive the results and store it somewhere. We could spin something like this up on Heroku or Render.
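
To make the proposed record concrete, here is a minimal plain-Python sketch of the fields listed in the first bullet. It is a stand-in for a Django model, not the actual implementation: `SurveyReport`, `run_survey`, and the field names are assumptions for illustration, and in the real app `results` would be a JSONField and `run_survey` would run inside the Celery task.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass
class SurveyReport:
    """Plain-Python stand-in for the survey model (hypothetical field names)."""
    created: str              # ISO 8601 timestamp of the run
    duration_seconds: float   # time it took to run the queries
    release: str              # version of the codebase, e.g. a named release
    options: Dict[str, bool] = field(default_factory=dict)  # report-running flags
    results: Dict[str, int] = field(default_factory=dict)   # would be a JSONField

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def run_survey(
    collectors: Dict[str, Callable[[], int]],
    release: str,
    options: Optional[Dict[str, bool]] = None,
) -> SurveyReport:
    """Run each collector, timing the whole run, and return one report record."""
    started = time.monotonic()
    results = {name: collect() for name, collect in collectors.items()}
    return SurveyReport(
        created=datetime.now(timezone.utc).isoformat(),
        duration_seconds=time.monotonic() - started,
        release=release,
        options=dict(options or {}),
        results=results,
    )


# Each collector is a zero-argument callable returning one number.
report = run_survey({"course_count": lambda: 3}, release="example-release")
```

Passing the queries in as callables keeps the record logic testable without a database.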

Installation Options

There are two main ways I could see us going with this:

  1. An installable plugin app.
  2. Build it into edx-platform itself.

I actually prefer building this into edx-platform because it is so tightly coupled with that repository (at least for the data being collected here). It needs to directly query a number of edx-platform data models, and we'd want those tests to run during CI to make sure nothing breaks from release to release. It would also be really convenient if, whenever you're looking to deprecate a feature flag, you could add it to the list of things that the survey app scans for. However, doing so would put us in a situation where we wouldn't be getting results back until people started running Nutmeg in the middle of this year (and long after the conference).

An alternative is to initially develop it as a plugin app, but fold it into edx-platform in time for Nutmeg. I really don't think we're going to get many people to install it this way though.

Options to consider

There can be at least two high level goals for such a script:

  1. Estimate impact (the origin of this story)
  2. Sample the options/configurations being used (useful for DEPR).

I suspect that more people will be willing to give (2) than (1), so it might be worth giving an option to separate the two. I am assuming that this will be strictly opt-in.
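
One possible way to keep the two goals separable under a strict opt-in is to split the results into sections and send only the ones the operator has agreed to. The sketch below assumes two section names, "impact" and "configuration", and a hypothetical `build_payload` helper; none of these names come from the actual app.

```python
from typing import Dict, Set

# Hypothetical section names for the two goals listed above.
SECTIONS = ("impact", "configuration")


def build_payload(results: Dict[str, dict], consented: Set[str]) -> Dict[str, dict]:
    """Keep only the sections the operator explicitly opted in to."""
    unknown = consented - set(SECTIONS)
    if unknown:
        raise ValueError(f"Unknown sections: {sorted(unknown)}")
    return {
        name: results[name]
        for name in SECTIONS
        if name in consented and name in results
    }


results = {
    "impact": {"course_count": 12},
    "configuration": {"some_flag_enabled": True},
}
config_only = build_payload(results, consented={"configuration"})
nothing = build_payload(results, consented=set())
```

With an empty consent set nothing is sent, which is the opt-in default.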

@jmakowski1123 (Author)

Additional context:

The general theme with the technical discovery is that we can get rough numbers in a relatively straightforward manner, but that true accuracy involves accounting for a number of edge cases that I don't think are worth it for the first pass at this problem.

  1. Number of unique courses

The fastest and most reliable way to get this is a count on CourseOverview. There are a few caveats here. Just because a course exists doesn't mean that anyone can see it or use it. There are a few fields that can help guide us (start, end, and self_paced), but sometimes courses are created as scratch spaces and might not represent something that's ever seen by students.

Recommended approach: Simple count of CourseOverview rows, and ignore any subtleties about scheduling or enrollments.

  2. Total number of learners using the site

This would require a count on the User table. This can also be distorted by banned users (spam accounts), or by dummy users created for the purposes of an LTI launch where Open edX is an LTI provider. Banned users are an obscure edge case though.

Recommended approach: Simple count of the User model, minus a simple count of the LtiUser model.
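
The recommended subtraction can be sketched with two plain counts. The example uses an in-memory SQLite database with simplified `users` and `lti_users` tables; these are stand-ins invented for illustration and do not reflect the real edx-platform schema, where the counts would come from the User and LtiUser models.

```python
import sqlite3

# Simplified stand-in tables, NOT the real edx-platform schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE lti_users (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob'), (3, 'lti-placeholder');
    INSERT INTO lti_users (id, user_id) VALUES (1, 3);
    """
)


def estimate_learner_count(connection: sqlite3.Connection) -> int:
    """All user rows minus the dummy users created for LTI launches."""
    total_users = connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    lti_users = connection.execute("SELECT COUNT(*) FROM lti_users").fetchone()[0]
    return total_users - lti_users


learner_count = estimate_learner_count(conn)
```

Banned or spam accounts are deliberately not filtered out here, in line with treating them as an edge case for a first pass.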

  3. Total number of enrollments for all courses

@jmakowski1123: This could be a count of all currently active enrollments, or all enrollments that were ever made. The latter would mean that we'd still count an enrollment if someone enrolled in a course and then unenrolled some time later. When counting all enrollments ever made, we wouldn't double-count re-enrollments: if someone enrolled in a course, unenrolled, and re-enrolled, that would still count as only one enrollment.

Getting all enrollments that were ever made is slightly cheaper, but both are relatively straightforward to get; it's just a matter of filtering on the is_active field. Please let me know which one you'd like (or if you'd like both).
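
The difference between the two counts can be illustrated with a few sample rows. This sketch assumes one enrollment row per learner and course, with `is_active` switched off on unenrollment and back on when the learner re-enrolls, which is consistent with re-enrollments not being double-counted. The `Enrollment` tuple and `count_enrollments` are hypothetical names for this example.

```python
from typing import Dict, List, NamedTuple


class Enrollment(NamedTuple):
    """Simplified stand-in for an enrollment row (assumed: one row per learner and course)."""
    user_id: int
    course_id: str
    is_active: bool


def count_enrollments(rows: List[Enrollment]) -> Dict[str, int]:
    return {
        # No filter needed: every row is an enrollment that was made at some point.
        "ever_enrolled": len(rows),
        # Same rows, filtered on is_active.
        "currently_active": sum(1 for row in rows if row.is_active),
    }


rows = [
    Enrollment(1, "course-a", True),
    Enrollment(1, "course-b", False),  # enrolled, then unenrolled
    Enrollment(2, "course-a", True),   # unenrolled and re-enrolled: still a single row
]
counts = count_enrollments(rows)
```

The "ever enrolled" count skips the filter entirely, which is why it is the slightly cheaper of the two.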

  4. Total number of course completions/certificates granted

We can get this from the GeneratedCertificate model, but it's honestly kind of a mess in terms of ensuring accuracy when these are generated. We also have many different "modes" that a certificate can be granted in (e.g. "verified", "masters", "credit"). So it's probably best to get a simple count that is equivalent to "this person passed a course", and not try to dig too far into the types of certificates, the significance of which likely varies from site to site.
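
A mode-agnostic "this person passed a course" count might look like the sketch below. The `Certificate` tuple is a simplified stand-in for certificate rows, and treating the status value "downloadable" as the only passing status is an assumption made for this example; the real set of statuses would need to be checked against GeneratedCertificate.

```python
from typing import Iterable, NamedTuple


class Certificate(NamedTuple):
    """Simplified stand-in for a certificate row."""
    user_id: int
    course_id: str
    mode: str
    status: str


# Assumption for illustration: which status values mean "passed".
PASSING_STATUSES = {"downloadable"}


def count_completions(certs: Iterable[Certificate]) -> int:
    """Count distinct (learner, course) pairs with a passing certificate, ignoring mode."""
    passed = {
        (cert.user_id, cert.course_id)
        for cert in certs
        if cert.status in PASSING_STATUSES
    }
    return len(passed)


certs = [
    Certificate(1, "course-a", "verified", "downloadable"),
    Certificate(2, "course-a", "masters", "downloadable"),
    Certificate(3, "course-a", "verified", "notpassing"),
    Certificate(1, "course-a", "credit", "downloadable"),  # same learner and course, other mode
]
completion_count = count_completions(certs)
```

Collapsing on the learner and course pair keeps the number independent of how a site uses certificate modes.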

  5. Primary language of instruction

We can get a count of courses by language, but this might be pretty messy and unreliable data. This can be queried using the language field in CourseOverview.

  6. Other languages of instruction

Same approach and caveats as (5).
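
For the language counts in items 5 and 6, a tally over the per-course language values could be sketched as follows. The normalization (lowercasing, and grouping empty or missing values under "unknown") is an assumption added here to cope with the messy data mentioned above; `count_courses_by_language` is a hypothetical helper that takes the values already read from the language field.

```python
from collections import Counter
from typing import Iterable, Optional


def count_courses_by_language(languages: Iterable[Optional[str]]) -> Counter:
    """Tally courses per language code, grouping blank or missing values as 'unknown'."""
    normalized = ((lang or "").strip().lower() or "unknown" for lang in languages)
    return Counter(normalized)


# One entry per course; None and blank strings stand for courses with no language set.
language_counts = count_courses_by_language(["en", "EN", "es", None, " ", "fr"])
```

Reporting the "unknown" bucket alongside the real codes makes the unreliability of the field visible in the results.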

@jmakowski1123 (Author)

7/1 - Shared with Juan for review for eduNEXT/tCRIL project.

@jmakowski1123 (Author)

Nell questions/feedback:

  1. What is being delivered to tCRIL? Is there any raw data that leaves the instance and comes to tCRIL? Or is the final deliverable just an aggregate report? The latter is preferable.
  2. Need an opt-in workflow/opt-out workflow and messaging.
  3. Outline steps to anonymize where the Instance identity might be obvious.

jmakowski1123 moved this from To Do - Prioritized for Current Sprint to To Do - Backlog in Axim Engineering Tasks Jul 8, 2022
jmakowski1123 changed the title from "Discovery Findings: Survey App in Django Admin" to "Survey App in Django Admin" Oct 27, 2022
jmakowski1123 moved this from Backlog to In Progress in Open edX Roadmap Oct 27, 2022
jmakowski1123 added this to the Olive Release Candidate milestone Oct 27, 2022
jmakowski1123 removed this from the Olive.1 milestone Sep 22, 2023
jmakowski1123 moved this to New Features in Product Review Tracking Sep 22, 2023
jmakowski1123 moved this from In Progress to New Features in Open edX Roadmap Sep 25, 2023
jmakowski1123 moved this from New Features - In Progress to In Progress in Open edX Roadmap Mar 7, 2024
jmakowski1123 moved this from Being Developed to Shipped in Open edX Roadmap Mar 28, 2024
sarina (Contributor) commented Jun 4, 2024

Closing as this is marked "Shipped" on the PR board 🎉
