Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix import channel in Postgresql #12709

Merged

Conversation

jredrejo
Copy link
Member

@jredrejo jredrejo commented Oct 8, 2024

Summary

When importing channel data, psycopg2 execute_values function was used. This function code converts to bytes all the strings in order to have a better performance.
However, converting an utf-8 char to byte results in more than one byte, making some strings unfit in the maximum number of chars of a column limit.

This PR:

  • Replaces the use of execute_values by executemany
  • As sqlite does not applies the column char limit, ensures limit is applied to data before is inserted in Postgresql

Note: I had to cherry-pick the commit from #12466 to fix docs builds in GH

References

Closes: #11780

Reviewer guidance

Do tests pass?
In order to test the fix, this channel was failing and can be used in a kolibri installation using PG:
kolibri manage importchannel network --baseurl=https://studio.learningequality.org 07cd1633691b4473b6fda08caf826253

Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@jredrejo jredrejo added the TODO: needs review Waiting for review label Oct 8, 2024
@jredrejo jredrejo added this to the Kolibri 0.17: Planned Patch 2 milestone Oct 8, 2024
@jredrejo jredrejo requested a review from rtibbles October 8, 2024 17:52
@github-actions github-actions bot added the DEV: backend Python, databases, networking, filesystem... label Oct 8, 2024
…+ add required extension sphinxcontrib.jquery
@jredrejo jredrejo force-pushed the fix_channel_import_in_pg branch from 24e4490 to f1b712a Compare October 8, 2024 18:15
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that we are properly setting psycopg2 up with unicode handling as described here? https://www.psycopg.org/docs/usage.html#unicode-handling

This might point to a way to handle this in a way that doesn't cause a huge performance regression.

Also, I think we should add a regression test for the specific case we are fixing here - a importing a unicode string for a node title that is at the max length.

kolibri/core/content/utils/channel_import.py Outdated Show resolved Hide resolved
@jredrejo jredrejo requested a review from rtibbles October 10, 2024 16:48
@jredrejo jredrejo force-pushed the fix_channel_import_in_pg branch from dd5dd8a to 130405f Compare October 10, 2024 18:31
@jredrejo
Copy link
Member Author

Also, I think we should add a regression test for the specific case we are fixing here - a importing a unicode string for a node title that is at the max length.

Done

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions to make sure the tests are doing what they ought - if the answer to my second question is yes, then we might need two test cases, one with utf-8 that is short enough but would overflow if it was converted to bytes, and another where the tag is just too long regardless.

I will manually test this to compare speed before and after.

@jredrejo jredrejo requested a review from rtibbles October 17, 2024 18:06
@rtibbles
Copy link
Member

This should be good to go, once I've done the speed test locally just to verify, will do tomorrow.

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local testing shows no performance regressions with these changes.

@rtibbles rtibbles merged commit d085880 into learningequality:release-v0.17.x Oct 18, 2024
34 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
DEV: backend Python, databases, networking, filesystem... TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants