[Data Liberation] Expose experimental Markdown importer in the importWxr step #2080
This PR needs to be split into smaller parts before merging. The new vendor libraries will certainly become a separate PR, and the EPub and HTML importers probably will too.
Adds a forked version of the markdown parsing libraries required by the upcoming Markdown importer. We need our own fork for PHP 7.2 compatibility. The downgrade process was performed semi-automatically via Rector.

This PR adds the following libraries:

* `league/commonmark`
* `webuni/front-matter`

There are no testing steps here. This PR only adds new code without modifying the existing one.

A part of:

* #2080
* #1894
…Wxr step

🚧 Work in progress, don't merge 🚧

Enables importing markdown files via the `importWxr` step (to be renamed) when the data-liberation importer is enabled.

Here's the Blueprint you can use to import the "data basics" tutorial from the Gutenberg repo:

```json
{
	"$schema": "https://playground.wordpress.net/blueprint-schema.json",
	"landingPage": "/adding-a-delete-button/",
	"features": {
		"networking": true
	},
	"steps": [
		{
			"step": "resetData"
		},
		{
			"step": "importWxr",
			"importer": "data-liberation",
			"phpImporterOptions": {
				"data_source": "markdown_directory",
				"source_site_url": "https://raw.githubusercontent.com/WordPress/gutenberg/HEAD/docs/how-to-guides/data-basics"
			},
			"importData": {
				"resource": "git:directory",
				"url": "https://github.com/WordPress/gutenberg.git",
				"ref": "HEAD",
				"path": "docs/how-to-guides/data-basics"
			}
		}
	]
}
```

## Remaining work

* Confirm the WXR import still works both for the regular importer and the data liberation one
* Add E2E coverage
* Rewrite relative markdown URLs
* Enable specifying additional URL mappings directly in the Blueprint
* Review the code and make any architectural adjustments necessary
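One of the remaining items is rewriting relative markdown URLs. A minimal sketch of the idea in TypeScript, assuming relative links are resolved against the `source_site_url` from the Blueprint. The helper name `rewriteRelativeMarkdownUrls`, the regex-based link matching, and the `./some-page.md` link are illustrative assumptions, not the importer's actual (PHP) implementation:

```typescript
// Hypothetical helper: resolves relative link targets in markdown text
// against a base URL. Only inline links and images of the form [text](target)
// are handled; reference-style links and link titles are out of scope here.
function rewriteRelativeMarkdownUrls(markdown: string, baseUrl: string): string {
	// Make sure the base ends with "/" so its last path segment is treated
	// as a directory rather than being replaced during resolution.
	const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
	return markdown.replace(
		/(!?\[[^\]]*\]\()([^)\s]+)(\))/g,
		(_match: string, open: string, target: string, close: string) => {
			// Leave fragment-only links (e.g. "#section") untouched.
			if (target.startsWith("#")) {
				return open + target + close;
			}
			// new URL() keeps absolute URLs as they are and resolves relative ones.
			return open + new URL(target, base).toString() + close;
		}
	);
}

const sourceSiteUrl =
	"https://raw.githubusercontent.com/WordPress/gutenberg/HEAD/docs/how-to-guides/data-basics";
const rewritten = rewriteRelativeMarkdownUrls(
	"See [a sibling page](./some-page.md) and [the handbook](https://developer.wordpress.org/).",
	sourceSiteUrl
);
console.log(rewritten);
```

A real implementation would more likely rewrite URLs on the parsed document than on raw text, since a regex cannot tell a link from link-like text inside a code block.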
…zed WP_Markdown_Directory_Tree_Reader
Builds data-liberation-markdown.phar.gz (200KB) to enable downloading the Markdown importer only when needed instead of on every page load.

A part of:

* #2080
* #1894

## Testing instructions

Run `nx build playground-data-liberation-markdown` and confirm it finished without errors. A smoke test of the built phar file is included in the build command.
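The "only when needed" part could be sketched roughly as follows. This is a TypeScript illustration under assumptions: `once` and `loadMarkdownImporter` are hypothetical names, and the download is simulated rather than performed, so none of this is a Playground API:

```typescript
// Wraps a factory so that it runs at most once; later calls reuse the result.
// When the factory returns a Promise, all callers share the same download.
function once<T>(factory: () => T): () => T {
	let cached: { value: T } | undefined;
	return () => {
		if (cached === undefined) {
			cached = { value: factory() };
		}
		return cached.value;
	};
}

// Counts how many times the simulated download was started.
let downloads = 0;

// Hypothetical loader: stands in for fetching data-liberation-markdown.phar.gz.
const loadMarkdownImporter = once((): Promise<string> => {
	downloads++;
	return Promise.resolve("data-liberation-markdown.phar.gz");
});

// Nothing is downloaded at page load; the first import that needs the
// Markdown importer triggers the download, and later imports reuse it.
const first = loadMarkdownImporter();
const second = loadMarkdownImporter();
console.log(first === second, downloads);
```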
f522d40 to 4a31689
I'm going to close this PR. I've reorganized it as a series of smaller ones that we can discuss granularly:
After all the API changes, I'm no longer sure setting up the importer in …
Sets the stage for the EPub importer. A part of #2080

Refactors and cleans up the Data Liberation package. This includes renaming, reorganizing file paths, improving class structure, and removing deprecated/unused code.

## Key Changes

**Refactor:**

- Renamed `WP_WXR_Reader` to `WP_WXR_Entity_Reader` for consistency and clarity.
- Adjusted references in related classes, tests, and imports.
- Moved `byte-readers` to the Blueprints library (see WordPress/blueprints-library#121)

**Cleanup:**

- Deleted unused and redundant byte reader classes (`WP_Byte_Reader`, `WP_File_Reader`, etc.).
- Removed legacy files such as `WXR_Import_Info`.

**New Additions:**

- Added `WP_Directory_Tree_Entity_Reader` to improve handling of directory tree imports.
- Introduced `WP_Import_HTML_Processor` for better HTML import functionality.

## Testing instructions

Confirm the CI tests passed
🚧 Work in progress, don't merge 🚧
Enables importing markdown and epub files via the `importWxr` step (to be renamed) when the data-liberation importer is enabled.
Here's the Blueprint you can use to import the "data basics" tutorial from the Gutenberg repo:
Requires WordPress/blueprints-library#121
Other code examples
Combining the new importer APIs is getting ridiculous. Here are two entity readers:
We can mix and match data sources (local filesystem, remote), formats (e.g. md, xhtml, wxr), and containers (plain, .zip, git in the future).
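To illustrate the mix-and-match idea, here is a rough TypeScript sketch. It is not the PHP entity reader classes from this PR: every name in it (`FileContainer`, `FormatReader`, `inMemoryContainer`, `readEntities`) is hypothetical, and the in-memory container only stands in for a real data source. The point is that the container and the format are independent choices that compose:

```typescript
type ImportedEntity = { type: string; title: string };

// Container: lists files and returns their contents (plain directory, .zip, ...).
interface FileContainer {
	listFiles(): string[];
	read(path: string): string;
}

// Format: turns one file's contents into an entity (md, xhtml, wxr, ...).
type FormatReader = (path: string, contents: string) => ImportedEntity;

// In-memory stand-in for a plain directory; a real container would sit on
// top of a local or remote data source.
function inMemoryContainer(files: Record<string, string>): FileContainer {
	return {
		listFiles: () => Object.keys(files).sort(),
		read: (path: string) => files[path],
	};
}

// Toy markdown reader: uses the first "# " heading as the post title.
const markdownReader: FormatReader = (path, contents) => {
	const heading = contents.split("\n").find((line) => line.startsWith("# "));
	return { type: "post", title: heading ? heading.slice(2).trim() : path };
};

// Toy XHTML reader: uses the <title> element as the post title.
const xhtmlReader: FormatReader = (path, contents) => {
	const match = /<title>([^<]*)<\/title>/.exec(contents);
	return { type: "post", title: match ? match[1].trim() : path };
};

// Any container can be paired with any format reader.
function readEntities(container: FileContainer, format: FormatReader): ImportedEntity[] {
	return container.listFiles().map((path) => format(path, container.read(path)));
}

const markdownEntities = readEntities(
	inMemoryContainer({ "intro.md": "# Hello\n\nBody text" }),
	markdownReader
);
const xhtmlEntities = readEntities(
	inMemoryContainer({ "chapter.xhtml": "<html><head><title>Chapter 1</title></head></html>" }),
	xhtmlReader
);
console.log(markdownEntities, xhtmlEntities);
```

Supporting a new container or format in this sketch means adding one function, without touching the others.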
Remaining work