[python] lxml tries to resolve remote id's, but fails #56

Open
pvgenuchten opened this issue Sep 12, 2024 · 3 comments
Labels
implementation-challenges Discussion of XSLT implementation challenges

Comments

@pvgenuchten

This is probably not an issue with the XSLT itself; I just want to share some experience from running the XSLT in a Python environment. Maybe one of you knows a solution.

When I run the XSLT in Python for this record, I get an error about lxml failing to load an external entity.

The remote link is mentioned in:

<srv:operatesOn uuidref="r_basili:-7bda2a44:134ec5768c4:-4f32" xlink:href="http://rsdi.regione.basilicata.it/Catalogo/srv/ita/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&OUTPUTSCHEMA=http://www.isotc211.org/2005/gmd&id=r_basili:-7bda2a44:134ec5768c4:-4f32"/>

I have been looking for an lxml option to not load external entities, but I was not successful. Any ideas?

@pvgenuchten

pvgenuchten commented Sep 13, 2024

I was able to work around this by creating a custom resolver, following the lxml docs.

Define a custom resolver class:

import lxml.etree as ET

class LinkResolver(ET.Resolver):
    def resolve(self, url, id, context):
        print("Resolving URL '%s'" % url)
        return self.resolve_string(
            '<!ENTITY myentity "[resolved text: %s]">' % url, context)

Then use the resolver when parsing the XSLT:

iso_parser = ET.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
iso_parser.resolvers.add(LinkResolver())
# Read the stylesheet as bytes and close the file when done
with open('iso-triplify/iso-19139-to-dcat-ap.xsl', 'rb') as f:
    xsl = ET.fromstring(f.read(), parser=iso_parser)
transform = ET.XSLT(xsl)
# `xml` is the already parsed ISO 19139 record (not shown here)
rdfxml = ET.tostring(transform(xml), pretty_print=True)
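
For anyone who wants to try the resolver idea without the real stylesheet or record, here is a minimal self-contained sketch. It assumes the remote lookup comes from a `document()` call in the stylesheet, which I have not verified for iso-19139-to-dcat-ap.xsl. The `OfflineResolver` class, the `<unresolved/>` placeholder, the `example.invalid` URL and the tiny stylesheet are all made up for illustration. Unlike the resolver above, it hands back a small well-formed document and only intercepts http(s) URLs.

```python
import lxml.etree as ET

# Hypothetical stand-in document returned instead of the remote resource
PLACEHOLDER = "<unresolved/>"


class OfflineResolver(ET.Resolver):
    """Answer http(s) lookups locally so lxml does not fetch them."""

    def __init__(self):
        self.seen = []  # URLs that were intercepted

    def resolve(self, url, pubid, context):
        if url and url.startswith(("http://", "https://")):
            self.seen.append(url)
            return self.resolve_string(PLACEHOLDER, context)
        return None  # let lxml handle everything else as usual


# The resolver must be registered on the parser that parses the stylesheet
parser = ET.XMLParser()
resolver = OfflineResolver()
parser.resolvers.add(resolver)

# Toy stylesheet (an assumption, not the real one): it follows the href
# with document() and prints the name of the root element it gets back.
xsl = ET.fromstring(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/record">
    <out><xsl:value-of select="name(document(string(@href))/*)"/></out>
  </xsl:template>
</xsl:stylesheet>
""", parser=parser)
transform = ET.XSLT(xsl)

# Toy record with a remote link, loosely modelled on srv:operatesOn
record = ET.fromstring(b'<record href="http://example.invalid/csw?id=1"/>')
result = transform(record)

print(resolver.seen)
print(ET.tostring(result))
```

If the placeholder ends up in the output, the lookup was served by the resolver rather than the network. Whether an empty placeholder is acceptable depends on what the stylesheet does with the linked record.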

Welcoming suggestions for improvement :-)

jakubklimek added the implementation-challenges (Discussion of XSLT implementation challenges) label Sep 16, 2024
@NielsHoffmann

Hi Paul,
As I understand it, lxml only supports XSLT 1.0. So I would expect you to run into other issues with lxml and iso-19139-to-dcat-ap.xsl as well, because it now uses XSLT 2.0 constructs?

@pvgenuchten

Actually, we run the XSLT via lxml, with adjustments, in production (see https://github.com/soilwise-he/harvesters/blob/main/iso-triplify/iso-19139-to-dcat-ap.xsl) and haven’t run into big issues with XSLT 2.0.
