This is a thread-safe Java API for Mozilla's Public Suffix List:
A "public suffix" is one under which Internet users can directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us.
This fork adds a single function that can be used as a UDF in Snowflake to calculate the eTLD+1 of a domain.
Build the jar as described below, place it in a stage (@myStage) and create the function in Snowflake:
CREATE FUNCTION etldp1(domain STRING)
RETURNS STRING
LANGUAGE JAVA
IMPORTS = ('@myStage/public-suffix-list-2.2.2-SNAPSHOT.jar',
'@myStage/commons-lang3-3.14.0.jar')
HANDLER = 'de.malkusch.whoisServerList.publicSuffixList.PublicSuffixUDF.eTLDp1'
With this in place one can call it like so:
select etldp1('www.google.com');
-- "google.com"
If the function encounters trouble (bad domain, unmanaged TLD) then is returned.
One special case causes a panic: a single domain part followed by a period. Trimming the trailing period or detecting it and returning is on the TODO list.
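One possible shape for the trailing-period guard from the TODO item, as a sketch only. DomainInputGuard and normalize are hypothetical names and are not part of this fork. The sketch assumes that a single trailing period can be dropped safely and that any other malformed input should be kept away from the lookup. What the UDF should return in that case is left to the caller.

```java
public final class DomainInputGuard {

    private DomainInputGuard() {
    }

    /**
     * Returns a cleaned-up domain, or null if the input should not be
     * passed on to the suffix lookup at all.
     */
    public static String normalize(String domain) {
        if (domain == null) {
            return null;
        }
        String trimmed = domain.trim();
        // Drop exactly one trailing period ("com." becomes "com").
        if (trimmed.endsWith(".")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        // Reject input that is empty or still starts or ends with a period.
        if (trimmed.isEmpty() || trimmed.startsWith(".") || trimmed.endsWith(".")) {
            return null;
        }
        return trimmed;
    }
}
```

With this guard in front of the lookup, "com." would be treated like "com", and inputs such as "." or ".com" would never reach the lookup.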
This package is available in Maven Central:
<dependency>
    <groupId>de.malkusch.whois-server-list</groupId>
    <artifactId>public-suffix-list</artifactId>
    <version>2.2.0</version>
</dependency>
Create a PublicSuffixList with a PublicSuffixListFactory:
- PublicSuffixList.getRegistrableDomain(): Gets the registrable domain or null. E.g. "www.example.net" and "example.net" will return "example.net". Null, an empty string or domains with a leading dot will return null.
- PublicSuffixList.isRegistrable(): Returns whether a domain is registrable. E.g. "example.net" is registrable, "www.example.net" and "net" are not.
- PublicSuffixList.isPublicSuffix(): Returns whether a domain is a public suffix or not. E.g. "com" is a public suffix, "example.com" isn't.
- PublicSuffixList.getPublicSuffix(): Returns the public suffix from a domain or null. If the domain is already a public suffix, it will be returned unchanged. E.g. "www.example.net" will return "net".
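How the four methods relate to each other can be illustrated with a toy model. This is a sketch, not the library's implementation. SuffixSketch is a hypothetical class, its four-entry suffix set is hand-picked rather than taken from the real list, and it ignores the wildcard, exception and default rules of the Public Suffix List. It also returns results in lower case for simplicity.

```java
import java.util.Locale;
import java.util.Set;

public final class SuffixSketch {

    // Tiny hand-picked sample, NOT the real Public Suffix List.
    private static final Set<String> SUFFIXES = Set.of("net", "com", "uk", "co.uk");

    private SuffixSketch() {
    }

    private static String lower(String domain) {
        return domain.toLowerCase(Locale.ROOT);
    }

    /** True if the whole domain is one of the sample suffixes. */
    public static boolean isPublicSuffix(String domain) {
        return domain != null && SUFFIXES.contains(lower(domain));
    }

    /** Longest sample suffix the domain ends with, or null if none matches. */
    public static String getPublicSuffix(String domain) {
        if (domain == null || domain.isEmpty() || domain.startsWith(".")) {
            return null;
        }
        String candidate = lower(domain);
        // Strip the leftmost label until the remainder is a known suffix.
        while (!SUFFIXES.contains(candidate)) {
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
        return candidate;
    }

    /** Public suffix plus exactly one more label, or null. */
    public static String getRegistrableDomain(String domain) {
        String suffix = getPublicSuffix(domain);
        if (suffix == null) {
            return null;
        }
        String name = lower(domain);
        if (name.equals(suffix)) {
            return null; // a bare public suffix is not registrable
        }
        // Everything in front of ".suffix"; keep only its last label.
        String rest = name.substring(0, name.length() - suffix.length() - 1);
        String label = rest.substring(rest.lastIndexOf('.') + 1);
        return label.isEmpty() ? null : label + "." + suffix;
    }

    /** True if the domain is exactly its own registrable domain. */
    public static boolean isRegistrable(String domain) {
        return domain != null && lower(domain).equals(getRegistrableDomain(domain));
    }
}
```

In this model the registrable domain is always the public suffix plus one label, which is why "www.example.net" is not registrable itself but maps to "example.net".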
All methods are case insensitive.
You can use the API's methods with UTF-8 domain names or Punycode encoded ASCII domain names. The API returns results in the same format as the input, i.e. if you pass a UTF-8 string the result will be a UTF-8 string as well. The same applies to Punycode.
PublicSuffixListFactory factory = new PublicSuffixListFactory();
PublicSuffixList suffixList = factory.build();
assertTrue(suffixList.isPublicSuffix("net"));
assertFalse(suffixList.isPublicSuffix("example.net"));
assertEquals("net", suffixList.getPublicSuffix("www.example.net"));
assertEquals("net", suffixList.getPublicSuffix("net"));
assertTrue(suffixList.isRegistrable("example.net"));
assertFalse(suffixList.isRegistrable("www.example.net"));
assertFalse(suffixList.isRegistrable("net"));
assertNull(suffixList.getRegistrableDomain("net"));
assertEquals("example.net", suffixList.getRegistrableDomain("example.net"));
assertEquals("example.net", suffixList.getRegistrableDomain("www.example.net"));
assertEquals("example.co.uk", suffixList.getRegistrableDomain("example.co.uk"));
assertEquals("example.co.uk", suffixList.getRegistrableDomain("www.example.co.uk"));
assertEquals("食狮.com.cn", suffixList.getRegistrableDomain("食狮.com.cn"));
assertEquals("xn--85x722f.com.cn", suffixList.getRegistrableDomain("xn--85x722f.com.cn"));
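The last two assertions use the same name in its Unicode form (食狮.com.cn) and its Punycode form (xn--85x722f.com.cn). The sketch below shows how the two spellings can be converted into each other with the JDK's java.net.IDN class. IdnFormsSketch and its method names are hypothetical and not part of this library; whether the library uses java.net.IDN internally is not stated here.

```java
import java.net.IDN;
import java.util.Locale;

public final class IdnFormsSketch {

    private IdnFormsSketch() {
    }

    /** Converts a domain name to its Punycode (ASCII) spelling. */
    public static String toPunycode(String domain) {
        return IDN.toASCII(domain);
    }

    /** Converts a domain name to its Unicode spelling. */
    public static String toUnicode(String domain) {
        return IDN.toUnicode(domain);
    }

    /** True if both spellings denote the same name, ignoring case. */
    public static boolean sameName(String first, String second) {
        String a = toPunycode(first).toLowerCase(Locale.ROOT);
        String b = toPunycode(second).toLowerCase(Locale.ROOT);
        return a.equals(b);
    }

    public static void main(String[] args) {
        String ascii = "xn--85x722f.com.cn";
        String unicode = toUnicode(ascii);
        // Both spellings are accepted as input and compare as the same name.
        System.out.println(sameName(ascii, unicode));
    }
}
```

A caller that stores domains in one spelling can convert at the boundary and still pass either form to the API.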
This library comes with a bundled list which is most likely outdated. You are encouraged to follow Mozilla's Atom change feed and use the latest effective_tld_names.dat.
You can specify a custom path to your latest list by setting the property PROPERTY_LIST_FILE:
PublicSuffixListFactory factory = new PublicSuffixListFactory();
Properties properties = factory.getDefaults();
properties.setProperty(
PublicSuffixListFactory.PROPERTY_LIST_FILE, "/effective_tld_names.dat");
PublicSuffixList suffixList = factory.build(properties);
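Before pointing PROPERTY_LIST_FILE at a downloaded file, it can help to check that the file looks like a suffix list. The format published by publicsuffix.org is plain text: one rule per line, comment lines start with //, blank lines are ignored, and a rule ends at the first whitespace. The sketch below reads such content into a list of rules. ListFileSketch and parseRules are hypothetical names, and this is not how the library itself parses the file.

```java
import java.util.ArrayList;
import java.util.List;

public final class ListFileSketch {

    private ListFileSketch() {
    }

    /** Extracts the rules from the text content of a suffix list file. */
    public static List<String> parseRules(String content) {
        List<String> rules = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                continue;
            }
            // A rule ends at the first whitespace character.
            rules.add(trimmed.split("\\s+")[0]);
        }
        return rules;
    }
}
```

An empty result for a freshly downloaded file would be a sign that the download failed or returned something other than the list.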
You can integrate the download of the latest list into your Maven build process:
<build>
    <plugins>
        <plugin>
            <groupId>com.googlecode.maven-download-plugin</groupId>
            <artifactId>download-maven-plugin</artifactId>
            <version>1.2.0</version>
            <executions>
                <execution>
                    <id>package-psl</id>
                    <phase>generate-resources</phase>
                    <goals>
                        <goal>wget</goal>
                    </goals>
                    <configuration>
                        <url>https://publicsuffix.org/list/effective_tld_names.dat</url>
                        <outputDirectory>${project.build.outputDirectory}</outputDirectory>
                        <outputFileName>effective_tld_names.dat</outputFileName>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Markus Malkusch <markus@malkusch.de> is the author of this project. This project is free and licensed under the WTFPL.
If you like this project and feel generous, donate a few Bitcoins here: 1335STSwu9hST4vcMRppEPgENMHD2r1REK