Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

GeoIP2 trys to write characters outside of ISO-8859 #127

Closed
manuelm opened this issue Apr 18, 2019 · 2 comments · Fixed by #131
Closed

GeoIP2 trys to write characters outside of ISO-8859 #127

manuelm opened this issue Apr 18, 2019 · 2 comments · Fixed by #131

Comments

@manuelm
Copy link
Contributor

manuelm commented Apr 18, 2019

This is related to #69 but targets GeoIP2 modules. Mainly GeoIP2::City. GeoIP2 names are UTF-8 encoded. Therefor they can contain characters outside of the ISO-8859 range. Since AWStats data files are stored in ISO the GeoIP output has to be escaped. As in #69 I suggest html escaping them, either using HTML::Entities or a stripped down version which only escapes high bytes:

sub html_escape {
  my $s = shift;
  $s =~ s/([^\x00-\x7F])/sprintf "&#x%X;", ord($1)/ge;
  return $s;
}

Regarding geoip2_city.pm the function call should be added in SectionWriteHistory_geoip2_city so that XMLEncodeForHisto($_) becomes XMLEncodeForHisto(html_escape($_)) (or XMLEncodeForHisto(encode_entities_numeric($_))).

For geoip2.pm this isn't strictly necessary as I'm not aware of any international countries names that contain characters outside the ISO range.

195.38.12.0 is an ipaddress for testing. GeoIP2::City tries to store PL_Zamość_Lublin. With escaping this gets converted to PL_Zamość_Lublin.

@manuelm
Copy link
Contributor Author

manuelm commented Apr 18, 2019

Forgot about loading: Of course we also need to unescape the characters on loading/updating the data file. So e.g. in SectionReadHistory_geoip2_city we add decode_entities($field[0]); after @field=split(/\s+/,($xmlold?XMLDecodeFromHisto($_):$_));.

@manuelm
Copy link
Contributor Author

manuelm commented Apr 19, 2019

Here is a better approach without the need of decoding and HTML::Entities as extra dependency. We simply store the escaped name in the hash:

--- b/wwwroot/cgi-bin/plugins/geoip2_city.pm.orig 2019-04-19 12:36:02.026552366 +0200
+++ a/wwwroot/cgi-bin/plugins/geoip2_city.pm  2019-04-19 12:35:52.906632209 +0200
@@ -29,6 +29,11 @@
 #use strict;
 no strict "refs";
 
+sub html_escape {
+ my $s = shift;
+ $s =~ s/([^\x00-\x7F])/sprintf "&#x%X;", ord($1)/ge;
+ return $s;
+}
 
 
 #-----------------------------------------------------------------------------
@@ -369,7 +374,7 @@
             my $countrycity=$record->country()->iso_code().'_'.$record->city()->name();
             $countrycity=~s/ /%20/g;
             if ($record->most_specific_subdivision()->name()) { $countrycity.='_'.$record->most_specific_subdivision()->name(); }
-            $_city_h{lc($countrycity)}++;
+            $_city_h{html_escape(lc($countrycity))}++;
         } else {
             $_city_h{'unknown'}++;
         }
@@ -409,7 +414,7 @@
             my $countrycity=($record->country()->iso_code()).'_'.$city;
             $countrycity=~s/ /%20/g;
             if ($record->most_specific_subdivision()->name()) { $countrycity.='_'.$record->most_specific_subdivision()->name(); }
-            $_city_h{lc($countrycity)}++;
+            $_city_h{html_escape(lc($countrycity))}++;
         } else {
             $_city_h{'unknown'}++;
         }

manuelm added a commit to manuelm/awstats that referenced this issue Apr 26, 2019
* Correctly convert dns names to ip4 and ip6 address using getaddrinfo
  (fixes eldy#120, eldy#121, obsoletes eldy#115)
* Only lookup if the IP is of type public (fixes eldy#122)
  and catch further lookup errors (obsoletes eldy#123)
* Store and display the GeoIP City output HTML escaped (fixes eldy#127)
* Code to perform and cache the actual lookup is consolidated
* General code improvement and readability
@eldy eldy closed this as completed in #131 Jun 27, 2019
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant