JS Browserdetection fail and redirect #368

Open
toniritter opened this issue Jul 7, 2021 · 8 comments
Labels
js-engine Issues related to the js engine

@toniritter

toniritter commented Jul 7, 2021

Based on the "JavaScript execution exception" question on Stack Overflow.

HtmlUnit Version: 2.50.0

During the getPage call for the webpage flashscore.com, I got the following exceptions:

2021-07-07 08:46:05.408  WARN 4828 --- [nio-8080-exec-1] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/javascript'.
2021-07-07 08:46:05.564 ERROR 4828 --- [nio-8080-exec-1] c.g.h.j.DefaultJavaScriptErrorListener   : Error during JavaScript execution

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function entries in object function Object() { [native code] }. (script in https://www.flashscore.com/unsupported/ from (31, 9) to (53, 10)#35)
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:949) ~[htmlunit-2.50.0.jar:2.50.0]
	at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:598) ~[htmlunit-core-js-2.50.0.jar:na]
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:487) ~[htmlunit-core-js-2.50.0.jar:na]
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:353) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:829) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:805) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:796) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:942) ~[htmlunit-2.50.0.jar:2.50.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:378) ~[htmlunit-2.50.0.jar:2.50.0]

I've tried with two different classes and the problem still occurs.

@PostMapping("/startScraping")
	public ResponseEntity<FlashScraper> startScraping(@NonNull @RequestBody FlashScraper flashScraper) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		logger.info("startScraping request incoming");
		logger.info("Call URL: " + flashScraper.getScrapeUrl());
		
	    String url = "https://flashScore.com";

	    try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
	        HtmlPage page = webClient.getPage(url);
	        webClient.waitForBackgroundJavaScript(3_000);

	        System.out.println();
	        System.out.println();
	        System.out.println("----------------");
	        System.out.println(page.asNormalizedText());
	        System.out.println("----------------");
	    }
		
		// Diamond operator avoids the raw ResponseEntity type
		return new ResponseEntity<>(flashScraper, HttpStatus.OK);
	}

@PostMapping("/startScraping")
	public ResponseEntity<FlashScraper> startScraping(@NonNull @RequestBody FlashScraper flashScraper) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		logger.info("startScraping request incoming");
		logger.info("Call URL: " + flashScraper.getScrapeUrl());
		
		
		final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
		webClient.getOptions().setJavaScriptEnabled(true);
		webClient.getOptions().setThrowExceptionOnScriptError(false);
		webClient.waitForBackgroundJavaScriptStartingBefore(1000);

		HtmlPage scrapePage = webClient.getPage(flashScraper.getScrapeUrl());
		webClient.waitForBackgroundJavaScript(3000);
		
		
		
		System.out.println(scrapePage.getByXPath("//*[@id=\"g_25_rwPxTVj1\"]"));
		
		// Diamond operator avoids the raw ResponseEntity type
		return new ResponseEntity<>(flashScraper, HttpStatus.OK);
	}
@toniritter
Author

After switching the dependency to version 2.51.0, the exception is no longer thrown, but I still end up on the "Unsupported" page https://flashscore.com/unsupported/

@rbri
Member

rbri commented Jul 11, 2021

The browser detection is done by this code: https://www.flashscore.com/x/js/browsercompatibility_4.js

// !!! for update iterate manually `browser_compatibility_serial`
"use strict";
try {
	(function () {
		var cssRequirements = [["display", "flex"], ["display", "grid"], ["color", "red"]];
		for (var i in cssRequirements) {
			if (!CSS.supports(cssRequirements[i][0], cssRequirements[i][1])) {
				throw "no-" + cssRequirements[i][0] + "-" + cssRequirements[i][1];
			}
		}
		try {
			new XMLHttpRequest();
		}
		catch (pass) {
			throw "no-ajax";
		}
		try {
			eval("var foo = (x)=>x+1");
		}
		catch (pass) {
			throw "no-es6";
		}
		try {
			eval("var foo = {}; var bar = {...foo};")
		}
		catch (pass) {
			throw "no-spread";
		}
	})();
}
catch (e) {
	var utm = "";
	if (typeof e == "string" && /^[a-z0-9\-]+$/.test(e)) {
		utm = "?err=" + e;
	}
	window.location.replace("/unsupported/" + utm);
}
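
The catch block above puts the name of the failed check into the redirect URL ("?err=" plus the thrown string), so the URL of the page you end up on can tell you which check failed. Below is a minimal sketch of reading that value back; the class and method names are made up for illustration, and it assumes you pass in the final page URL as a string (for example from page.getUrl().toString() in HtmlUnit). If the thrown value was not a plain string matching the regex, there is no err parameter at all and the result is empty.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper, not part of HtmlUnit.
final class UnsupportedReason {

    /**
     * Returns the value of the "err" query parameter when the URL points
     * at the /unsupported/ page, for example "no-spread" or "no-es6".
     */
    public static Optional<String> failedCheck(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        String query = uri.getQuery();
        if (path == null || !path.startsWith("/unsupported/") || query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("err=")) {
                return Optional.of(pair.substring("err=".length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[no-spread]
        System.out.println(failedCheck("https://www.flashscore.com/unsupported/?err=no-spread"));
        // prints Optional.empty
        System.out.println(failedCheck("https://www.flashscore.com/basketball/"));
    }
}
```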

For the moment I can fix CSS.supports(), but because Rhino does not (yet) support the spread syntax (mozilla/rhino#968), this will still fail.

The only option you have is to 'patch' the script and replace or comment out some parts (see https://htmlunit.sourceforge.io/faq.html#HowToModifyRequestOrResponse). At least it is worth a try.
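
As a rough illustration of what such a patch could look like, here is a sketch of the text transformation only. It assumes the script still contains the spread check in the form quoted above and swaps the eval call for a harmless expression, so the "no-spread" branch is never reached. The class name and regex are hypothetical; in HtmlUnit the method would be called on the response body inside a WebConnectionWrapper, as described in the FAQ entry. It only addresses this detection script, not any other script on the page.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
final class CompatibilityScriptPatcher {

    // Matches eval("...") calls whose string argument contains three dots,
    // i.e. the spread check in the compatibility script.
    private static final Pattern SPREAD_EVAL =
            Pattern.compile("eval\\(\"[^\"]*\\.\\.\\.[^\"]*\"\\)");

    /** Replaces the spread check with a no-op; everything else is untouched. */
    public static String patch(String js) {
        return SPREAD_EVAL.matcher(js).replaceAll("void 0");
    }

    public static void main(String[] args) {
        String original = "try {\n"
                + "\teval(\"var foo = {}; var bar = {...foo};\")\n"
                + "}\n"
                + "catch (pass) {\n"
                + "\tthrow \"no-spread\";\n"
                + "}\n";
        // The eval line is printed as "void 0"; the rest is unchanged.
        System.out.println(patch(original));
    }
}
```

Replacing only the failing statement keeps the rest of the script syntactically intact, which is easier to get right than commenting out whole blocks by hand.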

@rbri
Member

rbri commented Jul 11, 2021

I have made a fix for CSS.supports() and will make a new snapshot available soon (check Twitter for updates).

@toniritter
Author

I did as suggested and tried to modify the response, but now I get the following exception (still on version 2.51.0):

2021-07-12 19:23:13.844 ERROR 2820 --- [nio-8080-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.gargoylesoftware.htmlunit.ScriptException: syntax error (https://www.flashscore.com/x/js/browsercompatibility_4.js#1)] with root cause

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: syntax error (https://www.flashscore.com/x/js/browsercompatibility_4.js#1)
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.error(HtmlUnitContextFactory.java:436) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.corejs.javascript.Parser.addError(Parser.java:251) ~[htmlunit-core-js-2.51.0.jar:na]

@rbri
Member

rbri commented Jul 15, 2021

Looks like there is a syntax error in your replacement script. Maybe you can replace it with an empty one?

@toniritter
Author

Hey rbri, in the meantime I've tried it with this, but it still fails:

	public void startScraper() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
		
		
		String url = "https://www.flashscore.com/basketball/";
		
		
		try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
			
			webClient.getOptions().setThrowExceptionOnScriptError(false);
			webClient.getOptions().setUseInsecureSSL(true);
		    webClient.getOptions().setCssEnabled(true);
			webClient.getOptions().setJavaScriptEnabled(true);
			webClient.waitForBackgroundJavaScriptStartingBefore(1000);
			
			
			new WebConnectionWrapper(webClient) {

	            public WebResponse getResponse(WebRequest request) throws IOException {
	                WebResponse response = super.getResponse(request);
	                if (request.getUrl().toExternalForm().contains("browsercompatibility")) {
	                    String content = "";
	                    // intercept and/or change content

	                    WebResponseData data = new WebResponseData(content.getBytes(),response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
	                    response = new WebResponse(data, request, response.getLoadTime());
	                }
	                return response;
	            }
	        };
			
			
			
	        HtmlPage page = webClient.getPage(url);
	        webClient.waitForBackgroundJavaScript(3_000);

	        System.out.println();
	        System.out.println();
	        System.out.println("----------------");
	        System.out.println(page.asNormalizedText());
	        System.out.println("----------------");
	    }
		
		
		
	}

2021-07-16 15:22:45.844  WARN 1524 --- [           main] c.g.htmlunit.DefaultCssErrorHandler      : CSS error: 'https://www.flashscore.com/res/_fs/build/livetableresponsive.c7059bf.css' [1:8910] Error in pseudo class or element. (Invalid token ".". Was expecting one of: <S>, <NUMBER>, <IDENT>, <STRING>, "-", <PLUS>, <DIMENSION>.)
2021-07-16 15:22:45.844  WARN 1524 --- [           main] c.g.htmlunit.DefaultCssErrorHandler      : CSS warning: 'https://www.flashscore.com/res/_fs/build/livetableresponsive.c7059bf.css' [1:8910] Ignoring the whole rule.
2021-07-16 15:22:46.305  WARN 1524 --- [           main] c.g.htmlunit.IncorrectnessListenerImpl   : Obsolete content type encountered: 'text/javascript'.
2021-07-16 15:22:46.487 ERROR 1524 --- [           main] c.g.h.j.DefaultJavaScriptErrorListener   : Error during JavaScript execution

com.gargoylesoftware.htmlunit.ScriptException: invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:580) ~[htmlunit-core-js-2.51.0.jar:na]
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:481) ~[htmlunit-core-js-2.51.0.jar:na]
	at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:352) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:785) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:751) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.compile(JavaScriptEngine.java:112) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.loadJavaScriptFromUrl(HtmlPage.java:1122) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1002) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:196) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:551) ~[htmlunit-2.51.0.jar:2.51.0]
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) ~[xercesImpl-2.12.0.jar:na]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:503) ~[htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1216) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1156) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:219) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:312) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3189) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2114) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394) ~[neko-htmlunit-2.51.0.jar:2.51.0]
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.12.0.jar:na]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:751) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:208) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:297) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:217) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:684) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:586) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:501) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:413) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:548) ~[htmlunit-2.51.0.jar:2.51.0]
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:529) ~[htmlunit-2.51.0.jar:2.51.0]
Caused by: net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)

@rbri
Member

rbri commented Jul 21, 2021

Looks like another error, this time:

invalid property id (https://www.flashscore.com/res/_fs/build/loader.5714507.js#1)

And this JS file is one huge minified script. At the very least, it uses syntax that is not supported (here a rest parameter):

function(...e){let t=this._configData;

I fear you will have to wait until this is fixed in Rhino.

@rbri
Member

rbri commented Mar 27, 2024

see #755

@rbri rbri added the js-engine Issues related to the js engine label Mar 27, 2024