#####################################################
# WEB SCRAPING #
# POLISH CONSTITUTIONAL TRIBUNAL #
# #
#####################################################
#####################################################
# REQUIREMENTS:
#####################################################
* Download wkhtmltopdf.exe, install it, and set the path to
  it in the variable pathwkthmltopdf.
* Install the latest version of Selenium.
* Download geckodriver and specify the path to it in
  driver(executable_path=...).
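
A minimal sketch of a pre-flight check for the requirements
listed above, using only the standard library. The function
name find_missing_requirements and the example paths are
hypothetical and are not part of the scraper itself; only
pathwkthmltopdf is a name taken from this README.

```python
import importlib.util
import os
import shutil


def find_missing_requirements(pathwkthmltopdf, geckodriver_path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # wkhtmltopdf is expected as an explicit path to the installed executable.
    if not os.path.isfile(pathwkthmltopdf):
        problems.append("wkhtmltopdf not found at: " + pathwkthmltopdf)

    # geckodriver may be given as a full path or as a command name on PATH.
    if not (os.path.isfile(geckodriver_path) or shutil.which(geckodriver_path)):
        problems.append("geckodriver not found at: " + geckodriver_path)

    # Selenium only needs to be importable; nothing is launched here.
    if importlib.util.find_spec("selenium") is None:
        problems.append("selenium is not installed (try: pip install selenium)")

    return problems


if __name__ == "__main__":
    # Hypothetical locations; replace them with the paths on your machine.
    for problem in find_missing_requirements(
        "C:/path/to/wkhtmltopdf.exe", "C:/path/to/geckodriver.exe"
    ):
        print(problem)
```

Running the check before the scraper starts reports a missing
tool immediately instead of failing partway through a run.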
#####################################################
The code below scrapes rulings (jurisprudence) of the
Tribunal that are accompanied by separate opinions.
Specify the filters and required parameters in the
PARAMETRIZATION section.
Each method is described in more detail in the code itself.
#####################################################
# OUTPUT:
#####################################################
The output is structured as follows:
- outputL - main list containing:
  * outputLDict - list of JSON-like dictionaries with the
    fields:
    ** id - ID of the ruling
    ** link - direct link to the ruling
    ** sign - signature of the ruling
    ** sep_opi - list of separate opinions (if any), each
       a dictionary with the fields:
       *** link - direct link to the separate opinion
       *** by - first name and surname of the separate
           opinion's author
  * mostcommon5 - list of tuples with the 5 most active
    authors of separate opinions, in the form:
    (name, number of separate opinions)
  * file output, saved in folders as follows:
    ** /ID_SIGNATURE - all PDF and HTML files relating to
       a single ruling, each named ID_SIGNATURE.PDF (.HTML)
    ** /ID_SIGNATURE/separate_opinions - all PDF and HTML
       files relating to each separate opinion of a single
       ruling, named ID_SIGNATURE_BY.PDF (.HTML)
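
The structure above can be illustrated with a short sketch.
All sample values (IDs, signatures, links, author names) are
made up, and the helper names most_common_authors and
ruling_paths are hypothetical. Counting with
collections.Counter is one plausible way to obtain
mostcommon5, not necessarily the way the scraper does it, and
the sketch assumes the author name is used unchanged in file
names.

```python
from collections import Counter
from pathlib import Path

# Made-up records in the documented shape of outputLDict.
outputLDict = [
    {
        "id": "1",
        "link": "https://example.org/ruling/1",
        "sign": "SIGN_A",
        "sep_opi": [
            {"link": "https://example.org/ruling/1/opinion/1", "by": "Author One"},
            {"link": "https://example.org/ruling/1/opinion/2", "by": "Author Two"},
        ],
    },
    {
        "id": "2",
        "link": "https://example.org/ruling/2",
        "sign": "SIGN_B",
        "sep_opi": [
            {"link": "https://example.org/ruling/2/opinion/1", "by": "Author One"},
        ],
    },
    # A ruling without separate opinions has an empty sep_opi list.
    {"id": "3", "link": "https://example.org/ruling/3", "sign": "SIGN_C", "sep_opi": []},
]


def most_common_authors(records, n=5):
    """Return (name, number of separate opinions) tuples, most active first."""
    counts = Counter(op["by"] for rec in records for op in rec["sep_opi"])
    return counts.most_common(n)


def ruling_paths(record, root="."):
    """Build the documented file locations for one ruling (no files are created)."""
    base = f"{record['id']}_{record['sign']}"  # ID_SIGNATURE
    folder = Path(root) / base
    opinions = folder / "separate_opinions"
    return {
        "ruling_pdf": folder / f"{base}.PDF",
        "ruling_html": folder / f"{base}.HTML",
        # One PDF per separate opinion: ID_SIGNATURE_BY.PDF
        "opinion_pdfs": [opinions / f"{base}_{op['by']}.PDF" for op in record["sep_opi"]],
    }


mostcommon5 = most_common_authors(outputLDict)
```

With the sample data, "Author One" appears in two separate
opinions and "Author Two" in one, so mostcommon5 lists them
in that order.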
The program also produces a log file containing DEBUG-level
information.
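
A minimal sketch of how a DEBUG-level log file can be set up
with Python's standard logging module. The function name
make_debug_logger, the logger name, the file name and the
message format are assumptions for illustration; the
scraper's actual logging configuration may differ.

```python
import logging


def make_debug_logger(log_path, name="tribunal_scraper"):
    """Return a logger that writes DEBUG and higher messages to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    # Attach the file handler only once, even if called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_debug_logger("scraper.log")  # hypothetical file name
    log.debug("starting run")
```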