WaterSurvey

The project used Python and selenium to get data from TWBD websites (click here). The major code is in file watersurvey.py. As you can see in the webpage, TWBD manages their databasethrough Microsoft SQL server. To get the county-level water use data for all counties in Texas, we have to select the county from the dropdown menu and click download one by one (and there are 254 counties in Texas). Thus, an automated procedures is develop utilizing the web testing tool selenium with Python for this purpose.

The code is in watersurvey.py with a class defined as TWDB_Scrapter and several functions which are self-explainary:

load_page: loads the TWDB websites
get_all_options: get all options (counties) from the dropdown menu and save it in a list
select_item: select one option
download_excel: click the button to generate and download the data into a excel file
turnoff_driver: turn off the webdriver

Then, the automated procedure could be accomplised the loop as below:

## set up working direct and weblinks ##
os.chdir(r'C:\Users\markp\Desktop\WaterSurvey')
weblink=f"http://www2.twdb.texas.gov/ReportServerExt/Pages/ReportViewer.aspx?%2fWU%2fSumFinal_CountyPumpage&rs:Command=Render"
sourcedir=r'C:\Users\markp\Downloads'
destdir=r'C:\Users\markp\Desktop\WaterSurvey\data'
filename="SumFinal_CountyPumpage.xlsx"

## Initialize the driver, get all counties in a list ##
scraper = TWDB_Scraper(weblink)
scraper.load_page()
all_options = scraper.get_all_options()
all_options = [x for x in all_options if x != '<Select a Value>']
test_options = all_options[98:len(all_options)]

## loop downloading ##
os.chdir(sourcedir)
count=0
for option in all_options[1:]:
    count=count+1
    scraper.select_item(option)
    scraper.download_excel()
    print('data for ' +option+ ' downloaded, ' +str(count)+' out of '+ str(len(all_options)))
    try:
        os.rename(filename,'countypump_'+option+'.xlsx')
        print('rename finished')
    except:
        print('rename failed')
        
## turn off driver after finish ##    
scraper.turnoff_driver()

## copy the file from download folder to destination folder ##
for filename in glob.glob(os.path.join(sourcedir, '*.xlsx')):
    shutil.move(filename, destdir)

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
chromedriver.exe		chromedriver.exe
debug.log		debug.log
desktop.ini		desktop.ini
test.py		test.py
watersurvey.py		watersurvey.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WaterSurvey

About

Releases

Packages

Languages

License

MarkWang90/WaterSurvey

Folders and files

Latest commit

History

Repository files navigation

WaterSurvey

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages