-
Notifications
You must be signed in to change notification settings - Fork 23
How to do various things...
Contains answers to questions on how to do things related to the Cook County Jail project.
- select jail_id from countyapi_countyinmate group by jail_id having count(*) > 1;
How to find the earliest date that has all inmates booked on that date have a computed hash and that subsequent days all booked inmates have a computed hash.
This is needed to find out the start date of the V2 date set.
An SQL query will tell us this which day that is. To answer this we need to group entries by booking date which this query does and it sorts them and shows the largest first:
- select booking_date, count() from countyapi_countyinmate group by booking_date order by count() desc;
Now we need to find out how many of those booked have computed hashes for them, again grouped by booking_date:
- select booking_date, count() from countyapi_countyinmate where person_id is not null group by booking_date order by count() desc;
Now we have the count of inmates booked by day and the count of inmates who have computed hashes for them. When these two values are the same, we have a day where all booked inmates have a computed hash. Now we to find the day that all inmates have computed hashes and all subsequent days have do as well. The solution for this is messy and complicated, however if we turn it around and find all days where not all inmates have computed hashes and find the latest one of those, then the next day is the day we are looking for and this is a much easier equation to solve for:
- select t1.booking_date from
- (select booking_date, count(jail_id) as count from countyapi_county_inmate group by booking_date order by booking_date desc) t1,
- (select booking_date, count(jail_id) as count from countyapi_county_inmate where person_id is not null group by booking_date order by booking_date desc) t2
- where t1.booking_date = t2.booking_date and t1.count != t2.count limit 1;
When run against the Cook County Jail data set the answer is 2013-08-16 so the starting date in which it and all subsequent days have computed hashes is August 17th, 2013.
The technique of inverting the problem is often the best way to solve problems, so the next time you are faced with implementing a messy and complicated solution consider if it can be solved by inverting the problem and if that is easier to solve then solve and invert its solution to give you the answer you are looking for.