Scale 7.0.0

@emimaesmith emimaesmith released this 23 Jul 13:14
· 301 commits to master since this release

NOTE!!!!!!!

The next release needs to tell users to transition to use input file meta-data (replacing parse system) and using ingest recipes (replacing trigger system) when these replacement features are added. Also the next release should comment on message changes so admins should pause Scale and flush jobs through before upgrading Scale.
When releasing Scale v6.0.0, we need to include prominently in the release notes that legacy jobs (legacy interface in their job type revision) will not be able to be queued (they will be ignored).

Major v7.0.0 Changes

  • Due to major command message changes, admins should pause Scale and flush jobs through before upgrading to Scale v7.0.0.
  • Triggers are no longer supported. Users should update their strike configuration to include a recipe name.
  • Legacy job types are no longer supported and will not be able to be queued. Job Types should be ported to the Seed interface.

Seed Job Support

Scale now has support for Seed-compliant jobs. Seed is a standard designed to encapsulate batch jobs within Docker containers. Seed allows a Dockerized job to specify its inputs, outputs, resource requirements, etc. and enables easy and fast deployment with a Seed-compatible job framework (such as Scale). As of Scale v7.0.0, all support for legacy job types has been removed, so every job type must be Seed-compliant.

Dependencies

  • DC/OS 1.10+
  • Docker 1.17.12+
  • Elasticsearch 5+
  • PostgreSQL 9.4+, PostGIS 2.0+
  • Message Broker (RabbitMQ 3.6 or Amazon Web Services SQS)
  • Vault 0.6.2+ (only if not using DC/OS Enterprise for secrets storage)

Breaking Changes

The v5 REST API has been deprecated and replaced by the v6/v7 REST API. See the REST API documentation for all of the details on the v6/v7 REST API. The following list details some of the major changes from v5 to v6/v7:

  • The /v5/jobs/executions/, /v5/job-executions/ and /v5/job-executions/{id}/ APIs are deprecated and have been replaced by /v6/jobs/{job-id}/executions/ and /v6/jobs/{job-id}/executions/{exe-num}/.
  • The /v5/recipes/{id}/ API has been replaced by a v6 version. The new v6 version has a different response: the "inputs" section is removed, the "data" section is renamed to "input", and the "recipe_type" section will only contain "id", "name", "version", "title", and "description". The /v5/queue/new-recipe/ and /v5/recipes/{id}/reprocess/ APIs are also replaced with new v6 versions with the same changes in the response.
  • The /v5/jobs/, /v5/jobs/{id}/, /v5/queue/new-job/, and /v5/queue/requeue-jobs/ APIs have been replaced with new v6 versions.
  • The v5 sources, products, and import/export REST APIs have been deprecated. There are no v6 versions of these APIs.
  • Deprecated the "reprocess_recipes" message in favor of a create-recipes message (Issue 1172)
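
The path and response changes listed above can be sketched in Python as follows. This is only an illustration built from the three changes named in these notes: the helper names (v6_execution_path, recipe_v5_to_v6) and the sample payload are hypothetical, and the real v6 response may differ in other ways, so check the REST API documentation before relying on it.

```python
from copy import deepcopy

# Keys these notes say remain in the v6 "recipe_type" section.
V6_RECIPE_TYPE_KEYS = ("id", "name", "version", "title", "description")


def v6_execution_path(job_id, exe_num=None):
    """Build the v6 job execution path (hypothetical helper)."""
    base = f"/v6/jobs/{job_id}/executions/"
    return base if exe_num is None else f"{base}{exe_num}/"


def recipe_v5_to_v6(v5_recipe):
    """Reshape a v5-style recipe dict as the notes describe (hypothetical helper)."""
    v6 = deepcopy(v5_recipe)          # leave the caller's dict untouched
    v6.pop("inputs", None)            # the "inputs" section is removed
    if "data" in v6:
        v6["input"] = v6.pop("data")  # "data" is renamed to "input"
    recipe_type = v6.get("recipe_type")
    if isinstance(recipe_type, dict):
        # keep only the fields the notes list for "recipe_type"
        v6["recipe_type"] = {
            key: value
            for key, value in recipe_type.items()
            if key in V6_RECIPE_TYPE_KEYS
        }
    return v6


# Made-up sample payload, for illustration only.
sample_v5 = {
    "id": 7,
    "inputs": [{"name": "input_file"}],
    "data": {"files": {}},
    "recipe_type": {
        "id": 1,
        "name": "demo",
        "version": "1.0",
        "title": "Demo",
        "description": "",
        "definition": {},
    },
}
converted = recipe_v5_to_v6(sample_v5)
```

A client that still parses v5 responses could use a shim like this while it is being ported, but the contents of the renamed "input" section are not converted here.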

Deprecated

  • Legacy job types are now deprecated. Please transition all of your job types to be Seed-compliant. Legacy jobs will no longer be queued (they will be ignored).
  • v5 Recipe Types are now deprecated. Please transition all of your recipe types to be v6 compliant.
  • Trigger Rules are now deprecated. Please transition all of your strikes to include a recipe type.

Known Issues

  • Batch is not yet completely functional. This is planned to be completed by 7.1.0.

New Features

  • N/A

Enhancements/Updates

  • Now showing job/job type max tries (#1697)
  • Added is_active filtering to job type status (#1696)
  • Updated the way nvidia-docker is called (#1685)
  • Added an authentication toggle for testing purposes (#1672)
  • Added checklist for system testing (#1669)
  • Made media types not required to match (#1668)
  • Added a diagnostic recipe (#1667)
  • Added JSON output to result data (#1660)
  • Updated job_id and exe_num index to unique (#1659)
  • Removed old job_exe fields (#1658)
  • Kick off recipes from ingest jobs (#1656)
  • Allowed cancel of old strike jobs (#1644)
  • Removed old database updater code (#1641)
  • Attached conditions to optional output (#1634)
  • Added default registry to silo deploy (#1631)
  • Added documentation for the DATABASE_URL usage (#1617)
  • Added should_be_retried field to error API (#1613)
  • Added retry of system errors (#1611)
  • Allowed editing of recipe type is_active flag (#1608)
  • Updated validation for seed manifest to match the seed cli (#1604)
  • Added seed-silo to the scale deploy (#1599)
  • set_recipe_input_data_v6 contained trigger references (#1591)
  • Temporarily disabled batches (#1581)
  • Added error when creating two errors with the same name/job-type-name combination (#1567)
  • Converted configurations to v6 (#1551)
  • Updated the recipe type documentation (#1540)
  • Populated legacy job data (#1537)
  • Added support for periods in API urls (#1536)
  • Replaced logstash with fluentd (#1531)
  • Updated the Job Type base serializer to include is_paused and unmet_resources (#1528)
  • Added missing v6 job type endpoints (#1527)
  • Added fields to recipes API for gallery view (#1526)
  • Backport decline offers patch (#1525)
  • Verified X-Pack protected Elasticsearch support (#1520)
  • Outstanding Scale offers in mesos (#1516)
  • Reset the unmet_resources flag after a while (#1511)
  • Added v7 support for all API calls (#1508)
  • Added the job-type-names endpoint (#1503)
  • Removed the v5 APIs (#1500,#1501,#1502)
  • Added job type resource warnings for resources with non-supported names (#1493)
  • Added resource warnings for jobs that will never be scheduled due to unfulfilled resources (#1492)
  • Added support for a Cancel All button (#1483)
  • Added ID to job type list REST API results (#1482)
  • Added warnings displaying cleanup failures (#1445)
  • Removed old recipe code (#1377)
  • Added details for queued jobs (#1335)
  • Disabled trigger rules in v5 (#1322)
  • Removed greedy GPU scheduling logic (#1306)
  • DCOS SDK integration (#1282)
  • Added the Scale v6 UI into Docker images (#1270)
  • Removed the trigger system (#1181)
  • Removed legacy job types (#1155)
  • Clear out database update (#1149)
  • Changed the job execution index to be unique (#1055)
  • Return all JSON fields in correct version for API called (#1042)
  • Scheduler Warnings/Errors (#1032)
  • Added 'Cancel All' functionality to Jobs list (#682)
  • Removed job type is_privileged (#621)
  • Removed the Job Type docker_params (#588)
  • Updated the Mesos interface install (#406)
  • Django Auth Integration (#280)
  • Upgraded to the latest Mesos (#21)

Bug Fixes

  • Removed trying to set sharedmem (#1720)
  • Corrected issue with SILO_URL and UI deployment (#1709)
  • Corrected not being able to update scheduled job types (#1706)
  • Corrected invalid resources warnings (#1705)
  • Corrected errors validation 400 response (#1701)
  • Blocked users from creating a job or recipe with the name "validation" (#1695)
  • Corrected error validating recipe with condition node (#1693)
  • Corrected scans endlessly listed as QUEUED (#1691)
  • Corrected workspace description optional bug (#1689)
  • Corrected alter index bug (#1682)
  • Corrected bug where jobs weren't being canceled (#1666)
  • Corrected security vulnerability with dependency (#1661)
  • Corrected paged results returning 403 forbidden (#1654)
  • Corrected error when updating strike job input data (#1650)
  • Corrected error validating empty outputs (#1649)
  • Corrected requeuejobsview missing serializer_class (#1648)
  • Corrected stuck strike jobs issue (#1647)
  • Corrected error where v6 batches don't override priority (#1645)
  • Corrected stale status bug (#1642)
  • Corrected daily metrics job failure (#1640)
  • Corrected job type details 500 error (#1639)
  • Corrected recipe types server 500 error (#1638)
  • Added logic to catch multiple POSTS of the same recipe type definition (#1636)
  • Corrected recipe type 500 error (#1633)
  • Corrected issue where bootstrap was not passing consistent BACKEND variable (#1628)
  • Corrected bug where legacy job types crash Scale (#1626)
  • Corrected the Scale Casino recipe type (#1624)
  • Corrected migration causing OOM with large dataset (#1620)
  • Corrected invalid .yml files bug (#1615)
  • Corrected bug where reprocessing a recipe didn't use the latest recipe revision (#1601)
  • Corrected bug where the requeue all button couldn't filter on job type in 5.9 (#1600)
  • Removed unnecessary required recipe type revision number from ingest configuration (#1598)
  • Corrected no attribute error thrown by v5.9.x job-type validation (#1595)
  • Corrected bug where updating recipe type revision broke strikes (#1594)
  • Corrected recipe message bomb (#1593)
  • Corrected REST API broken in 5.9.x (#1590)
  • Corrected bug where fluentd deploy sometimes drops (#1589)
  • Corrected batch job priority bug (#1586)
  • Corrected issue where batches were taking too long (#1580)
  • Corrected issue where updated recipe definition message wasn't being added to message factory (#1574)
  • Corrected error queueing v6 recipe from parse job (#1572)
  • Corrected issue where batch_id column in the scale_file table was empty (#1571)
  • Corrected issue where Scale UI wasn't working in IE 11 (#1568)
  • Corrected bug where couldn't edit legacy job type used in v6 recipe (#1566)
  • Corrected bug where INPUT_METADATA environment variable is too long (#1565)
  • Corrected workspace lost when superseding v6 recipe (#1564)
  • Corrected duplicate job type creation attempt returns 500 error (#1562)
  • Corrected job configuration bugs (#1559)
  • Corrected reprocess recipe failures (#1555)
  • Corrected infinite message loop (#1554)
  • Corrected bug where data_type_tags was being saved as a set() (#1552)
  • Check in GPU resource validation was missing (#1548)
  • Typo in models.py prevented logging from being displayed (#1457)
  • Single File Directory Bug (#1545)
  • Corrected bug in the 6.9.0 job_data save_parse_results (#1542)
  • Corrected issue in deployment where LOGSTASH_DOCKER_IMAGE was unset (#1541)
  • Corrected v5 job details serializer bug (#1538)
  • Corrected v6 error api filtering (#1524)
  • Added support for other characters in data type tags (#1518)
  • Corrected bootstrap failure on DCOS 1.10.9 (#1479)
  • Corrected bug where recipe jobs were stuck in pending state (#1438)
  • Corrected deadlocks when locking job models (#1102)

Database Migrations

These are relatively quick database changes applied using Django migrations to alter tables, constraints, indexes, etc. The migrations are applied during the start up of the Scale scheduler.

  • Added migration to move the old Ingest data_type value to the new data_type_tags value
  • Added migration to move the old ScaleFile data_type value to the new data_type_tags value
  • Added migration to update should_be_retried Error field
  • Added migration to convert legacy Job Type interfaces to seed manifests and update the RecipeJobTypeLink with the new legacy job type name
  • Added migration to convert legacy Strike Jobs input_data to the new data interface
  • Removed obsolete product/migrations/0008_auto_20170221_1413 migration
  • Deleted the BatchJob model
  • Deleted the BatchRecipe model
  • Removed obsolete Job columns as part of removing the v5 APIs: priority, timeout
  • Removed obsolete JobType columns as part of removing the v5 APIs: author_name, author_url, category, cpus_required, custom_resources, disk_out_const_required, disk_out_mult_required, docker_params, docker_privileged, error_mapping, is_operational, mem_const_required, mem_mult_required, priority, shared_mem_required, timeout, uses_docker, description, title, trigger_rule
  • Added unmet_resources column to the JobType model
  • Removed obsolete columns of the JobExecution table: stdout, stderr, status, results_manifest, results, pre_started, pre_exit_code, pre_completed, post_started, post_exit_code, post_completed, mem_scheduled, last_modified, job_started, job_metrics, job_exit_code, job_completed, error, environment, ended, disk_total_scheduled, disk_out_scheduled, cpus_scheduled, command_arguments
  • Replaced the JobExecution job and exe_num index together with unique together
  • Replaced the JobExecutionEnd job and exe_num index together with unique together
  • Replaced the JobExecutionOutput job and exe_num index together with unique together
  • Updated the storage/workspace description field to allow for Null values
  • Removed ScaleFile is_operational column
  • Removed RecipeType trigger_rule column
  • Removed RecipeType version column
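
The effect of replacing an index on job and exe_num with a unique constraint can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module and a hypothetical job_exe table. Scale itself runs on PostgreSQL and applies the change through Django migrations, so the table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration; the real Scale schema has more columns.
conn.execute(
    "CREATE TABLE job_exe ("
    "id INTEGER PRIMARY KEY, "
    "job_id INTEGER NOT NULL, "
    "exe_num INTEGER NOT NULL, "
    "UNIQUE (job_id, exe_num))"
)


def add_execution(job_id, exe_num):
    """Insert a row; return False if (job_id, exe_num) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_exe (job_id, exe_num) VALUES (?, ?)",
                (job_id, exe_num),
            )
        return True
    except sqlite3.IntegrityError:
        # the unique constraint rejected a duplicate execution number
        return False


first = add_execution(42, 1)      # new pair, accepted
duplicate = add_execution(42, 1)  # same pair again, rejected
next_try = add_execution(42, 2)   # same job, next execution number, accepted
row_count = conn.execute("SELECT COUNT(*) FROM job_exe").fetchone()[0]
```

A plain index only speeds up lookups on the pair, while the unique constraint also makes the database refuse a second row for the same job and execution number.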

Database Updates

These are long running database changes related to actual data row updates that run in the background in a Scale system task. These updates are performed while Scale is running so they don't increase down time when deploying a new Scale version.

  • Obsolete database updates were removed.
  • Database update added to convert legacy job type interfaces to seed manifests and to update the title of Job Types to LEGACY .