Skip to content

Intern Review Jotter

Stephen Fortune edited this page Jul 24, 2015 · 30 revisions

TOC

Purpose:

keep track of overall awareness of, and causes of confusions (or unclear code) within CSVlint
anything that would exceed length for a comment line should be stored in here

User Story / Sequence Evaluations

suggested documentation improvements listed in actions

_Data uploaded to the tool is deemed to be “pre-publication” so the validation reports are not logged. This allows publishers to validate and improve their data files before making them public. _ All other data is deemed to be public and validation reports are added to the list of recent validations. This provides a feedback loop to help highlight common errors. [src]

Having used the app I would say the above distinction is not very apparent in using the app
Action - make this more obvious

Action explain more thoroughly what revalidating the CSV with an additional dialect accomplishes

Code Comprehension

Overall

Package, Validation and Summary models all persist their data via the $object.create method (i.e. initialise and save in one go)

Current Data Persistence Assumptions

WRT - 463

Suspected Cause of CSV congestion - a quick way of storing file to "allow publishers to validate and improve their data files before making them public"
Stuart confirms there is no need for saving of CSV

Note regarding testing/byebug

rails tests empty the 'test' mongoDB after (or subsequent to) each test run, therefore testing the 'revalidation' is perhaps best done by hand (using revalidate.csv)

  • Completed task
  • Isolate existing specs and scenarios describing the behaviour which new data persistence solution must address
  • Remove GridFS, substitute in tempfile or similar [BC]
  • Fix the revalidation error [SF]
  • Incomplete task
  • Incomplete task

the method validation->csv appears to be only invoked rarely (see coveralls)

trace
ValidationController.update
^~> Validation.update_validation(dialect#Hash)
~~~~^~> Validation.csv

Cucumber Scenario

  Scenario: Upload a file for validation
    When I go to the homepage
    And I attach the file "csvs/valid.csv" to the "file" field
    And I press "Validate"
    Then I should see a page of validation results
    And my file should be persisted in the database
    And my file should be saved in the database

Commits of note

https://github.com/theodi/csvlint/commit/4c3f6f5c643c31c22ff8c4bded34f545de142159

https://github.com/theodi/csvlint/commit/2f3210e2d7fd3b2ea21cd128c0e3cd0769522a4f

https://github.com/theodi/csvlint/commit/f9b1ac53970ea3b41c60adc9d98fe3b5bbcd939e

Mongo GridFS -

^~> useful for describing some assumptions as to why GridFS is desirable

gem currently used by app = https://github.com/ahoward/mongoid-grid_fs

####Package Model

Package contains many validations, and validations and schemas are interlinked. Package is the superior thing

Quite confusing initially tracking back to how errors/warnings/info messages were created. Tracked it down to build_errors/build_warnings etc. in /lib/csvlint/error_collector.rb in csvlint.rb.

Validation Model

unpacking the Validation model

Validation model contains some confusing replication of the method name validate

self.validate = appears to create a Validator object using the CSVlint gem. This object appears to remain as a private object to this class - i.e. it cannot be accessed of manipulated by any class external to Validation class.

Summary Model

Responsible for the itemising of entries at http://csvlint.io/validation

Unknowns

What the purpose of delay functions are e.g. @package.delay.create_package

`// suspect that the above is tied into performance issues

what is going on in method validation>check_validation

RestClient.head(url, if_modified_since: updated_at.rfc2822 ) if updated_at
        update_validation(validator.dialect) if updated_at <= 2.hours.ago

Purpose of empty badge method in app/models/validation.rb

###Marshal(l)ing

a lazy way of storing the validation result returned by csvlint.rb. I can't see any reason why we can't be a bit smarter and save the results as, say an embedded Mongo object.

#operational code
app/controllers/package_controller.rb:47:    @dataset = Marshal.load(@package.dataset) rescue nil
app/controllers/validation_controller.rb:47:      data = Marshal.load(validation.result).data
app/models/package.rb:26:      :dataset => Marshal.dump(dataset),
app/models/validation.rb:56:      :result => Marshal.dump(validator).force_encoding("UTF-8")
app/models/validation.rb:167:    Marshal.load(self.result)
###tests
####spec
spec/controllers/package_controller_spec.rb:170:      validation = Marshal.load(Validation.first.result)
spec/controllers/package_controller_spec.rb:179:      validation = Marshal.load(Validation.first.result)
spec/controllers/package_controller_spec.rb:188:      validation = Marshal.load(Validation.first.result)
spec/controllers/validation_controller_spec.rb:29:       validator = Marshal.load Validation.first.result
spec/models/package_spec.rb:56:        result = Marshal.load validation.result
spec/models/package_spec.rb:95:        result = Marshal.load validation.result
spec/models/package_spec.rb:113:      package_dataset = Marshal.load(package.dataset)
spec/models/package_spec.rb:143:      result = Marshal.load package.validations.first.result
spec/models/package_spec.rb:214:      package_dataset = Marshal.load(package.dataset)

:result => Marshal.dump(validator).force_encoding("UTF-8")

is the code which takes the returned value of Csvlint::Validator.new() this assignment is invoked in the sequence

create_validation > validate (assigns Validation.validate to variable validation and calls mongoid function update_attributes with validation as arg.

app/models/package.rb:26: :dataset => Marshal.dump(dataset)

csvlint.rb
^~> initialise (builds multiple instance variables)
 ^~> validate
   ^~> parse_csv; validate_metadata

validator when serialised to string via Marshal has 22 variables
only 9 variables are made readable by attribute reader
so how is that data serialised? suspect that marshal returns results of methods

testing

create a validation validation.attributes.class = hash
validation.attributes.each do |k,v| puts v.class end _id => BSON::ObjectId
updated_at => Time
created_at =>Time
url => NilClass
filename => String
state => String
result => Csvlint::Validator
expirable_created_at => Time {"_id"=><BSON::ObjectId:0x70305574602940 data=55b205e85374657569000002>, "updated_at"=>2015-07-24 09:31:20 UTC, "created_at"=>2015-07-24 09:31:20 UTC, "url"=>nil, "filename"=>"valid.csv", "state"=>"valid", "result"=>#<Csvlint::Validator:0x007fe295979198 @formats=[{:string=>1}, {:string=>1}, {:string=>1}], @schema=nil, @supplied_dialect=false, @dialect={"header"=>true, "delimiter"=>",", "skipInitialSpace"=>true, "lineTerminator"=>:auto, "quoteChar"=>"""}, @csv_header=true, @limit_lines=nil, @csv_options={:col_sep=>",", :row_sep=>:auto, :quote_char=>""", :skip_blanks=>false, :encoding=>nil}, @extension=".csv20150724-30057-1g2hlp4", @errors=[], @warnings=[], @info_messages=[#<Csvlint::ErrorMessage:0x007fe295978658 @type=:assumed_header, @category=:structure, @row=nil, @column=nil, @content=nil, @constraints={}>], @encoding=nil, @content_type=nil, @headers=nil, @expected_columns=3, @col_counts=[3, 3], @data=[["firstname", "lastname", "status"], ["Sam", "Pikesley", "Prawn"], nil], @line_breaks="\r\n">, "expirable_created_at"=>2015-07-24 09:31:20 UTC} => {"_id"=><BSON::ObjectId:0x70305574602940 data=55b205e85374657569000002>, "updated_at"=>2015-07-24 09:31:20 UTC, "created_at"=>2015-07-24 09:31:20 UTC, "url"=>nil, "filename"=>"valid.csv", "state"=>"valid", "result"=>#<Csvlint::Validator:0x007fe295979198 @formats=[{:string=>1}, {:string=>1}, {:string=>1}], @schema=nil, @supplied_dialect=false, @dialect={"header"=>true, "delimiter"=>",", "skipInitialSpace"=>true, "lineTerminator"=>:auto, "quoteChar"=>"""}, @csv_header=true, @limit_lines=nil, @csv_options={:col_sep=>",", :row_sep=>:auto, :quote_char=>""", :skip_blanks=>false, :encoding=>nil}, @extension=".csv20150724-30057-1g2hlp4", @errors=[], @warnings=[], @info_messages=[#<Csvlint::ErrorMessage:0x007fe295978658 @type=:assumed_header, @category=:structure, @row=nil, @column=nil, @content=nil, @constraints={}>], @encoding=nil, @content_type=nil, @headers=nil, @expected_columns=3, @col_counts=[3, 3], @data=[["firstname", "lastname", "status"], ["Sam", "Pikesley", "Prawn"], nil], @line_breaks="\r\n">, "expirable_created_at"=>2015-07-24 09:31:20 UTC}

the last entries key is the same as its value??!!?