-
Notifications
You must be signed in to change notification settings - Fork 12
Intern Review Jotter
TOC
keep track of overall awareness of, and causes of confusions (or unclear code) within CSVlint
anything that would exceed length for a comment line should be stored in here
suggested documentation improvements listed in actions
_Data uploaded to the tool is deemed to be “pre-publication” so the validation reports are not logged. This allows publishers to validate and improve their data files before making them public. _ All other data is deemed to be public and validation reports are added to the list of recent validations. This provides a feedback loop to help highlight common errors. [src]
Having used the app I would say the above distinction is not very apparent in using the app
Action - make this more obvious
Action explain more thoroughly what revalidating the CSV with an additional dialect accomplishes
Package, Validation and Summary models all persist their data via the $object.create method (i.e. initialise and save in one go)
WRT - 463
Suspected Cause of CSV congestion - a quick way of storing file to "allow publishers to validate and improve their data files before making them public"
Stuart confirms there is no need for saving of CSV
Note regarding testing/byebug
rails tests empty the 'test' mongoDB after (or subsequent to) each test run, therefore testing the 'revalidation' is perhaps best done by hand (using revalidate.csv)
- Completed task
- Isolate existing specs and scenarios describing the behaviour which new data persistence solution must address
- Remove GridFS, substitute in tempfile or similar [BC]
- Fix the revalidation error [SF]
- Incomplete task
- Incomplete task
the method validation->csv appears to be only invoked rarely (see coveralls)
trace
ValidationController.update
^~> Validation.update_validation(dialect#Hash)
~~~~^~> Validation.csv
Cucumber Scenario
Scenario: Upload a file for validation
When I go to the homepage
And I attach the file "csvs/valid.csv" to the "file" field
And I press "Validate"
Then I should see a page of validation results
And my file should be persisted in the database
And my file should be saved in the database
Commits of note
https://github.com/theodi/csvlint/commit/4c3f6f5c643c31c22ff8c4bded34f545de142159
https://github.com/theodi/csvlint/commit/2f3210e2d7fd3b2ea21cd128c0e3cd0769522a4f
https://github.com/theodi/csvlint/commit/f9b1ac53970ea3b41c60adc9d98fe3b5bbcd939e
Mongo GridFS -
^~> useful for describing some assumptions as to why GridFS is desirable
gem currently used by app = https://github.com/ahoward/mongoid-grid_fs
####Package Model
Package contains many validations, and validations and schemas are interlinked. Package is the superior thing
Quite confusing initially tracking back to how errors/warnings/info messages were created. Tracked it down to build_errors/build_warnings etc. in /lib/csvlint/error_collector.rb in csvlint.rb.
unpacking the Validation model
Validation model contains some confusing replication of the method name validate
self.validate = appears to create a Validator object using the CSVlint gem. This object appears to remain as a private object to this class - i.e. it cannot be accessed of manipulated by any class external to Validation class.
Responsible for the itemising of entries at http://csvlint.io/validation
What the purpose of delay functions are e.g. @package.delay.create_package
`// suspect that the above is tied into performance issues
what is going on in method validation>check_validation
RestClient.head(url, if_modified_since: updated_at.rfc2822 ) if updated_at
update_validation(validator.dialect) if updated_at <= 2.hours.ago
Purpose of empty badge method in app/models/validation.rb
###Marshal(l)ing
a lazy way of storing the validation result returned by csvlint.rb. I can't see any reason why we can't be a bit smarter and save the results as, say an embedded Mongo object.
#operational code
app/controllers/package_controller.rb:47: @dataset = Marshal.load(@package.dataset) rescue nil
app/controllers/validation_controller.rb:47: data = Marshal.load(validation.result).data
app/models/package.rb:26: :dataset => Marshal.dump(dataset),
app/models/validation.rb:56: :result => Marshal.dump(validator).force_encoding("UTF-8")
app/models/validation.rb:167: Marshal.load(self.result)
###tests
####spec
spec/controllers/package_controller_spec.rb:170: validation = Marshal.load(Validation.first.result)
spec/controllers/package_controller_spec.rb:179: validation = Marshal.load(Validation.first.result)
spec/controllers/package_controller_spec.rb:188: validation = Marshal.load(Validation.first.result)
spec/controllers/validation_controller_spec.rb:29: validator = Marshal.load Validation.first.result
spec/models/package_spec.rb:56: result = Marshal.load validation.result
spec/models/package_spec.rb:95: result = Marshal.load validation.result
spec/models/package_spec.rb:113: package_dataset = Marshal.load(package.dataset)
spec/models/package_spec.rb:143: result = Marshal.load package.validations.first.result
spec/models/package_spec.rb:214: package_dataset = Marshal.load(package.dataset)
:result => Marshal.dump(validator).force_encoding("UTF-8")
is the code which takes the returned value of Csvlint::Validator.new()
this assignment is invoked in the sequence
create_validation > validate (assigns Validation.validate
to variable validation
and calls mongoid function update_attributes
with validation as arg.
app/models/package.rb:26: :dataset => Marshal.dump(dataset)
csvlint.rb
^~> initialise (builds multiple instance variables)
^~> validate
^~> parse_csv; validate_metadata
validator when serialised to string via Marshal has 22 variables
only 9 variables are made readable by attribute reader
so how is that data serialised? suspect that marshal returns results of methods
testing
create a validation
validation.attributes.class
= hash
validation.attributes.each do |k,v| puts v.class end
_id => BSON::ObjectId
updated_at => Time
created_at =>Time
url => NilClass
filename => String
state => String
result => Csvlint::Validator
expirable_created_at => Time
{"_id"=><BSON::ObjectId:0x70305574602940 data=55b205e85374657569000002>, "updated_at"=>2015-07-24 09:31:20 UTC, "created_at"=>2015-07-24 09:31:20 UTC, "url"=>nil, "filename"=>"valid.csv", "state"=>"valid", "result"=>#<Csvlint::Validator:0x007fe295979198 @formats=[{:string=>1}, {:string=>1}, {:string=>1}], @schema=nil, @supplied_dialect=false, @dialect={"header"=>true, "delimiter"=>",", "skipInitialSpace"=>true, "lineTerminator"=>:auto, "quoteChar"=>"""}, @csv_header=true, @limit_lines=nil, @csv_options={:col_sep=>",", :row_sep=>:auto, :quote_char=>""", :skip_blanks=>false, :encoding=>nil}, @extension=".csv20150724-30057-1g2hlp4", @errors=[], @warnings=[], @info_messages=[#<Csvlint::ErrorMessage:0x007fe295978658 @type=:assumed_header, @category=:structure, @row=nil, @column=nil, @content=nil, @constraints={}>], @encoding=nil, @content_type=nil, @headers=nil, @expected_columns=3, @col_counts=[3, 3], @data=[["firstname", "lastname", "status"], ["Sam", "Pikesley", "Prawn"], nil], @line_breaks="\r\n">, "expirable_created_at"=>2015-07-24 09:31:20 UTC} =>
{"_id"=><BSON::ObjectId:0x70305574602940 data=55b205e85374657569000002>, "updated_at"=>2015-07-24 09:31:20 UTC, "created_at"=>2015-07-24 09:31:20 UTC, "url"=>nil, "filename"=>"valid.csv", "state"=>"valid", "result"=>#<Csvlint::Validator:0x007fe295979198 @formats=[{:string=>1}, {:string=>1}, {:string=>1}], @schema=nil, @supplied_dialect=false, @dialect={"header"=>true, "delimiter"=>",", "skipInitialSpace"=>true, "lineTerminator"=>:auto, "quoteChar"=>"""}, @csv_header=true, @limit_lines=nil, @csv_options={:col_sep=>",", :row_sep=>:auto, :quote_char=>""", :skip_blanks=>false, :encoding=>nil}, @extension=".csv20150724-30057-1g2hlp4", @errors=[], @warnings=[], @info_messages=[#<Csvlint::ErrorMessage:0x007fe295978658 @type=:assumed_header, @category=:structure, @row=nil, @column=nil, @content=nil, @constraints={}>], @encoding=nil, @content_type=nil, @headers=nil, @expected_columns=3, @col_counts=[3, 3], @data=[["firstname", "lastname", "status"], ["Sam", "Pikesley", "Prawn"], nil], @line_breaks="\r\n">, "expirable_created_at"=>2015-07-24 09:31:20 UTC}
the last entries key is the same as its value??!!?