Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add tests for ByteArrayExtensions #38

Merged
merged 4 commits into from
Nov 8, 2016

Conversation

M-Zuber
Copy link
Contributor

@M-Zuber M-Zuber commented Oct 30, 2016

Fixed up some bugs found while testing.
Added UTF32 Big Endian
Closes #4

Fixed up some bugs found while testing.
Added UTF32 Big Endian
Copy link
Collaborator

@vladislav-karamfilov vladislav-karamfilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the file system for unit tests is not cool. In your case a better approach would be to use Encoding.<SomeEncoding>.GetBytes() method for getting the bytes of a string.

@M-Zuber
Copy link
Contributor Author

M-Zuber commented Oct 31, 2016

Arguable, but you have a point.
Will verify in the morning that ...GetBytes() returns the proper encoding markers also, to make sure it is testing a as close to real life situation as possible.
If so will gladly update the PR.

While I have your attention, any thoughts on how to identify a UTF16 file with zero in first two bytes after marking as opposed to thinking it is UTF32?

@M-Zuber
Copy link
Contributor Author

M-Zuber commented Nov 2, 2016

@Teodor92 fixed the issue with using the FS in the tests.

@vladislav-karamfilov
Copy link
Collaborator

vladislav-karamfilov commented Nov 6, 2016

@M-Zuber, I can't think of a better approach than the additional check bytes.Length % 4 == 0 added to the check for UTF-32. UTF-32 encoding is 4 bytes fixed-width and UTF-16 is variable 2 or 4 bytes so the check won't catch all cases correct but the change of getting the correct encoding is higher.

PS I'm sorry for my late answer. I was very busy last week...
PS2 Good job removing the file system dependency! 👍

@M-Zuber
Copy link
Contributor Author

M-Zuber commented Nov 8, 2016

@vladislav-karamfilov thanks for the tip, added a check + test for a UTF16 buffer with leading zeros.

So this should be finished and ready to merge

@@ -33,18 +33,24 @@ public static string GetString(this byte[] buffer)
{
encoding = Encoding.UTF8;
}
else if (buffer[0] == 0xfe && buffer[1] == 0xff)

// In addition to preamble check the length to help UTF16 with leading zeros be recognized properly as UTF32 is 4 bytes fixed with and UTF 16 is 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typos:
fixed with -> fixed width
UTF16 -> UTF-16
UTF 16 -> UTF-16
UTF32 -> UTF-32

Also UTF-16 is has variable width of 2 or 4 bytes. It's good to update the comment. 😸

@vladislav-karamfilov
Copy link
Collaborator

It looks good to me. Just a few improvements in a comment and I can merge it. :-)

@M-Zuber
Copy link
Contributor Author

M-Zuber commented Nov 8, 2016

Comments updated. Thanks.

@vladislav-karamfilov vladislav-karamfilov merged commit cb1228b into Teodor92:master Nov 8, 2016
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants