Add tests for ByteArrayExtensions #38

M-Zuber · 2016-10-30T18:34:14Z

Fixed up some bugs found while testing.
Added UTF32 Big Endian
Closes #4

vladislav-karamfilov

Using the file system in unit tests is not cool. In your case, a better approach would be to use the Encoding.<SomeEncoding>.GetBytes() method to get the bytes of a string.

M-Zuber · 2016-10-31T21:25:51Z

Arguable, but you have a point.
Will verify in the morning that ...GetBytes() also returns the proper encoding markers, to make sure the tests are as close to a real-life situation as possible.
If so will gladly update the PR.
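
For reference while verifying this: `Encoding.GetBytes()` returns only the encoded characters and does not emit the byte order mark, which is exposed separately through `Encoding.GetPreamble()`. A minimal sketch of building an in-memory test buffer that includes the marker is shown below; the `TestBuffers` class and `WithPreamble` method are hypothetical names used for illustration and are not part of this PR.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical test helper (not from the PR): builds a buffer that looks like
// the contents of a file saved with a byte order mark, without touching the file system.
public static class TestBuffers
{
    public static byte[] WithPreamble(Encoding encoding, string text)
    {
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (text == null) throw new ArgumentNullException(nameof(text));

        // GetPreamble() yields the BOM bytes (possibly empty); GetBytes() yields the payload only.
        return encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();
    }
}
```

A test could then call `TestBuffers.WithPreamble(Encoding.Unicode, "some text")` and pass the result to the extension method under test.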

While I have your attention, any thoughts on how to identify a UTF16 file whose first two bytes after the byte order mark are zero, as opposed to mistaking it for UTF32?

M-Zuber · 2016-11-02T07:34:05Z

@Teodor92, I fixed the issue with using the file system in the tests.

vladislav-karamfilov · 2016-11-06T18:36:15Z

@M-Zuber, I can't think of a better approach than adding the check bytes.Length % 4 == 0 to the check for UTF-32. UTF-32 is a fixed-width encoding of 4 bytes, and UTF-16 is variable-width with 2 or 4 bytes, so the check won't catch all cases correctly, but the chance of getting the correct encoding is higher.
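
The length heuristic above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the `BomSniffer` class and its method name are hypothetical, and only the little-endian case is covered, since the UTF-32 LE byte order mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).

```csharp
using System;
using System.Text;

// Hypothetical helper (not the PR's implementation) illustrating the Length % 4 heuristic.
public static class BomSniffer
{
    // Returns Encoding.UTF32, Encoding.Unicode (UTF-16 LE), or null when no FF FE mark is present.
    public static Encoding DetectLittleEndianUtf(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));

        bool hasUtf16LeBom = buffer.Length >= 2 && buffer[0] == 0xFF && buffer[1] == 0xFE;
        if (!hasUtf16LeBom)
        {
            return null;
        }

        // FF FE 00 00 is either a UTF-32 LE mark or a UTF-16 LE mark followed by U+0000.
        bool looksLikeUtf32LeBom = buffer.Length >= 4 && buffer[2] == 0x00 && buffer[3] == 0x00;

        // UTF-32 is fixed at 4 bytes per code point, so a valid buffer has a length divisible by 4.
        if (looksLikeUtf32LeBom && buffer.Length % 4 == 0)
        {
            return Encoding.UTF32;
        }

        return Encoding.Unicode;
    }
}
```

As noted, this remains a heuristic: a UTF-16 buffer that starts with U+0000 and happens to have a total length divisible by 4 would still be reported as UTF-32.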

PS I'm sorry for my late answer. I was very busy last week...
PS2 Good job removing the file system dependency! 👍

M-Zuber · 2016-11-08T11:40:05Z

@vladislav-karamfilov thanks for the tip, added a check + test for a UTF16 buffer with leading zeros.

So this should be finished and ready to merge.

vladislav-karamfilov · 2016-11-08T11:43:45Z

Source/MoreDotNet/Extensions/Common/ByteArrayExtensions.cs

@@ -33,18 +33,24 @@ public static string GetString(this byte[] buffer)
            {
                encoding = Encoding.UTF8;
            }
-            else if (buffer[0] == 0xfe && buffer[1] == 0xff)
+
+            // In addition to preamble check the length to help UTF16 with leading zeros be recognized properly as UTF32 is 4 bytes fixed with and UTF 16 is 2


Small typos:
fixed with -> fixed width
UTF16 -> UTF-16
UTF 16 -> UTF-16
UTF32 -> UTF-32

Also, UTF-16 has a variable width of 2 or 4 bytes. It would be good to update the comment. 😸

vladislav-karamfilov · 2016-11-08T11:45:28Z

It looks good to me. Just a few improvements in a comment and I can merge it. :-)

M-Zuber · 2016-11-08T11:47:10Z

Comments updated. Thanks.

Add tests for ByteArrayExtensions

6bcc1e6

Fixed up some bugs found while testing. Added UTF32 Big Endian

vladislav-karamfilov reviewed Oct 31, 2016

Remove use of the file system in the tests

86cf7da

Check length of buffer to help differentiate between UTF32 and UTF16

cfa53f1

vladislav-karamfilov requested changes Nov 8, 2016

Fix comments

06ff888

vladislav-karamfilov merged commit cb1228b into Teodor92:master Nov 8, 2016
