Discussion:
[ nunit-Bugs-1923458 ] Unexpected result from AreEqual/ANE for encoded strings
Charlie Poole
2008-03-23 18:11:40 UTC
Hi All,

I'm forwarding this bug to the list for discussion. You can comment here or
on the bug.

The gist of it is that Assert.AreEqual on strings uses string.Compare and
can find two strings equal even if they use different encodings. This makes
sense to me but it isn't how string.Equals or operator == work.

Your thoughts?

Charlie

-----Original Message-----
From: Nobody [mailto:***@sc8-sf-web23.sourceforge.net] On Behalf Of
SourceForge.net
Sent: Saturday, March 22, 2008 10:26 PM
To: ***@sourceforge.net
Subject: [ nunit-Bugs-1923458 ] Unexpected result from AreEqual/ANE for
encoded strings

Bugs item #1923458, was opened at 2008-03-23 06:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110749&aid=1923458&group_id=10749

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not just
the latest update.
Category: framework
Group: 2.4.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Morten Mertner (mnmr)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unexpected result from AreEqual/ANE for encoded strings

Initial Comment:
I would expect the following test to pass; however, it does not.

The strings do show up as different in the debugger (obviously, since
they're encoded differently) and the built-in == operator correctly returns
false when comparing the two variables.

[Test]
public void Simple()
{
    string input = "Hello World";
    byte[] data = Encoding.UTF32.GetBytes( input );
    string garbage = Encoding.UTF8.GetString( data );
    Assert.AreNotEqual( input, garbage );
}
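
To see why the two comparisons can disagree, it helps to look at what
"garbage" actually contains. The following is a minimal sketch, not part of
the original report; the class name GarbageDemo and the helper Garble are
hypothetical. Decoding the UTF-32 bytes as UTF-8 should yield each original
character followed by three U+0000 characters. An ordinal comparison sees
those extra characters. A likely explanation for the reported behavior is
that the culture-sensitive string.Compare treats U+0000 as ignorable, but
that result may depend on the runtime and culture, so no expected output is
shown for the last line.

```csharp
using System;
using System.Text;

public static class GarbageDemo
{
    // Encode as UTF-32, then (deliberately wrongly) decode the bytes as UTF-8.
    public static string Garble(string input)
    {
        byte[] data = Encoding.UTF32.GetBytes(input);
        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        string input = "Hello World";
        string garbage = Garble(input);

        // 11 characters * 4 bytes each, every byte decoded as one character.
        Console.WriteLine(garbage.Length);                              // 44

        // Stripping the embedded U+0000 characters recovers the input.
        Console.WriteLine(garbage.Replace("\0", "") == input);          // True

        // Ordinal comparisons see the extra characters.
        Console.WriteLine(input == garbage);                            // False
        Console.WriteLine(string.CompareOrdinal(input, garbage) == 0);  // False

        // Culture-sensitive comparison; the result may vary by runtime/culture.
        Console.WriteLine(string.Compare(input, garbage) == 0);
    }
}
```

Seen this way, the two strings are not the same text in different encodings;
they are two different sequences of characters, one of which contains
embedded nulls.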


----------------------------------------------------------------------




-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
Kelly Anderson
2008-03-24 17:38:21 UTC
On Sun, Mar 23, 2008 at 12:11 PM, Charlie Poole wrote:
> Hi All,
>
> I'm forwarding this bug to the list for discussion. You can comment here or
> on the bug.
>
> The gist of it is that Assert.AreEqual on strings uses string.Compare and
> can find two strings equal even if they use different encodings. This makes
> sense to me but it isn't how string.Equals or operator == work.
>
> Your thoughts?

If you compare an int to a double with AreEqual, doesn't it convert
and then test for equivalence? Seems like the same deal here to me.
By default, NUnit reports two values as equal if a very simple
transformation gives them the same value.
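
The int/double analogy can be illustrated in plain C#, without NUnit. This
is only a sketch of the distinction between "equal after conversion" and
"equal as objects"; the class and method names are hypothetical, and it
makes no claim about how NUnit implements AreEqual.

```csharp
using System;

public static class NumericEqualityDemo
{
    // The int is implicitly promoted to double before the comparison.
    public static bool EqualAfterPromotion(int i, double d)
    {
        return i == d;
    }

    // Both values are boxed; Int32.Equals(object) rejects a boxed double.
    public static bool EqualAsObjects(int i, double d)
    {
        return object.Equals(i, d);
    }

    public static void Main()
    {
        Console.WriteLine(EqualAfterPromotion(5, 5.0)); // True
        Console.WriteLine(EqualAsObjects(5, 5.0));      // False
    }
}
```

The question in the bug report is the same kind of choice: whether equality
should be decided before or after a "simple transformation" of the values.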

They can always write their own Constraint if this is important to
their context as well.
From my point of view, while this is an interesting edge case, it is
not, per se, a bug.

-Kelly

Daniel M. Pomerantz
2008-04-14 18:48:35 UTC
Hi Charlie,

Been MIA for a while. Got behind, then got married, and just now
getting caught up. Figured this would be a good one to comment on
first.

My initial thought on this is that I don't really understand it. How
can one string have a different encoding than another? According to
MSDN, each System.String is "a series of Unicode characters." It
could also be thought of as a series of System.Char objects, and each
System.Char is "a Unicode character." Nowhere in the documentation for
System.String do I see a way to specify the internal storage mechanism.
My interpretation has always been that Microsoft made an effort to
separate strings from the storage required for them, so while there
obviously is internal storage, it is opaque. That storage mechanism
could change from UTF-16 today to UTF-8 tomorrow, and no application
should care, because it is only dealing with the characters, not the
bytes. Is there something I am missing (likely) about how to specify
an encoding for a string?

As for String.Equals() and ==, I've never gotten them to behave
differently from String.Compare(), but then I've never gotten separate
strings to use separate encodings. There is a culture difference, but
I've never been in a situation where that makes a difference,
particularly for equality-only tests.

Given that I consider a System.String to be encoding-agnostic, I would
think that the current behaviour, comparing individual Unicode
characters and disregarding any encoding, makes perfect sense. Perhaps I
am thinking about it wrong, but I think it works as it should.
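
The point that an encoding belongs to the bytes rather than to the string
can be sketched as follows. The class and method names are hypothetical.
The same string produces byte arrays of different lengths under different
encodings, and decoding each array with the encoding that produced it
gives back a string equal to the original.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encode and decode with the same encoding, then compare ordinally.
    public static bool RoundTrips(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    public static void Main()
    {
        string text = "Hello World";

        // The byte representation depends on the encoding...
        Console.WriteLine(Encoding.UTF8.GetBytes(text).Length);    // 11
        Console.WriteLine(Encoding.Unicode.GetBytes(text).Length); // 22
        Console.WriteLine(Encoding.UTF32.GetBytes(text).Length);   // 44

        // ...but the decoded string does not, as long as the encodings match.
        Console.WriteLine(RoundTrips(Encoding.UTF8, text));        // True
        Console.WriteLine(RoundTrips(Encoding.Unicode, text));     // True
        Console.WriteLine(RoundTrips(Encoding.UTF32, text));       // True
    }
}
```

The test in the bug report decodes with a different encoding from the one
used to encode, so it produces a different sequence of characters rather
than the same string stored differently.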

Thanks,

dmp
_______________________________________________
nunit-developer mailing list
https://lists.sourceforge.net/lists/listinfo/nunit-developer