PowerBASIC Peer Support Forums
 

  #1  
Old Apr 9th, 2012, 12:08 AM
Gary Beene Gary Beene is online now
Administrator
 
Join Date: May 2008
Location: Dallas, Tx
Posts: 12,859
Unicode String Strategy

I'm pondering whether it's best to declare all strings as WString in new PBWin10 code, and whether legacy code should be modified (when it is reused in PBWin10) so that all of its strings are declared as WString as well.

Or is it just fine to have some strings declared as String and others as WString - according to your own personal taste?

The compiler will let you do both in an app, and PowerBASIC will do conversions for you in many cases, but are there considerations which make it advantageous to stick with all WString declarations?

One reason to support mixed declarations is to minimize the effort to convert legacy code to PBWin10. But I'd be willing to go to more conversion effort if I saw that there were some logical reasons to stick with a single type of declaration.

A downside to using WString declarations across the board is that legacy folks who have not updated to PBWin10 would have to make more changes in order to use posted PBWin10 code (only matters for code where PBWin9 supports the features used in that code).

In the PBWin10 samples, mixed declarations (String and WString) are found in most of the samples. So PowerBASIC Inc. doesn't appear to be promoting a single declaration strategy.
  #2  
Old Apr 9th, 2012, 12:26 AM
Stuart McLachlan Stuart McLachlan is offline
Member
 
Join Date: Mar 2000
Location: Port Moresby, Papua New Guinea
Posts: 1,693
I think I'll follow PB's advice in Help - Characters, Strings, and Unicode
Quote:
At this time, and for the foreseeable future, UTF-16 is the character set of choice for all of your applications. It is the best way to store all of your data to keep it secure and understandable.
  #3  
Old Apr 9th, 2012, 01:30 AM
José Roca José Roca is online now
Moderator
 
Join Date: Mar 2004
Location: Valencia, Spain
Posts: 6,774
I use unicode strings in new code unless there is a good reason to use ansi.
__________________
Website: http://www.jose.it-berater.org/index.html
SED Editor, TypeLib Browser.
Forum: http://www.jose.it-berater.org/smfforum/index.php
  #4  
Old Apr 9th, 2012, 07:02 AM
Fred Harris Fred Harris is offline
Member
 
Join Date: Jan 2000
Location: Shamokin, PA USA
Posts: 1,546
I have gone through the process of converting some of my older code to wide strings, and what I found is that it was easiest to leave code that uses text files alone, i.e., to leave those variables strictly ANSI. For general program logic, wide strings are just as easy as ANSI, so going forward, wide strings are the more up-to-date choice. Likewise with API code. I believe the general consensus is that all the ...A API functions are just wrappers that convert ANSI strings to wide, then call the ...W versions.

Shortly after PBWin10 and CC6 came out I spent some time with this topic, and posted some code ideas here ...

http://www.powerbasic.com/support/pb...hlight=Unicode
__________________
Fred
"fharris"+Chr$(64)+"evenlink"+Chr$(46)+"com"

Last edited by Fred Harris; Apr 9th, 2012 at 07:21 AM.
  #5  
Old Apr 9th, 2012, 09:52 AM
Gary Beene Gary Beene is online now
Administrator
 
Join Date: May 2008
Location: Dallas, Tx
Posts: 12,859
Hi Stuart,
Thanks for pointing out the Help comment. But this part of the quote didn't satisfy my intellectual itch:
Quote:
It is the best way to store all of your data to keep it secure and understandable.
Surely that doesn't mean that all of the ANSI data we've been working with is non-secure and not understandable? I'd guess there's a more technical argument to be made for adopting an all-WString strategy?

And Fred,
Thanks for your thread. I scanned it briefly and it looks very useful. I'll go read it more carefully.

And Jose,
That was a tease!
Quote:
... unless there is a good reason to use ansi.
What might be some of those reasons? With PowerBASIC's built in conversions, does what you say ever happen?

The heart of this question leads to perhaps a "best practice" issue. When I/we post code for folks here on the forum, we'll want to provide code that gives the best guidance possible. Right now, I'm mostly happy to use mix and match ANSI/Unicode variable declarations, mostly because it's slightly easier given all of the legacy code I work with. But if I understood that there were clear pitfalls in the approach, I'd be more motivated to make the switch over to an all-W coding approach not only for myself, but for the folks who use our posted code as well.

Hence, the digging nature of my questions here. Please take no offense anyone if I try to peel away at any answers given.
__________________

Last edited by Gary Beene; Apr 9th, 2012 at 10:05 AM.
  #6  
Old Apr 9th, 2012, 10:02 AM
Michael Mattias Michael Mattias is offline
Member
 
Join Date: Aug 1998
Location: Racine WI USA
Posts: 37,067
Quote:
I believe the general consensus is that all the [WinAPI] ...A API functions are just wrappers that convert ANSI strings to wide, then call the ...W versions.
Consensus? I surely hope not because that is incorrect.

These API functions are provided by Microsoft, and use different entry points within the named libraries. (Load up USER32.DLL or KERNEL32.DLL using the "Show exports and imports for PB/Win 6x, 7x" utility (original by Torsten Reinow) and you will see what I mean.)

"How" Microsoft implements these functions - by converting between Wide and ANSI (eg as a 'wrapper'), or by working on the "raw" data 'as is' - is proprietary. In any event, when you call any WinAPI function, you will pass or receive string data in the format documented by the publisher.

MCM
  #7  
Old Apr 9th, 2012, 10:02 AM
Gary Beene Gary Beene is online now
Administrator
 
Join Date: May 2008
Location: Dallas, Tx
Posts: 12,859
Hey Stuart,
One point I was going to make earlier was this line of code from Help, which is an easy to find example where PowerBASIC continues to use ANSI Strings:
Code:
CONTROL ADD BUTTON, hDlg,  id&, txt$, x, y, xx,  yy
So when you posted the Help comment that Unicode was the way to go, I thought of this example where Help shows that even PowerBASIC has yet to fully adopt the all-W strategy (I do assume they are converting internally).

It may be that PowerBASIC is headed towards converting all of its statements to use WString arguments. But for now, when Help is full of ANSI arguments, it's hard to put too much weight on the Help quote from your earlier post.

If ANSI was good enough to leave in many of the PowerBASIC statements, how can I argue to newbies that it's a non-preferred practice?

The work-in-progress approach is certainly one argument. But I'd like to have something stronger - such as demonstrable, negative side effects of staying ANSI or of using a mixed strategy.
  #8  
Old Apr 9th, 2012, 10:10 AM
Michael Mattias Michael Mattias is offline
Member
 
Join Date: Aug 1998
Location: Racine WI USA
Posts: 37,067
Quote:
But I'd like to have something stronger - such as demonstrable, negative side effects of staying ANSI or of using a mixed strategy.
In the "pure SDK" environment, other than 'maintainability' I can't think of any reason to go "all W", "all A" or "six-five and pick 'em".

(See Post #6 this thread)

The publisher of the compiler will have to speak for himself re "mixed datatype string operands in assignments and expressions."

I tried a little "mix and match" last week just because I could and had no problems. However, since I mostly get to write stuff "new" and I want to get to "all Wide Char" I do everything with WSTRING and WSTRINGZ types where I formerly would have used STRING and ASCIIZ types. YMMV.

MCM
  #9  
Old Apr 9th, 2012, 11:13 AM
Fred Harris Fred Harris is offline
Member
 
Join Date: Jan 2000
Location: Shamokin, PA USA
Posts: 1,546
Because the actual implementations of the ...W versus ...A functions are proprietary, we can't really know what they do, naturally. However, I have seen Microsoft code demonstrating the technique of converting ANSI strings to wide character strings, then calling W functions - surely not a difficult concept to comprehend. Because of this, it makes intuitive sense to just use wide character strings, which don't require any further manipulations/conversions before being fed into an API function. That's the only point I was trying to make, Michael.
  #10  
Old Apr 9th, 2012, 11:14 AM
José Roca José Roca is online now
Moderator
 
Join Date: Mar 2004
Location: Valencia, Spain
Posts: 6,774
Quote:
What might be some of those reasons? With PowerBASIC's built in conversions, does what you say ever happen?
Indeed. To avoid conversions if I don't need unicode, to use a string as a byte buffer, to use third party libraries that don't support unicode, to save .bas files to be compiled with PB..
  #11  
Old Apr 9th, 2012, 11:31 AM
Michael Mattias Michael Mattias is offline
Member
 
Join Date: Aug 1998
Location: Racine WI USA
Posts: 37,067
Quote:
However, I have seen Microsoft code demonstrating the technique of converting ansi strings to wide character strings, then calling W functions - surely not a difficult concept to comprehend.
I think those examples may have been provided to help programmers use functions for which ONLY a "wide-char" interface is supported... e.g. I think some of the "Shxxxx" functions are like that. ( I know I've run into this).

MCM
  #12  
Old Apr 9th, 2012, 11:57 AM
Jim Dunn Jim Dunn is offline
Member
 
Join Date: May 2001
Location: USA, North of Mexico, South of Canada
Posts: 631
Quote:
Originally Posted by Gary Beene View Post
... One reason to support mixed declarations is to minimize the effort to convert legacy code to PBWin10 ...
You could use all WSTRING in your examples, or a mix, and include a line at the top of your code:

Quote:
'#define WSTRING STRING ' uncomment if not using PBWin10 or PBCC6
or

Quote:
'MACRO WSTRING = STRING ' uncomment if not using PBWin10 or PBCC6
__________________
3.14159265358979323846264338327950
"Ok, yes... I like pie... um, I meant, pi."
  #13  
Old Apr 10th, 2012, 12:00 AM
José Roca José Roca is online now
Moderator
 
Join Date: Mar 2004
Location: Valencia, Spain
Posts: 6,774
Quote:
Originally Posted by Fred Harris View Post
Because the actual implementations of the ...W versus ...A functions are proprietary, we can't really know what they do, naturally.
We can. We don't have the source code, but we can know what it does.

Quote:
Originally Posted by Fred Harris View Post
However, I have seen Microsoft code demonstrating the technique of converting ANSI strings to wide character strings, then calling W functions - surely not a difficult concept to comprehend. Because of this, it makes intuitive sense to just use wide character strings, which don't require any further manipulations/conversions before being fed into an API function. That's the only point I was trying to make, Michael.
A typical example, the implementation of the API function CharLowerBuffA:

Code:
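DWORD
WINAPI
CharLowerBuffA(LPSTR str, DWORD len)
{
    DWORD lenW;
    WCHAR *strW;
    if (!str) return 0;

    /* First call: ask how many wide characters the converted text needs. */
    lenW = MultiByteToWideChar(CP_ACP, 0, str, len, NULL, 0);
    /* Allocate a temporary wide-character buffer of that size. */
    strW = HeapAlloc(GetProcessHeap(), 0, lenW * sizeof(WCHAR));
    if (strW) {
        /* Second call: convert the ANSI text into the wide buffer. */
        MultiByteToWideChar(CP_ACP, 0, str, len, strW, lenW);
        /* The wide version does the actual lowercasing. */
        CharLowerBuffW(strW, lenW);
        /* Convert the result back to ANSI, overwriting the caller's buffer. */
        len = WideCharToMultiByte(CP_ACP, 0, strW, lenW, str, len, NULL, NULL);
        HeapFree(GetProcessHeap(), 0, strW);
        return len;
    }
    return 0;   /* allocation failed */
}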
It creates a Unicode copy of the string with MultiByteToWideChar, calls CharLowerBuffW to convert it to lower case, and then calls WideCharToMultiByte to convert the result back to ANSI.

Last edited by José Roca; Apr 10th, 2012 at 12:44 AM.
  #14  
Old Apr 10th, 2012, 01:00 AM
José Roca José Roca is online now
Moderator
 
Join Date: Mar 2004
Location: Valencia, Spain
Posts: 6,774
The point is that anyone who calls the ANSI API functions thinking they will be faster and use less memory is wrong. They would be faster and use less memory if they were fully implemented in ANSI instead of being mere wrappers, but that is not the case.
  #15  
Old Apr 18th, 2012, 08:57 AM
Francisco Castanedo Francisco Castanedo is offline
Member
 
Join Date: Nov 2008
Location: Caracas, Venezuela
Posts: 232
I think it all depends on the code of the program you're writing.

For example: I have a commercial program that uses ANSI only. It is for Spanish-speaking users only (Spanish uses the same ANSI code page as US English), so I haven't needed Unicode in it. The need arose only when I implemented a security feature that uses a MySQL database residing on my website's server and had to use Unicode to communicate with it. But only the portion of the program dealing with MySQL needed Unicode, and I just convert the WSTRINGs into STRINGs as needed.

On the other hand, I'm about to begin a multilingual project that will use web-based SQL databases. The only way I can accomplish this is by going UNICODE all the way. If I were to mix UNICODE and ANSI on this project it could lead to very annoying problems when showing text to the user and when communicating with the SQL database.

For in-house apps (utility programs) I use mostly ANSI and convert back-and-forth from/to UNICODE as needed.

Hope my point of view helps.
__________________
Francisco Castanedo
Software Developer
Distribuidora 3HP, C.A.
http://www.distribuidora3hp.com
All times are GMT -4. The time now is 05:59 PM.

