Richard H. Schwartz's thoughts on Lotus messaging and collaboration technology, news, politics, and whatever else comes to mind.
Version 1.1 Of My All Characters Database Is Posted To The Code Bin
08:57:15 PM

I've finally gotten around to polishing it up a bit and posting it. Click here to get version 1.1 of my All Characters Database. It contains...


Individual Notes documents for every Unicode character, as seen here:

unicodeData4.jpg


Each document shows one rendered character, the 16-bit Unicode code point value in hex and in decimal, the UTF-8 representation of the code point in hex, the Unicode standard name for the character, and a bunch of other information about the character taken from the standard UnicodeData.txt file maintained by the Unicode Consortium, including stuff to make any character set geek salivate, like the lowercase mapping to ü (0x00FC) and the decomposition into component characters U (0x0055) and dieresis (0x0308).
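For anyone who wants to see the same kind of per-character facts outside of Notes, here is a minimal sketch in Python (not the LotusScript used in the database). It leans on Python's standard unicodedata module rather than reading UnicodeData.txt directly, and the function name describe and the dictionary keys are made up for illustration.

```python
import unicodedata


def describe(ch):
    """Collect roughly the facts one character document displays.

    The key names are illustrative, not the field names used in the database.
    """
    cp = ord(ch)
    lower = ch.lower()
    return {
        "char": ch,
        "hex": f"0x{cp:04X}",
        "decimal": cp,
        # UTF-8 bytes of the code point, shown as space-separated hex
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "name": unicodedata.name(ch, "<unnamed>"),
        # Simple one-to-one lowercase mapping, if there is one
        "lowercase": f"0x{ord(lower):04X}" if len(lower) == 1 else None,
        # Decomposition as space-separated hex code points ("" if none)
        "decomposition": unicodedata.decomposition(ch),
    }


info = describe("\u00DC")  # LATIN CAPITAL LETTER U WITH DIAERESIS
for key, value in info.items():
    print(f"{key:14} {value}")
```

For U+00DC this reports the lowercase mapping 0x00FC and the decomposition 0055 0308, matching the example described above.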


There are three views organizing these documents and displaying useful info. There is a view sorted by LMBCS code points, as shown here:


unicodeData3.jpg


There is also a view that is sorted by the actual character values, so it should show in your system's local sort order -- though you might have to rebuild the view to see that. Finally, and probably most useful for anyone dealing with characters as they are known to most of the rest of the world, there is a view sorted by Unicode UTF-16 code points.


There is also a single large "All Unicode Characters In Rich Text" document containing every single character, as seen here:


unicodeData5.jpg


The information in this document is largely redundant with what you see in the database views, but this is where it goes from reference source to useful tool. If you have written code for processing Notes rich text and you want to make sure that you're handling all LMBCS characters correctly, you can just feed this document through it and make sure that the output is what you expect.


Finally, if you look beneath the covers of the database, you will find the agents that I used to pull information from the UnicodeData.txt file and populate the database with all the documents for the individual characters, and the agent to create the rich text document.


(One last note: if you go to the Code Bin, you will see four documents labeled "All Unicode Characters Database". The second one, which is the only one containing the file UnicodeData-1.1.zip, is the one you want. The others are the original, and my attempts to edit the original to point to the updated version, which I did without realizing/remembering that the design of the Code Bin apparently includes versioning so that updates become new documents rather than overwriting what is there -- which is not particularly user-friendly given that the button one clicks just says "Edit Document" rather than "Create New Version".)



Adding UTF-8 To The All Characters Database
12:35:29 AM

Another follow-up to the last two posts... I found some VB code for converting a single-character Unicode (UTF-16) string to a UTF-8 byte array. The code wasn't quite complete, but I added the missing logic and incorporated it into the agent that imports data from UnicodeData.txt into my UnicodeData.nsf database, so now I can display each character five ways: as a UTF-16 byte sequence in hex, as a decimal value, as a UTF-8 byte sequence in hex, as a LMBCS byte sequence in hex, and as a rendered character. Here's a screen shot of the most recent view layout.
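To make the conversion concrete, here is a rough sketch of the general UTF-16 to UTF-8 technique in Python. It is not the VB code mentioned above and not the LotusScript in the agent; the function name utf16_units_to_utf8 is hypothetical, and the input is assumed to be the one or two 16-bit code units that make up a single character.

```python
def utf16_units_to_utf8(units):
    """Encode one character, given as 1 or 2 UTF-16 code units, as UTF-8 bytes."""
    if len(units) == 1:
        cp = units[0]
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("a lone surrogate is not a character")
    elif len(units) == 2:
        high, low = units
        if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
            raise ValueError("not a valid surrogate pair")
        # Combine the surrogate pair into a code point above U+FFFF
        cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    else:
        raise ValueError("expected one or two UTF-16 code units")

    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against Python's built-in encoder for a few characters
print(utf16_units_to_utf8([0x00DC]) == "\u00DC".encode("utf-8"))
print(utf16_units_to_utf8([0x20AC]) == "\u20AC".encode("utf-8"))
```

Checking the hand-rolled result against a built-in encoder, as the last two lines do, is a cheap way to catch a missing branch.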


unicodeData3.jpg


I'll probably have time to finish everything off and re-post the database to the OpenNTF Code Bin over the weekend.


On the other hand, I've been thinking... now that I have this handy database, should I write some code that does conversions? Other than for the purpose of creating this database, is there actually a need to do this sort of stuff in LotusScript?



Adding LMBCS To The All Characters Database
01:59:04 AM

Yesterday I described my All Unicode Characters Database, but there was something that I didn't mention: I was happy that I could create a database containing documents with all the Unicode character data, but I wanted to get the database to also show LMBCS code point values and I couldn't figure out how to do it. Today I solved that problem.


Actually, I knew I could do it all along with the Notes API, but I'm lazy. I didn't want to write an LSX, an @DbDriver DLL or a stand-alone program to do the job. I wanted the LotusScript agent that imports data from the UnicodeData.txt file to be able to do it. I knew that would require dropping down from LotusScript into the Notes API to get the raw byte values of a text item, but what I couldn't figure out was how to pass a data buffer to NSFItemGetText and then interpret the data as byte values. A byte array wouldn't work because passing arrays by reference from LotusScript to C actually passes a pointer to an internal LotusScript array structure rather than just passing a pointer to the array's data buffer. From what I've seen, LMBCS characters can be as many as 6 bytes long, so I considered passing a double or a currency variable by reference, but I can't think of any way to get at the byte values of either of those data types. Finally it occurred to me that the answer was to use a user-defined type containing 7 scalar byte variables. That did the trick.
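The same trick can be illustrated outside of LotusScript. The sketch below uses Python's ctypes module: a structure of 7 separate byte fields is passed by reference to a routine that fills a raw buffer, and the bytes are then read back one field at a time. Everything here is an assumption made for illustration: fake_get_text is a hypothetical stand-in (it is not NSFItemGetText and does not have its signature), and the bytes it writes are placeholders, not real LMBCS.

```python
import ctypes


class SevenBytes(ctypes.Structure):
    """Seven scalar byte fields, like a user-defined type with 7 Byte members."""
    _fields_ = [(f"b{i}", ctypes.c_ubyte) for i in range(7)]


def fake_get_text(buffer_ref, buffer_len):
    """Hypothetical stand-in for an API call that fills a caller's buffer.

    It copies placeholder bytes (NOT real LMBCS) and returns the count written.
    """
    data = b"\xc3\x9c"
    count = min(len(data), buffer_len)
    ctypes.memmove(buffer_ref, data, count)
    return count


buf = SevenBytes()
# Passing the structure by reference hands the callee a pointer to the raw bytes
n = fake_get_text(ctypes.byref(buf), ctypes.sizeof(buf))
values = [getattr(buf, f"b{i}") for i in range(n)]
print(" ".join(f"{v:02X}" for v in values))
```

The point of the design is that a structure of scalar fields has no hidden header in front of its data, so the callee writes straight into the fields.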


lmbcs1.jpg


I haven't posted an update to the OpenNTF Code Bin yet. The documents and views are done, but I also want to update the agents that generate documents with all the character data in rich text. I should be able to get to it sometime this week. I might also try to create a document that stores the character data in MIME instead of rich text.



Now In The Code Bin: All Unicode Characters DB
12:26:09 AM

I started my career in the IT business 27 years ago as an international products specialist at Wang Laboratories. I became something of an expert on dealing with different character sets supported on different localized versions of Wang hardware and software, and one of the indispensable tools that I used was the All Characters Document. It was a Wang Word Processing document that contained one of every possible character that could be represented, conveniently laid out in a matrix with 16 characters per row, making it easy to read the hexadecimal value of each character's actual byte value. It was not a big document, as in those days all character sets were just 7 or 8 bits, and Wang systems did not support any type of ISO 2022 style escape mechanism to extend character sets.


Even after I moved into working on email products at Wang, I did follow the evolving standards, so I was aware of what was going on with ISO 10646 and Unicode, but over the years, my knowledge of internationalization and localization has lagged a bit. On modern systems, most of the time there's nothing much to worry about, because everything just works. In the Lotus Notes and Domino world everything is LMBCS or a properly tagged MIME character set, conversions are done automatically when you need them in LotusScript and Java, and if you're working in the C API there are functions available for conversion whenever necessary. The few occasions where I've had to dig into character set issues have typically involved uncertainty about the system or JVM default character sets rather than any confusion about what's going on in the Lotus software.


A few months ago, however, I started looking into a MIME conversion issue involving a subset of the Japanese characters known as "Hankaku-Kana". I won't go into the details of the issue here, other than to say that it took me much longer than it really should have to analyze, understand, and solve the problem -- and that even when I had the solution I was thrown off track by one particular document because it turned out that it contained damaged LMBCS. About half-way through the process I realized that what I really needed was the Lotus Notes and Domino equivalent of the All Characters Document, so I created it.


Well... what I created is more than just a document. I created a database with one document per character, and a simple view that displays them. It's technically an All Unicode Characters Database, since the starting point for it was a file on the Unicode Consortium web site called UnicodeData.txt (there's a description of the format of the file here). I'm not 100% certain that everything in LMBCS is also in Unicode, and I'm not sure that everything in Unicode is actually in the file, so there may be some gaps, but for many if not most intents and purposes, I think my database is a complete enough reference. Here is a screen shot of the All Characters view:
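For a sense of what the import has to deal with, here is a small Python sketch (not the LotusScript agent in the database) that splits one UnicodeData.txt line into its 15 semicolon-separated fields. The field labels in FIELDS are descriptive names chosen for this example, and the SAMPLE line is what the entry for U+00DC is believed to look like; check it against the real file before relying on it.

```python
# Descriptive labels for the 15 semicolon-separated fields (names are my own)
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal_digit", "digit", "numeric", "bidi_mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
]


def parse_line(line):
    """Split one UnicodeData.txt line into a dict keyed by FIELDS."""
    values = line.rstrip("\r\n").split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # The code point is stored as hex text without a prefix
    record["code_point"] = int(record["code"], 16)
    return record


# Assumed sample entry for U+00DC; verify against the actual file
SAMPLE = ("00DC;LATIN CAPITAL LETTER U WITH DIAERESIS;Lu;0;L;0055 0308;;;;N;"
          "LATIN CAPITAL LETTER U DIAERESIS;;;00FC;")

rec = parse_line(SAMPLE)
print(rec["code_point"], rec["name"], rec["decomposition"], rec["lowercase"])
```

One thing to watch for: the file describes some very large blocks, such as the CJK ideographs and Hangul syllables, with a pair of "First" and "Last" lines rather than one line per character, so a line-per-document import will not see every code point.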


unicodeData1.jpg



In addition to creating the individual documents per character, I also created a couple of true All Characters Documents. I created one document that has 16 characters per line of rich text, and another with each character rendered on its own line of rich text. I did not go to the trouble of programming a nice tabular matrix, so it's not quite as easy to work with these documents as it was with the one that I used at Wang, but in both cases I did put the code point values in the document along with the characters so these documents can serve as nice self-documenting test case data for any process that needs to work on rich text and be sure that it is handling all character conversions correctly. Here's a screen shot of the one-character-per-line document:
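The 16-per-line layout is easy to picture with a few lines of code. This is a plain-text Python sketch, not the agent that builds the rich text documents; the function name rows_of_16 and the choice to show a "." for anything non-printable or out of range are assumptions made for the example.

```python
def rows_of_16(start, end):
    """Yield one text row per 16 code points, prefixed with the row's first code point."""
    # Round the start down to a multiple of 16 so columns line up with the low hex digit
    for base in range(start & ~0xF, end + 1, 16):
        cells = []
        for cp in range(base, base + 16):
            ch = chr(cp)
            # Placeholder for code points outside the range or not printable
            cells.append(ch if start <= cp <= end and ch.isprintable() else ".")
        yield f"{base:04X}: " + " ".join(cells)


rows = list(rows_of_16(0x20, 0x7E))
for row in rows:
    print(row)
```

With the row label giving the high digits and the column position giving the low digit, the hex value of any character can be read straight off the grid.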


unicodeData2.jpg



One caveat to bear in mind when working with this database is that its ability to actually show all Unicode characters is potentially limited by a couple of things. There are large blocks of Unicode code points that are not rendered as characters on the screen on my computer, and for any given case I don't really know what the limiting factor is. I can think of two. I've already mentioned that I'm not sure that everything in LMBCS is in Unicode, but I'm also not sure that everything in Unicode is also in LMBCS. In fact, it wouldn't surprise me at all if there are things that have been added to Unicode since LMBCS was finalized. Any Unicode characters that are not available in LMBCS will not display correctly in the database. Also, fonts come into play. I set up the form and rich text to use the font @Arial Unicode MS, but I'm not 100% certain that this font really renders all of Unicode. Naturally, any character that this font doesn't render correctly will not display correctly in the database. As I said above, however, for most intents and purposes this database is complete enough.


At 21 MB, the database is a bit larger than I want to have downloaded directly from my blog, and unlike many Notes databases it doesn't compress much at all when zipped, so I've uploaded it to the Code Bin on the OpenNTF site. You can download the database UnicodeData.nsf from there.


UPDATE: A new version has been posted. Read about it here.



100+ Days Of Lotusphere
11:09:57 AM

Just a brief addendum to my thoughts on Lotusphere 2010...


Last year I estimated that if I leave out the pure travel days but do count any of the Saturdays before or Fridays after Lotusphere on which I had actually scheduled pre/post conference business events, then the OGS of Lotusphere 2009 was the 95th day that I had spent "at Lotusphere". That's probably plus-or-minus one, and the actual conference-attending days are only about 85, but I am very confident that this year I crossed the 100 days mark for total time spent in Orlando specifically for Lotusphere. My best estimate is that by Thursday of Lotusphere 2010 the tally was up to 104 days. Many other Lotusphere veterans are probably close to the same mark, and some who travel much longer distances to get to Orlando than I do could probably add their travel days to the total and get closer to 120 days of their lives that have been devoted to Lotusphere. It's not that I think that this is impressive in any real way to other people, but I do find it pretty amazing just for myself. If I add in some Lotus and Advisor DevCons, some of The View's Admin conferences, one user group meeting and several Penumbra Group meetings, it's clearly somewhere between 3 and 4 months of my life that have been spent at Lotus-oriented professional gatherings.


I can't think of anything other than family, school, and my various jobs during the course of my career, that has taken up a bigger chunk of my life. I can only think of one thing that I've done in my life that has been this consistent for more years, year after year, doing the same thing at (roughly) the same time of year, never missing the occasion... and that would be my family's annual visit to Cape Cod on Mother's Day weekend, which we started two years before the first Lotusphere so this will be our 20th coming up this May.


And to think, it all started with a bit of serendipity in an interview for a short-term VB contract.



Better Late Than Never, My Take On Lotusphere 2010 And Project Vulcan
03:56:11 PM

Smooth sailing, and full speed ahead.


I really can't summarize the mood of Lotusphere 2010 and my attitude toward Project Vulcan any better than that.


I'll try to explain, but first a digression: I want to say how fitting a nautical metaphor seems to me in this case.


You see, near the climax of one of my favorite sessions of Lotusphere 2010, Ed Brill put up a slide highlighting a truly dedicated Navy man who is also a true-believer Yellow-Bleeding Lotus man. He is an aviator, not a sailor, but he's Navy through-and-through, and he's my former CEO and a friend. I recommended that Ed contact Mike Griffes because he's a great example of someone who keeps coming back to Lotus technology to solve difficult information sharing and management problems in different arenas, and I thought stories like his would be a great way to add a personal touch to Ed's oral history presentation.



Read More . . .

The Schwartz Is Back With You.
07:02:01 PM

Meet the old blog, same as the new blog. But cleaner in look, and hopefully with more regular posting.


After a long period of blog neglect, I'm going to try to re-energize myself and start blogging regularly again. To help kick things off, I decided to do my first major revision to the look of my blog in years. I decided to stick with the old blogsphere template (really old!), and just strip out almost all of the distraction. I may decide to upgrade or switch templates eventually, but for now this is the path of least resistance.


If there's anyone still subscribing, I guess I'll find out soon enough. Look for a couple of post-Lotusphere thoughts soon, and even some technical articles. Yes, I can still do them occasionally. I'm sure I can. I just have to do it.


The occasional bit of humor, or commentary on things of interest to me in the world around us, will also appear.



A Sampling Of President George W. Bush's Visits To Schools
01:12:36 PM

Wow. Hard to believe it's been three and a half months since my last post. I've set up a posterous site, which I hope to use for the occasional quickie post. And as I'm doing in this case, I may sometimes expand on the quickie posts with a full-blown blog entry.


With all the ridiculousness that certain   people   conservative commentators and politicians   pathological Obama-hating paranoid idiots   dangerous, opportunistic media-whores and the people who lose whatever facility they have for rational thought after listening to them   people   are raising about President Obama's upcoming speech to schoolchildren, in spite of the fact that President Reagan and President George H. W. Bush also gave speeches to schoolchildren and the nation somehow survived, I thought I'd ask "The Google" about President George W. Bush's visits to schools. Here's what I came up with:


Bush Visits Md. School to Promote Education Agenda Source: Fox!! Yet somehow in their fair-and-balanced coverage, they didn't include any coverage of all the warnings from patriotic Americans about the danger of President Bush's diabolical plan to create a private army of students and take over the country! I better hope Bush's brownshirts don't read this post. They'll be after me for sure. Oh, wait... That didn't happen.


Bush Visits High School In Missouri Source: UPI. This was a campaign event! At a High School! Can you believe the brazenness of it? It was a blatantly partisan event, promoting the Cult of Bush, yet somehow nobody thought it at all unusual. No wonder he won re-election in 2004. He brainwashed all the children! OK... He did win, but there's no actual proof that the brainwashed High School vote was the deciding factor.


Kids Meet President (Pres Bush visits Tennessee school) Source: Weekly Reader. I have fond memories of Weekly Reader from when I was young. Look how it has been subverted! President Bush made a publication that is read by millions of school children into a tool of his evil, anti-American, anti-Freedom agenda. That's why the Constitution was changed and he's still President today! Oh, wait... That didn't happen.


President and Mrs. Bush Visit New Orleans High School, Discuss Gulf Coast Recovery  Source: the White House web site. He undoubtedly discussed all those socialist aid programs for Katrina victims. Oh, Horrors!!! All those government programs that help undeserving people, rob us of our freedom and laid the groundwork for President Bush to subvert democracy and name himself President-for-Life. Oh, wait... That didn't happen.


Bush visits Greensburg, Kan., a town torn and then reborn after 2007 tornado Source: LA Times. The visit included speaking at a High School graduation, no doubt to promote yet again his socialist disaster relief programs, which were part of his master plan to steal our freedom. Oh, wait. He kind of did steal our freedom -- mostly with the consent of Congress -- but the CIA, NSA, and Justice Department were his tools for that, not schoolchildren.


Bush Visits Alabama High School Hit By Tornado. Source: Fox. Here, yet again, he was no doubt indoctrinating students about those socialist relief programs. It's amazing how nobody noticed the consistency of his pattern until after Bush declared the Democratic Party a terrorist organization and canceled the 2008 election. Oh, wait. That didn't happen.


President Bush and Laura Bush Visit Ohio Elementary School Source: CNN. Look how early it all started!! He was in schools a month after he took office! By the time 2008 rolled around, the oldest kids in that school were voters! No wonder the Republicans won in a landslide! Oh, wait. They lost. Badly.


Bush Visits N.Y.C. School Source: Education Week. Look how young the kids were when Bush started his indoctrination program! First graders! We're doomed! Doomed, I tell you!


Bush Brings Smiles to Children During Philly School Visit Source: NBC Philadelphia. Look at this! He didn't stop, even after he lost the election! He was still visiting schools in January of 2009! That was just before he declared martial law and canceled Obama's inauguration. Oh, wait... That didn't happen.


President Bush Visit to Bay St. Louis MS Schools. Source: Smugmug. This one is really, really diabolical! The first sentence of the report indicates that the venue is St. Stanislaus College, yet further down it reveals that it is really a High School! This is clear proof of a conspiracy to cover up Bush's nefarious activities long enough for him to finish his brain-washing campaign, dissolve Congress and the Supreme Court, and declare himself Dictator. Oh, wait... That didn't happen.


And last, but not least... The Drama In Sarasota Source: St. Petersburg Times. On the very morning of 9/11/2001, President Bush was pursuing his plan to build his private army of schoolchildren. He was visiting the Emma E. Booker Elementary School on that morning. Does anyone else see the obvious implications? 9-11 was part of Bush's plot to deflect attention from his plan to... Oh, wait... None of it happened. We don't need insane conspiracy theories to explain the fact that Bush was a terrible President who put his administration above the law and eroded the basis of our freedoms in the name of protecting them. And who, by the way, destroyed our economy and substituted partisanship for leadership at every turn, leaving a legacy to President Obama of a broke country fighting two foreign wars, and fighting an ideological war on the home front.



But on second thought, maybe there is a danger. How long has it been since Reagan gave his speech? And how old was Glenn Beck then?






About The Schwartz



All opinions expressed here are my own, and do not represent positions of my employer.