text conversion?

joeMich · Post by **joeMich** » Mon Feb 26, 2024 4:42 pm

Hi

For my work as a church musician I have a stack with all the meters used in our different hymnals.
Each card holds one metric variation and lists the songs that use that meter.
You could call it a Stone Age database; it is a very simple form of database indeed.

Now I have written scripts that export the entire database to a text file, which will be used as the data file for a web page where you can search for hymns by their number or title (and many more things).

My problem is that when I export the file as is, the web page reads some chars differently.
I did add a small workaround to the script for some common anomalies such as "æ, ø, å", which are very common in Danish.

But now that I have added some more hymn books with foreign titles (German and Swedish), I cannot "translate" the umlaut chars (ä, ü, ö) and their capital letter variants.

I have been reading about the different text conversion functions, but I haven't found a solution to my problem.

best regards

Here's a link to one of the web pages
http://www.johanmichaelsen.dk/Hvor-er-sangene/
- which is made in php

Here is a part of the script:

--small translator for special characters - since I cannot get it to work with the online hymn sheet otherwise!!!!
repeat with i = 1 to the number of chars in tempResultat
put char i of tempResultat into tChar
if tChar = "æ" then
put "Ê" into tChar
else if tChar = "ø" then
put "¯" into tChar
else if tChar = "å" then
put "Â" into tChar
else if tChar = "Æ" then
put "∆" into tChar
else if tChar = "Ø" then
put "ÿ" into tChar
else if tChar = "Å" then
put "≈" into tChar
--else if tChar = "ä" then
--put "Ã" into tChar

else if tChar = "È" then
put "…" into tChar
else if tChar = "é" then
put "È" into tChar
end if
put tChar after tempResultat2
end repeat

---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
---put uniEncode(tempResultat,"utf8") into tempResultat2
---put uniDecode(tempResultat,"utf8") into tempResultat2

--put tempResultat into url destFil
put tempResultat2 into url destFil
---put textDecode(tempResultat,"UTF-8") into url destFil
---put textDecode(tempResultat,"ASCII") into url destFil

dunbarx · Post by **dunbarx** » Mon Feb 26, 2024 6:37 pm

Hi.

I do not use unicode at all, but do not quite understand your issue.

If you know beforehand the mapping of each character based on the language of interest, as you showed in the "translator" you posted, why not continue to explicitly substitute one character for another, based on the "language" required.

Now there may be a simple unicode solution to this, that does not require that "translator". And EVERYTHING is present in the unicode character set. But what went wrong with the method you already have? Or is it that you do not want to have to do that at all?

Craig

dunbarx · Post by **dunbarx** » Mon Feb 26, 2024 6:41 pm

Rereading, is it the CAPITAL letter versions that are the issue? Does the "caseSensitive" property help here?

Craig

stam · Post by **stam** » Mon Feb 26, 2024 7:09 pm

joeMich wrote: ↑

Mon Feb 26, 2024 4:42 pm

Here is a part of the script:

Code: Select all

  --small translator for special characters - since I cannot get it to work with the online hymn sheet otherwise!!!!
  repeat with i = 1 to the number of chars in tempResultat
      put char i of tempResultat into tChar
      if tChar = "æ" then
         put "Ê" into tChar
      else if tChar = "ø" then
         put "¯" into tChar
      else if tChar = "å" then
         put "Â" into tChar
      else if tChar = "Æ" then
         put "∆" into tChar
      else if tChar = "Ø" then
         put "ÿ" into tChar
      else if tChar = "Å" then
         put "≈" into tChar
         --else if tChar = "ä" then 
         --put "Ã" into tChar
         
      else if tChar = "È" then
         put "…" into tChar
      else if tChar = "é" then 
         put "È" into tChar
      end if
      put tChar after tempResultat2
   end repeat
   
   
   
   ---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
   ---put uniEncode(tempResultat,"utf8") into tempResultat2
   ---put uniDecode(tempResultat,"utf8") into tempResultat2
   
   
   --put tempResultat into url destFil
   put tempResultat2 into url destFil
   ---put textDecode(tempResultat,"UTF-8") into url destFil
   ---put textDecode(tempResultat,"ASCII") into url destFil

Not sure I understand this - why are you using a translator for 'special characters'? And after replacing the unicode chars with other tokens, you are decoding as unicode? These should all be valid unicode characters, there should be no reason to do a replacement cipher like this?

Regarding upper case etc - presumably you have tried toUpper() and so on?

Perhaps if you share a small test stack that contains some of the data you wish to export and some detail about the export format, it will be easier for others to help...

Stam

PS: Please use the code tags when sharing code - the button with the "</>" symbol, 5th along from the Bold button - just select the code you pasted in and click the button.

dunbarx · Post by **dunbarx** » Mon Feb 26, 2024 8:15 pm

Stam.

I tried some of the chars the OP posted, and "toUpper" does not work for all of them, though does for some.

The "translation" he requires seems to be some sort of explicit one-to-one relationship. These would have to come, assuming unicode cannot just do it on its own, from a custom look-up table for each language.
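Such a look-up table could live in an array rather than in a long if/else chain. A minimal sketch, assuming LiveCode 7 or later; the names `translateChars` and `pMap` are made up for illustration and are not from the OP's stack:

```livecode
-- Hypothetical helper: pMap is an array keyed by the character to replace.
function translateChars pText, pMap
   local tOut
   -- treat "æ" and "Æ" as different keys when looking them up
   set the caseSensitive to true
   repeat for each char tChar in pText
      if pMap[tChar] is not empty then
         put pMap[tChar] after tOut
      else
         put tChar after tOut   -- no entry in the table: keep the char as is
      end if
   end repeat
   return tOut
end translateChars

-- Possible use, with two rows taken from the OP's translator:
-- put "Ê" into tMap["æ"]
-- put "¯" into tMap["ø"]
-- put translateChars(tempResultat, tMap) into tempResultat2
```

Because each character is looked up exactly once, a replacement such as "é" to "È" cannot be caught again by a later "È" rule, which can happen with a series of `replace` commands.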

Craig

richmond62 · Post by **richmond62** » Tue Feb 27, 2024 9:40 am

Here's one of Richmond's "boring" stories.

In 1996, returning from the USA to Bulgaria, I worked for a bit with a font design studio using Fontographer to make Bulgarian fonts for the government and various government institutions (Universities and so forth).

These fonts were NOT Unicode and involved putting Bulgarian Cyrillic glyphs into the second ASCII table (129 - 255).

Come 2010 and I end up "paying for my sins".

Several academics at the local University had electronic documents encoded in that pre-unicode Bulgarian font layout, which came out as complete gobbledygook in Windows XP and so forth.

So I had to sit down and design a "Bulgarian Text Muncher" that read in the pre-unicode Bulgarian font layout texts char-by-char and swapped each one for its unicode equivalent and exported the converted text.

I did NOT mess around with stuff such as "Щ" but opened the pre-unicode Bulgarian font the text was encoded in in a font editor (fontforge: https://fontforge.org/en-US/) and wrote down (shock, horror: a piece of paper with a pencil) the ASCII address for each one of the Bulgarian Cyrillic glyphs: both upper and lower case.

Then I popped over here: https://unicode.org/charts/ and downloaded the relevant PDF with the Unicode standard addresses for Cyrillic.

From that to a fairly bulky SWITCH statement to perform stuff like this:

(not sure, as this is slightly mental, if I would entirely trust this code)

Code: Select all

switch CHARX
          case codepointToNum(CHARX) is (0xDF)
          put numToCodepoint(0x0411) into CHARX
          break

This stack suffers from only 1 problem:

I have lost it!

It chewed its way through those text files in an antiquated font layout as fast as a fast thing.

joeMich · Post by **joeMich** » Tue Feb 27, 2024 10:43 am

Hi again

First of all: Thanks for responses!!!

I'm sure I was not good at explaining my problem:

At first I just exported the text of the card flds into a txt-file. When I open that on my Mac it all looks fine.
But when I import the text (via PHP) certain chars are replaced.

I guess it might well have something to do with how PHP reads the plain txt-file with non-ASCII chars...

here's a screenshot of some of the text with and without the conversion

best regards
Johan

stam · Post by **stam** » Tue Feb 27, 2024 10:57 am

Is that not an issue with PHP reading the file as ASCII rather than Unicode text?
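If that is the cause, one option is to encode the text explicitly on export instead of translating characters by hand. A minimal sketch, assuming LiveCode 7 or later (where `textEncode` exists) and reusing `tempResultat` and `filensSti` from the export script; the handler name `exportAsUTF8` is made up. Writing through a `binfile:` URL should keep LiveCode from converting the bytes to the platform's native encoding, which a `file:` URL does as far as I know.

```livecode
-- Hypothetical helper: writes pText to the file at pPath as UTF-8 bytes.
command exportAsUTF8 pText, pPath
   -- textEncode turns LiveCode text into binary data in the given encoding
   put textEncode(pText, "UTF-8") into url ("binfile:" & pPath)
   return the result   -- empty on success, otherwise an error string
end exportAsUTF8

-- In the export script this would stand in for the translation loop:
-- exportAsUTF8 tempResultat, filensSti
```

The web page would then have to treat the data as UTF-8 as well. If the page has to stay on Latin-1, passing "ISO-8859-1" instead of "UTF-8" should produce the bytes the hand-made table is aiming for, including ä, ö and ü.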

joeMich · Post by **joeMich** » Tue Feb 27, 2024 11:08 am

The snippet that you brought into the discussion, richmond62, is interesting.
I'll have to dig a little more to understand how it works

richmond62 · Post by **richmond62** » Tue Feb 27, 2024 11:09 am

Can't you import the text file as a TEXT file?

No PHP.

You might like to think of exporting your data from the database as some sort of item-delimited text file . . .

Klaus · Post by **Klaus** » Tue Feb 27, 2024 12:33 pm

Hi Joe,

joeMich wrote: ↑
Tue Feb 27, 2024 10:43 am
...
At first I just exported the text of the card flds into a txt-file. When I open that on my Mac it all looks fine.
But when I import the text (via PHP) certain chars are replaced...

How did you export the text from LC to TXT file on your Mac?
Just a:

Code: Select all

put the text of fld "dansk" into url("file:...")

?
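If it is a plain `put ... into url ("file:...")`, then as far as I know LiveCode converts the text to the platform's native encoding on the way out, which on a Mac is MacRoman, while the web page appears to read the bytes as Latin-1 (ISO-8859-1). That would explain the translator table: each replacement looks like the character whose MacRoman byte equals the Latin-1 byte of the wanted letter. A small sketch to check this, assuming LiveCode 7 or later and that `textEncode` and `textDecode` accept the encoding names "ISO-8859-1" and "MacRoman" in your version; the function name is made up:

```livecode
-- Hypothetical check: encode a character as a Latin-1 byte,
-- then read that same byte back as if it were MacRoman.
function latin1SeenAsMacRoman pChar
   return textDecode(textEncode(pChar, "ISO-8859-1"), "MacRoman")
end latin1SeenAsMacRoman

-- Try it from the message box, e.g.:
-- put latin1SeenAsMacRoman("æ")
```

If the result for "æ" is "Ê", as in the first row of the translator, the problem is an encoding mismatch between the exported file and the web page rather than anything specific to Danish or German letters.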

joeMich · Post by **joeMich** » Tue Feb 27, 2024 1:19 pm

Yes, Klaus

simply put into a text-file

Code: Select all

on mouseUp
   
   set the itemdel to "/"
   get the effective filename of this stack
   delete item -1 of it
   put it & "/" into stakkensFilsti
   ---now we have the path of the folder that contains the stack
   ---optionally create a data folder
   ---put it & "/datamappe/" into dataMappensFilsti
   put stakkensFilsti & "xport af versemaal/" into dataMappensFilsti
   put dataMappensFilsti & "versemaal.txt" into filensSti
   
   put "file:" & filensSti into destFil
   ---this is the file that is either created or written to
   
   put "versemålsnr,linjer,metrik,stavelser2" into meterListen
   put "CSnr,KSnr,Hjsk19nr,Hjsk17nr,GTLnr,DDTnr,glDDSnr,nyDDSnr,vmliste,vers" into salmeListen
   put "nyDDKnr,glDDKnr,andenMelBognr,prefNr,prefGlNr" into koralListen
   put "" into tempResultat
   --put tab into adskilningstegn
   put "|" into adskilningstegn
   
   set cursor to busy
   set lockScreen to true
   put the seconds into startTid
   
   put "0" into antalKortBearbejdet
   put "1" into linNumresultat
   put the number of cards of this stack into tempSidsteSide
   repeat with x = 2 to tempSidsteSide -------357 -----all the cards!!!
      
      go card x
      
      put "" into tempTempResultat ---the temporary output (for this card!)
      -------put "" into tempResultat ---the final output (for all defined cards!)
      
      repeat with i = 1 to the number of lines in fld "vmliste"
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in meterListen
            put fld (item u of meterListen) & adskilningstegn after line i of tempTempResultat
         end repeat
         
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in salmeListen
            if item u of salmeListen = "vmliste" then
               --------- a little extra processing is needed here
               put line i of fld (item u of salmeListen) into afkortetLinje
               delete word 1 to 2 of afkortetLinje
               if char 1 of afkortetLinje = " " then
                  delete char 1 of afkortetLinje
               end if
               
               ---get afkortetLinje
               put afkortetLinje into tempAfkortet
               if ";" is in tempAfkortet then
                  put char 1 to offset (";", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else if "(" is in tempAfkortet then
                  put char 1 to offset ("(", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else
                  put tempAfkortet into afkortetLinje
               end if
               
               replace tab with "" in afkortetLinje
               put afkortetLinje & adskilningstegn after line i of tempTempResultat
               ---put erstatVanskeligeBogstaver(afkortetLinje) & adskilningstegn after line i of tempTempResultat
               -----------
            else
               if line i of fld (item u of salmeListen) = "-" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "÷" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else
                  put line i of fld (item u of salmeListen) & adskilningstegn after line i of tempTempResultat
               end if
            end if
         end repeat
         
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in koralListen
            put "" into koralPræfiks
            put item u of koralListen into aktuelKoralbog
            if aktuelKoralbog = "nyDDKnr" then
               put "K " into koralPræfiks
            else if aktuelKoralbog = "glDDKnr" then
               put "gK " into koralPræfiks
            end if
            if line i of fld (item u of koralListen) = "-" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "÷" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "" then
               put "" & adskilningstegn after line i of tempTempResultat
            else
               put line i of fld (item u of koralListen) into tempKoralLinje
               put "," & koralPræfiks into kommaPræfiks
               replace "," with kommaPræfiks in tempKoralLinje
               put koralPræfiks & tempKoralLinje & adskilningstegn after line i of tempTempResultat
            end if
         end repeat
         
         put adskilningstegn after line i of tempTempResultat
         put "jabadaba" after line i of tempTempResultat
         replace tab with "" in tempTempResultat
      end repeat
      
      
      ------put return & tempTempResultat after url destFil
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      put  tempTempResultat & return after tempResultat
      ---put tempTempResultat into line linNumresultat of tempResultat
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      add 1 to antalKortBearbejdet
   end repeat ---- repeat loop for all the cards!!
   
   --put uniEncode(tempResultat)  into url destFil
   
   
   put "" into tempResultat2
   set the caseSensitive to true
   
   
   /*
   TAKEN FROM THE WEB:
   
   
   --Ä
   replace "Ã„" with "Ä" in eingabe
   -- Ö
   replace "Ã–" with "Ö" in eingabe
   -- Ü
   replace "Ãœ" with "Ü" in eingabe
   -- ä
   replace "Ã¤" with "ä" in eingabe
   -- ö
   replace "Ã¶" with "ö" in eingabe
   -- ü
   replace "Ã¼" with "ü" in eingabe
   -- ß
   replace "ÃŸ" with "ß" in eingabe
   -- É
   replace "Ã‰" with "É" in eingabe
   */
   
   
   --small translator for special characters - since I cannot get it to work with the online hymn sheet otherwise!!!!
   repeat with i = 1 to the number of chars in tempResultat
      put char i of tempResultat into tChar
      if tChar = "æ" then
         put "Ê" into tChar
      else if tChar = "ø" then
         put "¯" into tChar
      else if tChar = "å" then
         put "Â" into tChar
      else if tChar = "Æ" then
         put "∆" into tChar
      else if tChar = "Ø" then
         put "ÿ" into tChar
      else if tChar = "Å" then
         put "≈" into tChar
         --else if tChar = "ä" then 
         --put "Ã" into tChar
         
      else if tChar = "È" then
         put "…" into tChar
      else if tChar = "é" then 
         put "È" into tChar
      end if
      put tChar after tempResultat2
   end repeat
   
   
   
   ---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
   ---put uniEncode(tempResultat,"utf8") into tempResultat2
   ---put uniDecode(tempResultat,"utf8") into tempResultat2
   
   
   --put tempResultat into url destFil
   put tempResultat2 into url destFil
   ---put textDecode(tempResultat,"UTF-8") into url destFil
   --put textDecode(tempResultat,"ASCII") into url destFil
   
   put the seconds into slutTid
   subtract startTid from slutTid
   -------convert slutTid from english seconds to long system Time
   put slutTid && sekunder && "antal kort: " && antalKortBearbejdet
   set lockScreen to false
end mouseUp

joeMich · Post by **joeMich** » Tue Feb 27, 2024 1:21 pm

and I read from the text-file into memory via php with

Code: Select all

$versemaalLinjer = file_get_contents("versemaal.txt");

richmond62 · Post by **richmond62** » Tue Feb 27, 2024 1:28 pm

So: here's a fairly silly illustration of what I meant in my last post + a correction.

I'll start with the correction:

LiveCode cannot work "in both directions" in Hexadecimal, so unicode addresses must be converted to decimal values.

This is one of the many reasons I have been involved in a long-term love affair with calculator apps (such as the macOS 12 one).

This silly example takes every lower case letter of the English Latin alphabet and replaces it with the equivalent upper case letter of the English Latin alphabet.

Code: Select all

on mouseUp
   put empty into fld "fOUTPUT"
   put fld "fINPUT" into TEXTX
   repeat until TEXTX is empty
      put char 1 of TEXTX into CHARX
      put codepointToNum(CHARX) into CHARP
      switch CHARP
         case 97
            put numToCodepoint(65) after fld "fOUTPUT"
            break
         case 98
            put numToCodepoint(66) after fld "fOUTPUT"
            break
         case 99
            put numToCodepoint(67) after fld "fOUTPUT"
            break
         case 100
            put numToCodepoint(68) after fld "fOUTPUT"
            break
         case 101
            put numToCodepoint(69) after fld "fOUTPUT"
            break
         case 102
            put numToCodepoint(70) after fld "fOUTPUT"
            break
         case 103
            put numToCodepoint(71) after fld "fOUTPUT"
            break
         case 104
            put numToCodepoint(72) after fld "fOUTPUT"
            break
         case 105
            put numToCodepoint(73) after fld "fOUTPUT"
            break
         case 106
            put numToCodepoint(74) after fld "fOUTPUT"
            break
         case 107
            put numToCodepoint(75) after fld "fOUTPUT"
            break
         case 108
            put numToCodepoint(76) after fld "fOUTPUT"
            break
         case 109
            put numToCodepoint(77) after fld "fOUTPUT"
            break
         case 110
            put numToCodepoint(78) after fld "fOUTPUT"
            break
         case 111
            put numToCodepoint(79) after fld "fOUTPUT"
            break
         case 112
            put numToCodepoint(80) after fld "fOUTPUT"
            break
         case 113
            put numToCodepoint(81) after fld "fOUTPUT"
            break
         case 114
            put numToCodepoint(82) after fld "fOUTPUT"
            break
         case 115
            put numToCodepoint(83) after fld "fOUTPUT"
            break
         case 116
            put numToCodepoint(84) after fld "fOUTPUT"
            break
         case 117
            put numToCodepoint(85) after fld "fOUTPUT"
            break
         case 118
            put numToCodepoint(86) after fld "fOUTPUT"
            break
         case 119
            put numToCodepoint(87) after fld "fOUTPUT"
            break
         case 120
            put numToCodepoint(88) after fld "fOUTPUT"
            break
         case 121
            put numToCodepoint(89) after fld "fOUTPUT"
            break
         case 122
            put numToCodepoint(90) after fld "fOUTPUT"
            break
         default
           put " " after fld "fOUTPUT"
      end switch
      delete char 1 of TEXTX
   end repeat
end mouseUp
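For a contiguous range like this the same mapping can also be done with arithmetic instead of one case per letter. A compact sketch of the same idea, assuming LiveCode 7 or later (for the `codepoint` chunk) and the same field names as above:

```livecode
on mouseUp
   local tOut
   repeat for each codepoint CHARX in fld "fINPUT"
      put codepointToNum(CHARX) into CHARP
      if CHARP >= 97 and CHARP <= 122 then
         -- a..z sit exactly 32 code points above A..Z
         put numToCodepoint(CHARP - 32) after tOut
      else
         put " " after tOut   -- same fallback as the default case above
      end if
   end repeat
   put tOut into fld "fOUTPUT"
end mouseUp
```

A case-by-case switch is still the way to go when the source and target code points have no regular offset, as with an old font layout.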

richmond62 · Post by **richmond62** » Tue Feb 27, 2024 1:32 pm

I usually import text into a field like this:

Code: Select all

on mouseUp
   answer file "Choose a TEXT file to import"
   if the result = "cancel" 
   then exit mouseUp
   else
      set the TEXT of fld "fRESULT" to URL ("file:" & it)
   end if
end mouseUp

Sometimes I use RTFtext.
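If the file is known to be UTF-8, a variation on that import might read the raw bytes and decode them explicitly. A sketch, assuming LiveCode 7 or later and a UTF-8 file; the field name is the same as above:

```livecode
on mouseUp
   answer file "Choose a TEXT file to import"
   if it is empty then exit mouseUp   -- the dialog was cancelled
   -- read the raw bytes, then decode them as UTF-8 (an assumption about the file)
   put textDecode(url ("binfile:" & it), "UTF-8") into fld "fRESULT"
end mouseUp
```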
