text conversion?
Hi
For my work as a church musician I have a stack with all the meters used in our different hymnals.
Each card holds one meter variation, and the songs using that meter are listed on that card.
You could call it a Stone Age database; it is a very simple form of database indeed.
Now I have made scripts to export the entire database to a text file, which will be used as a data file for a web page where you can search for hymns by their number or title (and many more things).
My problem is that when I just export the file as is, the web page reads some chars differently.
I did make a little add-on to the script for some common anomalies such as "æ, ø, å", which are very common in Danish.
But now that I have added some more hymn books with foreign titles (German and Swedish), I cannot "translate" the umlaut chars (ä, ü, ö) and their capital letter variants.
I have been trying to read about the different text conversion functions, but I haven't found a solution to my problem.
best regards
Here's a link to one of the web pages
http://www.johanmichaelsen.dk/Hvor-er-sangene/
- which is made in PHP
Here is a part of the script:
-- small translator for special characters, since I cannot get it to work with the online hymn sheet otherwise!!!!
repeat with i = 1 to the number of chars in tempResultat
   put char i of tempResultat into tChar
   if tChar = "æ" then
      put "Ê" into tChar
   else if tChar = "ø" then
      put "¯" into tChar
   else if tChar = "å" then
      put "Â" into tChar
   else if tChar = "Æ" then
      put "∆" into tChar
   else if tChar = "Ø" then
      put "ÿ" into tChar
   else if tChar = "Å" then
      put "≈" into tChar
      --else if tChar = "ä" then
      --put "Ã" into tChar
   else if tChar = "È" then
      put "…" into tChar
   else if tChar = "é" then
      put "È" into tChar
   end if
   put tChar after tempResultat2
end repeat
---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
---put uniEncode(tempResultat,"utf8") into tempResultat2
---put uniDecode(tempResultat,"utf8") into tempResultat2
--put tempResultat into url destFil
put tempResultat2 into url destFil
---put textDecode(tempResultat,"UTF-8") into url destFil
---put textDecode(tempResultat,"ASCII") into url destFil
Re: text conversion?
Hi.
I do not use Unicode at all, and I do not quite understand your issue.
If you know beforehand the mapping of each character based on the language of interest, as you showed in the "translator" you posted, why not continue to explicitly substitute one character for another, based on the language required?
Now there may be a simple Unicode solution to this that does not require that "translator", and EVERYTHING is present in the Unicode character set. But what went wrong with the method you already have? Or is it that you do not want to have to do that at all?
Craig
Re: text conversion?
Rereading, is it the CAPITAL letter versions that are the issue? Does the "caseSensitive" property help here?
Craig
Re: text conversion?
joeMich wrote: ↑Mon Feb 26, 2024 4:42 pm
Here is a part of the script:
Code: Select all
-- small translator for special characters, since I cannot get it to work with the online hymn sheet otherwise!!!!
repeat with i = 1 to the number of chars in tempResultat
   put char i of tempResultat into tChar
   if tChar = "æ" then
      put "Ê" into tChar
   else if tChar = "ø" then
      put "¯" into tChar
   else if tChar = "å" then
      put "Â" into tChar
   else if tChar = "Æ" then
      put "∆" into tChar
   else if tChar = "Ø" then
      put "ÿ" into tChar
   else if tChar = "Å" then
      put "≈" into tChar
      --else if tChar = "ä" then
      --put "Ã" into tChar
   else if tChar = "È" then
      put "…" into tChar
   else if tChar = "é" then
      put "È" into tChar
   end if
   put tChar after tempResultat2
end repeat
---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
---put uniEncode(tempResultat,"utf8") into tempResultat2
---put uniDecode(tempResultat,"utf8") into tempResultat2
--put tempResultat into url destFil
put tempResultat2 into url destFil
---put textDecode(tempResultat,"UTF-8") into url destFil
---put textDecode(tempResultat,"ASCII") into url destFil
Not sure I understand this - why are you using a translator for 'special characters'? And after replacing the Unicode chars with other tokens, you are decoding as Unicode? These should all be valid Unicode characters; there should be no reason to do a replacement cipher like this.
Regarding upper case etc - presumably you have tried toUpper() and so on?
Perhaps if you share a small test stack that contains some of the data you wish to export and some detail about the export format, it will be easier for others to help...
Stam
PS: Please use the code tags when sharing code - the button with the "</>" symbol, 5th along from the Bold button - just select the code you pasted in and click the button.
Re: text conversion?
Stam.
I tried some of the chars the OP posted, and "toUpper" does not work for all of them, though it does for some.
The "translation" he requires seems to be some sort of explicit one-to-one relationship. These would have to come, assuming Unicode cannot just do it on its own, from a custom look-up table for each language.
Craig
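The pairs in the posted translator look like one fixed table rather than a per-language one: æ/Ê, ø/¯, å/Â, Æ/∆, Ø/ÿ, Å/≈ and é/È are characters that share a byte value, the first of each pair in ISO-8859-1 and the second in MacRoman. If that reading is right, the Mac export is being written as MacRoman while the web page reads the bytes as ISO-8859-1, and a single encoding call could stand in for the whole table, umlauts included. A minimal sketch, assuming LiveCode 7 or later, that the page really does expect ISO-8859-1, and that your version's textEncode accepts "ISO-8859-1" as an encoding name (worth checking in the dictionary). tempResultat is the variable from the posted script; tPath is a hypothetical variable holding the full path of the data file:

```livecode
-- Assumption: the web page decodes the data file as ISO-8859-1.
-- textEncode converts LiveCode's internal text into bytes in that
-- encoding, so no character-by-character substitution is needed.
put textEncode(tempResultat, "ISO-8859-1") into tBytes

-- "binfile:" writes the bytes exactly as given, whereas "file:" treats
-- the data as text in the platform's native encoding (MacRoman on a Mac).
put tBytes into url ("binfile:" & tPath)
```

If the page turns out to expect UTF-8 instead, only the encoding name in the first line would need to change.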
Re: text conversion?
Here's one of Richmond's "boring" stories.
In 1996, returning from the USA to Bulgaria, I worked for a bit with a font design studio using Fontographer to make Bulgarian fonts for the government and various government institutions (Universities and so forth).
These fonts were NOT Unicode and involved putting Bulgarian Cyrillic glyphs into the second ASCII table (129 - 255).
Come 2010 and I end up "paying for my sins".
Several academics at the local University had electronic documents encoded in that pre-unicode Bulgarian font layout, which came out as complete gobbledygook in Windows XP and so forth.
So I had to sit down and design a "Bulgarian Text Muncher" that read in the pre-unicode Bulgarian font layout texts char-by-char and swapped each one for its unicode equivalent and exported the converted text.
I did NOT mess around with stuff such as "Щ" but opened the pre-unicode Bulgarian font the text was encoded in, using a font editor (fontforge: https://fontforge.org/en-US/), and wrote down (shock, horror: a piece of paper with a pencil) the ASCII address for each one of the Bulgarian Cyrillic glyphs: both upper and lower case.
Then I popped over here: https://unicode.org/charts/ and downloaded the relevant PDF with the Unicode standard addresses for Cyrillic.
From that to a fairly bulky SWITCH statement to perform stuff like this:
(not sure, as this is slightly mental, if I would entirely trust this code)
Code: Select all
switch CHARX
   case codepointToNum(CHARX) is (0xDF)
      put numToCodepoint(0x0411) into CHARX
      break
This stack suffers from only 1 problem:
I have lost it!
It chewed its way through those text files in an antiquated font layout as fast as a fast thing.
Re: text conversion?
Hi again
First of all: Thanks for responses!!!
I'm sure I was not good at explaining my problem:
At first I just exported the text of the card flds into a txt file. When I open that on my Mac it all looks fine.
But when I import the text (via PHP) certain chars are replaced.
I guess it might well have something to do with how PHP reads the plain txt file with non-ASCII chars...
here's a screenshot of some of the text with and without the conversion
best regards
Johan
Re: text conversion?
Is that not an issue with PHP reading the file as ASCII rather than Unicode text?
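One way to test that guess is to take the platform encoding out of the picture on the LiveCode side. PHP's file_get_contents hands back the bytes of the file unchanged, so what the browser shows depends on which bytes LiveCode wrote and which charset the page declares. A minimal sketch, assuming LiveCode 7 or later and a page that is served as UTF-8; exportAsUTF8, pText and pPath are hypothetical names used only for this illustration:

```livecode
on exportAsUTF8 pText, pPath
   -- pText: the text to export; pPath: full path of the target file
   -- (both are hypothetical parameters for this sketch)

   -- turn LiveCode's internal text into UTF-8 bytes
   put textEncode(pText, "UTF-8") into tBytes

   -- "binfile:" writes the bytes as they are, with no further conversion
   put tBytes into url ("binfile:" & pPath)

   -- the result is empty when the write succeeded
   if the result is not empty then
      answer "Could not write file:" && the result
   end if
end exportAsUTF8
```

It could be called as `exportAsUTF8 tempResultat, filensSti` at the end of an export script. For the characters to display correctly, the PHP page would also need to declare UTF-8 as its charset.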
Re: text conversion?
The snippet that you brought into the discussion, richmond62, is interesting.
I'll have to dig a little more to understand how it works
Re: text conversion?
Can't you import the text file as a TEXT file?
No PHP.
You might like to think of exporting your data from the database as some sort of item-delimited text file . . .
Re: text conversion?
Hi Joe,
How did you export the text from LC to a TXT file on your Mac?
Just a:
Code: Select all
put the text of fld "dansk" into url("file:...")
?
Re: text conversion?
Yes, Klaus
I simply put it into a text file:
Code: Select all
on mouseUp
   set the itemdel to "/"
   get the effective filename of this stack
   delete item -1 of it
   put it & "/" into stakkensFilsti
   --- now we have the path of the folder that contains the stack
   --- optionally create a data folder
   ---put it & "/datamappe/" into dataMappensFilsti
   put stakkensFilsti & "xport af versemaal/" into dataMappensFilsti
   put dataMappensFilsti & "versemaal.txt" into filensSti
   put "file:" & filensSti into destFil
   --- this is the file that is either created or written to
   put "versemålsnr,linjer,metrik,stavelser2" into meterListen
   put "CSnr,KSnr,Hjsk19nr,Hjsk17nr,GTLnr,DDTnr,glDDSnr,nyDDSnr,vmliste,vers" into salmeListen
   put "nyDDKnr,glDDKnr,andenMelBognr,prefNr,prefGlNr" into koralListen
   put "" into tempResultat
   --put tab into adskilningstegn
   put "|" into adskilningstegn
   set cursor to busy
   set lockScreen to true
   put the seconds into startTid
   put "0" into antalKortBearbejdet
   put "1" into linNumresultat
   put the number of cards of this stack into tempSidsteSide
   repeat with x = 2 to tempSidsteSide -------357 ----- all the cards!!!
      go card x
      put "" into tempTempResultat --- the temporary output (for this card!)
      -------put "" into tempResultat --- the final output (for all defined cards!)
      repeat with i = 1 to the number of lines in fld "vmliste"
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in meterListen
            put fld (item u of meterListen) & adskilningstegn after line i of tempTempResultat
         end repeat
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in salmeListen
            if item u of salmeListen = "vmliste" then
               --------- a little extra processing is needed here
               put line i of fld (item u of salmeListen) into afkortetLinje
               delete word 1 to 2 of afkortetLinje
               if char 1 of afkortetLinje = " " then
                  delete char 1 of afkortetLinje
               end if
               ---get afkortetLinje
               put afkortetLinje into tempAfkortet
               if ";" is in tempAfkortet then
                  put char 1 to offset (";", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else if "(" is in tempAfkortet then
                  put char 1 to offset ("(", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else
                  put tempAfkortet into afkortetLinje
               end if
               replace tab with "" in afkortetLinje
               put afkortetLinje & adskilningstegn after line i of tempTempResultat
               ---put erstatVanskeligeBogstaver(afkortetLinje) & adskilningstegn after line i of tempTempResultat
               -----------
            else
               if line i of fld (item u of salmeListen) = "-" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "÷" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else
                  put line i of fld (item u of salmeListen) & adskilningstegn after line i of tempTempResultat
               end if
            end if
         end repeat
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in koralListen
            put "" into koralPræfiks
            put item u of koralListen into aktuelKoralbog
            if aktuelKoralbog = "nyDDKnr" then
               put "K " into koralPræfiks
            else if aktuelKoralbog = "glDDKnr" then
               put "gK " into koralPræfiks
            end if
            if line i of fld (item u of koralListen) = "-" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "÷" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "" then
               put "" & adskilningstegn after line i of tempTempResultat
            else
               put line i of fld (item u of koralListen) into tempKoralLinje
               put "," & koralPræfiks into kommaPræfiks
               replace "," with kommaPræfiks in tempKoralLinje
               put koralPræfiks & tempKoralLinje & adskilningstegn after line i of tempTempResultat
            end if
         end repeat
         put adskilningstegn after line i of tempTempResultat
         put "jabadaba" after line i of tempTempResultat
         replace tab with "" in tempTempResultat
      end repeat
      ------put return & tempTempResultat after url destFil
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      put tempTempResultat & return after tempResultat
      ---put tempTempResultat into line linNumresultat of tempResultat
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      add 1 to antalKortBearbejdet
   end repeat ---- repeat loop for all the cards!!
   --put uniEncode(tempResultat) into url destFil
   put "" into tempResultat2
   set the caseSensitive to true
   /*
   TAKEN FROM THE NET:
   --Ä
   replace "Ä" with "Ä" in eingabe
   -- Ö
   replace "Ö" with "Ö" in eingabe
   -- Ü
   replace "Ãœ" with "Ü" in eingabe
   -- ä
   replace "ä" with "ä" in eingabe
   -- ö
   replace "ö" with "ö" in eingabe
   -- ü
   replace "ü" with "ü" in eingabe
   -- ß
   replace "ß" with "ß" in eingabe
   -- É
   replace "É" with "Ö" in eingabe
   */
   -- small translator for special characters, since I cannot get it to work with the online hymn sheet otherwise!!!!
   repeat with i = 1 to the number of chars in tempResultat
      put char i of tempResultat into tChar
      if tChar = "æ" then
         put "Ê" into tChar
      else if tChar = "ø" then
         put "¯" into tChar
      else if tChar = "å" then
         put "Â" into tChar
      else if tChar = "Æ" then
         put "∆" into tChar
      else if tChar = "Ø" then
         put "ÿ" into tChar
      else if tChar = "Å" then
         put "≈" into tChar
         --else if tChar = "ä" then
         --put "Ã" into tChar
      else if tChar = "È" then
         put "…" into tChar
      else if tChar = "é" then
         put "È" into tChar
      end if
      put tChar after tempResultat2
   end repeat
   ---put uniDecode(uniEncode(tempResultat,"utf8")) into tempResultat2
   ---put uniEncode(tempResultat,"utf8") into tempResultat2
   ---put uniDecode(tempResultat,"utf8") into tempResultat2
   --put tempResultat into url destFil
   put tempResultat2 into url destFil
   ---put textDecode(tempResultat,"UTF-8") into url destFil
   --put textDecode(tempResultat,"ASCII") into url destFil
   put the seconds into slutTid
   subtract startTid from slutTid
   -------convert slutTid from english seconds to long system Time
   put slutTid && sekunder && "antal kort: " && antalKortBearbejdet
   set lockScreen to false
end mouseUp
Re: text conversion?
and I read the text file into memory via PHP with
Code: Select all
$versemaalLinjer = file_get_contents("versemaal.txt");
Re: text conversion?
So: here's a fairly silly illustration of what I meant in my last post + a correction.
I'll start with the correction:
LiveCode cannot work "in both directions" in hexadecimal, so Unicode addresses must be converted to decimal values.
This is one of the many reasons I have been involved in a long-term love affair with calculator apps (this is the macOS 12 one):
- - -
This silly example takes every lower case letter of the English Latin alphabet and replaces it with the equivalent upper case letter of the English Latin alphabet.
Code: Select all
on mouseUp
   put empty into fld "fOUTPUT"
   put fld "fINPUT" into TEXTX
   repeat until TEXTX is empty
      put char 1 of TEXTX into CHARX
      put codepointToNum(CHARX) into CHARP
      switch CHARP
         case 97
            put numToCodepoint(65) after fld "fOUTPUT"
            break
         case 98
            put numToCodepoint(66) after fld "fOUTPUT"
            break
         case 99
            put numToCodepoint(67) after fld "fOUTPUT"
            break
         case 100
            put numToCodepoint(68) after fld "fOUTPUT"
            break
         case 101
            put numToCodepoint(69) after fld "fOUTPUT"
            break
         case 102
            put numToCodepoint(70) after fld "fOUTPUT"
            break
         case 103
            put numToCodepoint(71) after fld "fOUTPUT"
            break
         case 104
            put numToCodepoint(72) after fld "fOUTPUT"
            break
         case 105
            put numToCodepoint(73) after fld "fOUTPUT"
            break
         case 106
            put numToCodepoint(74) after fld "fOUTPUT"
            break
         case 107
            put numToCodepoint(75) after fld "fOUTPUT"
            break
         case 108
            put numToCodepoint(76) after fld "fOUTPUT"
            break
         case 109
            put numToCodepoint(77) after fld "fOUTPUT"
            break
         case 110
            put numToCodepoint(78) after fld "fOUTPUT"
            break
         case 111
            put numToCodepoint(79) after fld "fOUTPUT"
            break
         case 112
            put numToCodepoint(80) after fld "fOUTPUT"
            break
         case 113
            put numToCodepoint(81) after fld "fOUTPUT"
            break
         case 114
            put numToCodepoint(82) after fld "fOUTPUT"
            break
         case 115
            put numToCodepoint(83) after fld "fOUTPUT"
            break
         case 116
            put numToCodepoint(84) after fld "fOUTPUT"
            break
         case 117
            put numToCodepoint(85) after fld "fOUTPUT"
            break
         case 118
            put numToCodepoint(86) after fld "fOUTPUT"
            break
         case 119
            put numToCodepoint(87) after fld "fOUTPUT"
            break
         case 120
            put numToCodepoint(88) after fld "fOUTPUT"
            break
         case 121
            put numToCodepoint(89) after fld "fOUTPUT"
            break
         case 122
            put numToCodepoint(90) after fld "fOUTPUT"
            break
         default
            put " " after fld "fOUTPUT"
      end switch
      delete char 1 of TEXTX
   end repeat
end mouseUp
Attachments: Text conversion.livecode.zip (1.37 KiB)
Last edited by richmond62 on Tue Feb 27, 2024 3:58 pm, edited 3 times in total.
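The long switch above can also be written as a lookup table held in an array, which keeps the mapping as data and leaves the loop a few lines long. A minimal sketch, assuming LiveCode 7 or later and text in which each char is a single code point; remapText, pMap and tMap are hypothetical names. Unlike the switch above, which writes a space for anything unmapped, this version passes unmapped characters through unchanged:

```livecode
-- Returns pText with every mapped character replaced.
-- pMap is an array keyed by source code point (decimal), holding the
-- target code point (decimal) as its value.
function remapText pText, pMap
   put empty into tOut
   repeat for each char tChar in pText
      put codepointToNum(tChar) into tNum
      if pMap[tNum] is not empty then
         put numToCodepoint(pMap[tNum]) after tOut
      else
         -- no entry in the table: keep the character as it is
         put tChar after tOut
      end if
   end repeat
   return tOut
end remapText

on mouseUp
   -- build the table: the three pairs below are taken from the example above
   put 65 into tMap[97] -- a becomes A
   put 66 into tMap[98] -- b becomes B
   put 67 into tMap[99] -- c becomes C
   put remapText(fld "fINPUT", tMap) into fld "fOUTPUT"
end mouseUp
```

Building the output in a variable and writing it to the field once also avoids updating the field for every character.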
Re: text conversion?
I usually import text into a field like this:
Code: Select all
on mouseUp
   answer file "Choose a TEXT file to import"
   -- stop if the user cancelled the file dialog
   if the result = "cancel" then exit mouseUp
   set the text of fld "fRESULT" to URL ("file:" & it)
end mouseUp
Sometimes I use RTFtext.