text conversion?
-
- Livecode Opensource Backer
- Posts: 9446
- Joined: Fri Feb 19, 2010 10:17 am
- Location: Bulgaria
Re: text conversion?
Here is a seriously frightening list of non-unicode font encodings:
https://philip.html5.org/data/charsets-2.html
Re: text conversion?
OK, I guessed. Try this script: it will NOT do the replace thing, but it encodes your exported text as UTF-8, and chances are good
that PHP will recognize the format as UTF-8 and act accordingly.
However, I have no idea about PHP; maybe something else has to be set in PHP for this?
Anyway, see the marked line near the end of my script:
Code: Select all
on mouseUp
   # set the itemdel to "/"
   # get the effective filename of this stack
   # delete item -1 of it
   # put it & "/" into stakkensFilsti
   ## We can now use:
   put specialfolderpath("resources") & "/" into stakkensFilsti
   put stakkensFilsti & "xport af versemaal/" into dataMappensFilsti
   put dataMappensFilsti & "versemaal.txt" into filensSti
   put "file:" & filensSti into destFil
   --- this is the file that is either created or written to
   put "versemålsnr,linjer,metrik,stavelser2" into meterListen
   put "CSnr,KSnr,Hjsk19nr,Hjsk17nr,GTLnr,DDTnr,glDDSnr,nyDDSnr,vmliste,vers" into salmeListen
   put "nyDDKnr,glDDKnr,andenMelBognr,prefNr,prefGlNr" into koralListen
   put "" into tempResultat
   --put tab into adskilningstegn
   put "|" into adskilningstegn
   set cursor to busy
   set lockScreen to true
   put the seconds into startTid
   put "0" into antalKortBearbejdet
   put "1" into linNumresultat
   put the number of cards of this stack into tempSidsteSide
   repeat with x = 2 to tempSidsteSide -------357 ----- all the cards!!!
      go card x
      put "" into tempTempResultat --- the temporary output (for this card!)
      -------put "" into tempResultat --- the final output (for all defined cards!)
      repeat with i = 1 to the number of lines in fld "vmliste"
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in meterListen
            put fld (item u of meterListen) & adskilningstegn after line i of tempTempResultat
         end repeat
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in salmeListen
            if item u of salmeListen = "vmliste" then
               --------- a little extra processing is needed here
               put line i of fld (item u of salmeListen) into afkortetLinje
               delete word 1 to 2 of afkortetLinje
               if char 1 of afkortetLinje = " " then
                  delete char 1 of afkortetLinje
               end if
               ---get afkortetLinje
               put afkortetLinje into tempAfkortet
               if ";" is in tempAfkortet then
                  put char 1 to offset (";", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else if "(" is in tempAfkortet then
                  put char 1 to offset ("(", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else
                  put tempAfkortet into afkortetLinje
               end if
               replace tab with "" in afkortetLinje
               put afkortetLinje & adskilningstegn after line i of tempTempResultat
               ---put erstatVanskeligeBogstaver(afkortetLinje) & adskilningstegn after line i of tempTempResultat
               -----------
            else
               if line i of fld (item u of salmeListen) = "-" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "÷" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else
                  put line i of fld (item u of salmeListen) & adskilningstegn after line i of tempTempResultat
               end if
            end if
         end repeat
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in koralListen
            put "" into koralPræfiks
            put item u of koralListen into aktuelKoralbog
            if aktuelKoralbog = "nyDDKnr" then
               put "K " into koralPræfiks
            else if aktuelKoralbog = "glDDKnr" then
               put "gK " into koralPræfiks
            end if
            if line i of fld (item u of koralListen) = "-" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "÷" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "" then
               put "" & adskilningstegn after line i of tempTempResultat
            else
               put line i of fld (item u of koralListen) into tempKoralLinje
               put "," & koralPræfiks into kommaPræfiks
               replace "," with kommaPræfiks in tempKoralLinje
               put koralPræfiks & tempKoralLinje & adskilningstegn after line i of tempTempResultat
            end if
         end repeat
         put adskilningstegn after line i of tempTempResultat
         put "jabadaba" after line i of tempTempResultat
         replace tab with "" in tempTempResultat
      end repeat
      ------put return & tempTempResultat after url destFil
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      put tempTempResultat & return after tempResultat
      ---put tempTempResultat into line linNumresultat of tempResultat
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      add 1 to antalKortBearbejdet
   end repeat ---- repeat loop for all the cards!!
   --put uniEncode(tempResultat) into url destFil
   ##########################################################
   ## Do NOT do the REPLACE thing below, but put the TEXTENCODED text directly into the target file here.
   ## filensSti is the full path to versemaal.txt built above (stakkensFilsti is only the folder);
   ## "binfile:" writes the encoded bytes unchanged.
   put textencode(tempResultat,"UTF-8") into url ("binfile:" & filensSti)
   ##
   EXIT TO TOP
   ###########################################################
   set lockScreen to false
end mouseUp
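To make the effect of the marked line concrete, here is a minimal sketch in Python rather than LiveCode (only so that it can be run anywhere). The sample line and the temporary folder are made-up assumptions, not the real export data. It encodes the text explicitly, writes the bytes unchanged, and reads them back.

```python
import os
import tempfile

# Hypothetical sample line in the pipe-delimited export format.
sample = "12|Blå himmel|K 45|jabadaba\n"

# Encode explicitly, which is the job textencode(..., "UTF-8") does in the script.
data = sample.encode("utf-8")

# Write the bytes unchanged (binary mode), then read them back.
path = os.path.join(tempfile.mkdtemp(), "versemaal.txt")
with open(path, "wb") as f:
    f.write(data)
with open(path, "rb") as f:
    raw = f.read()

# "å" takes two bytes (C3 A5) in UTF-8, so the file holds one byte more
# than the text has characters.
print(len(sample), len(raw))
print(raw.decode("utf-8") == sample)
```

Any reader that decodes the file as UTF-8 gets the original text back; a reader that assumes another encoding will not.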
Re: text conversion?
Also . . .
It might be useful to remember that char hex 00FC (u umlaut) is NOT the same as hex 0075 (u) + hex 0308 (umlaut).
Even though they look the same.
So, don't get distracted by diversions . . .
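The difference between the precomposed and the combining form can be checked with a few lines of code. This is a small sketch in Python (chosen only because it runs anywhere), using the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00fc"   # u umlaut as a single code point (hex 00FC)
decomposed = "u\u0308"   # hex 0075 (u) followed by hex 0308 (combining diaeresis)

# They render the same, but they are different strings.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

# Normalization converts between the two forms.
nfc = unicodedata.normalize("NFC", decomposed)    # compose
nfd = unicodedata.normalize("NFD", precomposed)   # decompose
print(nfc == precomposed, nfd == decomposed)
```

Normalizing both sides to the same form before comparing or replacing characters avoids this kind of mismatch.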
Re: text conversion?
I'd NEVER take an Ü for an Ü!
Re: text conversion?
Just so long as the spots are not on "U".
Re: text conversion?
Thank you for the input.
I shall dig into it when I get home later.
And, ha ha, I like your sense of humour.
Re: text conversion?
HOWEVER: there is plenty of room for further complications.
Here's a Windows 1251 Cyrillic font I have opened:
[image]
The Cyrillic letters do NOT have Unicode addresses, so it is unclear how LiveCode could effect a conversion in the way I indicated in my example stack.
Especially as LiveCode, apparently, does NOT offer Windows 1251:
[image]
I dug out the literature on the Cyrillic converter I wrote about 20 years ago. As the Cyrillic letters in the original texts did NOT adhere to any "known" encoding (i.e. someone had just bunged them into the second ASCII table, byte values 128 to 255), that would not have presented any complications.
Last edited by richmond62 on Tue Feb 27, 2024 3:52 pm, edited 1 time in total.
Re: text conversion?
Aha: with a little bit of poking around one can also find that, even with Windows 1251, one should be able to convert a text with this encoding in the incredibly crude way I use in my example:
[image]
Now . . . the unicode address for Ã is hex 00C3.
So, for the sake of argument, if I install this font (ER Bukinist 1251) into my MacOS 12 system and do this:
[image]
One can see that LiveCode CAN retrieve those addresses . . . it just involves a lot more faffing around for the programmer.
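As an illustration of what such a lookup amounts to, here is a short sketch in Python, not LiveCode. It relies on Python's built-in `cp1251` codec, which says nothing about what LiveCode itself offers. The same byte value maps to different Unicode addresses depending on which encoding is assumed.

```python
raw = bytes([0xC3])  # one byte, as stored in a Windows-1251 text file

as_cyrillic = raw.decode("cp1251")   # interpreted as Windows-1251
as_latin = raw.decode("latin-1")     # interpreted as ISO-8859-1

for label, ch in (("cp1251", as_cyrillic), ("latin-1", as_latin)):
    print(label, ch, f"U+{ord(ch):04X}")

# A whole word converts the same way, byte by byte.
word = bytes.fromhex("cff0e8e2e5f2").decode("cp1251")
print(word)
```

Once the encoding is known, conversion is a table lookup per byte, which is the "crude way" described above done with a ready-made table.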
Re: text conversion?
Oh; fantastic!
If one opens a Windows 1251 font using FontForge (instead of a popular and extremely expensive commercial font editor), the hex addresses are 'right there' for all to see:
[images]
Super: another reason why I should stick with Open Source software.
When in doubt I always prefer a spade to a rotavator.
Re: text conversion?
But is the problem necessarily tied to a font?
Or are you just using the font (scheme) to see what to translate to?
When the webpage (PHP script) reads the text file, no font is specified.
Re: text conversion?
"When the webpage (PHP script) reads the text-file, no font is specified"
The font should not be a problem, but the font layout (that is, the character encoding) may be.
The Unicode font layout scheme is meant to take over from ALL previous encodings, but that has not happened.
If you import text that uses a different font layout from the one you use in your LiveCode stack, there will be a mismatch and you will be unable to read your text.
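A mismatch of this kind is easy to reproduce. The following is a minimal sketch in Python (used here only as a neutral, runnable illustration); the sample word is made up. The text is written as UTF-8 and then read by something that assumes ISO-8859-1.

```python
text = "Blå"

utf8_bytes = text.encode("utf-8")       # writer uses UTF-8
wrong = utf8_bytes.decode("latin-1")    # reader assumes ISO-8859-1
right = utf8_bytes.decode("utf-8")      # reader assumes UTF-8

# The two bytes of "å" show up as two separate, unrelated characters.
print(repr(wrong), repr(right))
```

No font is involved at any point: the bytes are the same, only the assumed encoding differs.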
Re: text conversion?
Now I think I got it working.
I made some tests with "textEncode" in a stack (attachment).
When I tried the "UTF-16" version, it looked much like the one I had made earlier. So I took out the chars that I knew would be used and filled the conversion into my "translator":
Code: Select all
--little translator - otherwise some chars are not shown right in the web-page!!!!
-- comparisons are case-insensitive by default, so "å" would also match "Å"
set the caseSensitive to true
repeat with i = 1 to the number of chars in tempResultat
   put char i of tempResultat into tChar
   if tChar = "§" then
      put "ß" into tChar
   else if tChar = "å" then
      put "Â" into tChar
   else if tChar = "Å" then
      put "≈" into tChar
   else if tChar = "æ" then
      put "Ê" into tChar
   else if tChar = "ø" then
      put "¯" into tChar
   else if tChar = "Æ" then
      put "∆" into tChar
   else if tChar = "Ø" then
      put "ÿ" into tChar
   else if tChar = "é" then
      put "È" into tChar
   else if tChar = "á" then
      put "·" into tChar
   else if tChar = "È" then
      put "»" into tChar
   else if tChar = "À" then
      put "¿" into tChar
   else if tChar = "ú" then
      put "˙" into tChar
   else if tChar = "Ù" then
      put "Ÿ" into tChar
   else if tChar = "Ü" then
      put "‹" into tChar
   else if tChar = "ö" then
      put "ˆ" into tChar
   else if tChar = "ü" then
      put "¸" into tChar
   else if tChar = "ä" then
      put "‰" into tChar
   end if
   put tChar after tempResultat2
end repeat
put tempResultat2 into url destFil
That's the only "conversion" I make upon exporting to the text file.
From the webpage I import/read the file as an array (through PHP).
Most of the material is numbers. But I had to do something with the title part of the array.
Here I did
Code: Select all
$titel[] = utf8_encode($linjeArray[12]);
I must admit that I don't understand it - but it works for now.
Thanks for your comments and for looking into the problem!
best regards
johan
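The pairs in the translator above do not look arbitrary. As far as I can tell, each one is what you get when the ISO-8859-1 byte of a character is read back as Mac Roman, which would fit a file that is written in Mac Roman and read as ISO-8859-1. A small sketch in Python (not LiveCode) to check this; the helper name `swap` is made up for the illustration.

```python
def swap(ch: str) -> str:
    """Take the ISO-8859-1 byte of ch and reinterpret it as Mac Roman."""
    return ch.encode("latin-1").decode("mac_roman")

# A few of the source characters from the translator.
for ch in "åÅæÆøØ":
    print(ch, "->", swap(ch))
```

If the printed pairs match the translator, the whole table could be replaced by one conversion between the two encodings instead of a list of special cases.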
Re: text conversion?
joeMich wrote: ↑Tue Feb 27, 2024 11:12 pm
From the webpage I import/read the file as an array (through PHP).
Most of the material is numbers. But I had to do something with the title-part of the array.
Here I did
Code: Select all
$titel[] = utf8_encode($linjeArray[12]);
I must admit that I don't understand it
Maybe that is a little part of the problem?
joeMich wrote: but it works for now
No, it doesn't!
[image]
macOS 12.6.7
Browser: latest version of Safari on the left, latest version of Firefox on the right.
I'm still convinced that using a correctly encoded text file will work with PHP.
The PHP function "utf8_encode(...)" assumes that its input is iso-8859-1 encoded, and it has been deprecated since PHP 8.2.
Did you try my last script, without all the manual character replacements?
And maybe just outputting your data from your Mac to an iso-8859-1 encoded file will work out of the box?
Code: Select all
...
## Collect your data...
## And convert to PHP friendly encoding:
## filensSti is the full path to the text file (stakkensFilsti is only the folder)
put mactoiso(tempResultat) into url ("binfile:" & filensSti)
...
Re: text conversion?
You are right.
There are still unexpected chars.
"Maybe that is a little part of the problem?"
Yep.
I'll try to export from LiveCode to a text file with the encoding that you suggest.
Thanks!
Re: text conversion?
Seems that this works:
from LC: macToIso
into PHP:
Code: Select all
$titel[] = mb_convert_encoding($linjeArray[12],'UTF-8','ISO-8859-1');
I get strange signs if I don't use mb_convert_encoding.
Right now I think the problems are solved.
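For anyone wondering what the two steps do together, here is a rough sketch of the same pipeline in Python (neither LiveCode nor PHP). The sample line, the field index and the helper name `latin1_to_utf8` are assumptions made for the illustration; the real export uses other fields.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Rough analogue of mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')."""
    return raw.decode("latin-1").encode("utf-8")

# Hypothetical exported line, already stored as ISO-8859-1 bytes.
line = "12|4|Blå himmel|jabadaba".encode("latin-1")

# Split on the "|" delimiter and convert only the title field.
fields = line.split(b"|")
title = latin1_to_utf8(fields[2])

print(title)
print(title.decode("utf-8"))
```

The conversion is only correct when the file really is ISO-8859-1. Applying it to text that is already UTF-8 would encode the characters a second time and produce the strange signs again.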