setlocale warning during loghi run #39

Open
viragom opened this issue Nov 13, 2024 · 14 comments

@viragom

viragom commented Nov 13, 2024

Hi, I installed Loghi in an Ubuntu virtual machine on Windows 11 (this worked), but running the program inside the virtual machine does not.
I'm new to this kind of program, so any help is appreciated.

Warning in step HTR: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

How to fix this?

@rvankoert
Collaborator

Are you using the dockers or building from scratch?

@viragom
Author

viragom commented Nov 13, 2024

Using dockers

@rvankoert
Collaborator

Could you try this in the virtual machine:

sudo -i
echo "LC_ALL=en_US.UTF-8" >> /etc/environment
echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
locale-gen en_US.UTF-8
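
To check whether a locale such as the one generated above can actually be activated, here is a minimal Python sketch. It performs the same `setlocale(LC_ALL, ...)` call whose failure produces the "cannot change locale" warning. `locale_available` is a hypothetical helper, not part of Loghi, and the sketch assumes a glibc-based system like the Ubuntu VM. If the warning is printed inside a Docker container, running this on the host only tells you about the host's locales.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if `name` can be set as LC_ALL; the current locale is restored afterwards."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting without changing it
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # The same failure that makes the shell print "cannot change locale"
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the original locale


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "C.UTF-8"):
        print(name, "available" if locale_available(name) else "missing")
```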

@viragom
Author

viragom commented Nov 13, 2024

Hi Rutger

I did this and ran Loghi again. Unfortunately there is no difference: the same warning appears.

@rvankoert
Collaborator

I just checked my own log files, and the warning appears there as well when I run it. It comes from inside the Docker container and does no harm. Do you get any output after that warning?
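
Since the warning is harmless, it can help to hide it and see whether any other output follows. Below is a minimal Python sketch that drops lines matching the warning text quoted in this thread. The helper name `without_locale_warnings` is hypothetical and not part of Loghi.

```python
import re
from typing import Iterable, List

# Matches the harmless warning discussed in this issue, e.g.
# "setlocale: LC_ALL: cannot change locale (en_US.UTF-8)", with or without a prefix
LOCALE_WARNING = re.compile(r"setlocale: LC_ALL: cannot change locale")


def without_locale_warnings(lines: Iterable[str]) -> List[str]:
    """Return the lines that are not setlocale warnings, preserving their order."""
    return [line for line in lines if not LOCALE_WARNING.search(line)]


if __name__ == "__main__":
    sample = [
        "setlocale: LC_ALL: cannot change locale (en_US.UTF-8)\n",
        "Running HTR\n",
    ]
    print("".join(without_locale_warnings(sample)), end="")
```

To filter real pipeline output, the function could be applied to `sys.stdin` in a small script that the pipeline output (including stderr) is piped through.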

@viragom
Author

viragom commented Nov 13, 2024

Okay,
starting inference-pipeline.sh prints the notice "Running HTR", followed by the docker run command used for the HTR. That command produces the "setlocale: LC_ALL: cannot change locale" warning above.
It is followed by a message about CUDA Version 12.5.0 and then the same LC_ALL warning again.

Could it be the CUDA thing?

@stefanklut

It has something to do with the Ubuntu installation inside the Docker image. As far as my quick Google searches say, Docker images by default only install C.UTF-8. See this: https://unix.stackexchange.com/questions/626916/how-to-set-locale-correctly-manually

@viragom
Author

viragom commented Nov 13, 2024

@stefanklut @rvankoert, it looks like the steps you mention have to do with the locale setting. I followed the suggestions in the linked answer, but without success ;-(
The warning still appears and no HTR is performed.
Any other suggestions, or information you need, to get this one solved?

@rvankoert
Collaborator

Ignore the warning "setlocale: LC_ALL: cannot change locale".
If you get a message about CUDA, then the HTR has started, but something else is wrong. Often we see errors where the path to the model is incorrect. Could you post the full output? There should be a log.txt in /tmp/TMPDIR, where TMPDIR changes every time. Somewhere earlier in the output you should see: "Temporary directory created at: ....". Could you check that log file, or post it if possible?
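
Because the temporary directory changes on every run, here is a minimal Python sketch that extracts it from captured pipeline output and returns the expected log.txt path. It assumes the pipeline prints the "Temporary directory created at:" line exactly as quoted above, followed by a path without spaces. `find_log_file` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Message quoted in this thread; the exact wording may differ between Loghi versions
TMPDIR_LINE = re.compile(r"Temporary directory created at:\s*(\S+)")


def find_log_file(pipeline_output: str) -> Optional[Path]:
    """Return <tmpdir>/log.txt for the last temporary directory announced, or None."""
    matches = TMPDIR_LINE.findall(pipeline_output)
    if not matches:
        return None
    return Path(matches[-1]) / "log.txt"
```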

@viragom
Author

viragom commented Nov 13, 2024

log.txt
This is the log from the most recent attempt.

@rvankoert
Collaborator

Change your model path from:
/home/joep/loghi/loghi-htr/float32-generic-2023-02-15/saved_model.pb
to
/home/joep/loghi/loghi-htr/float32-generic-2023-02-15/
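
The `saved_model.pb` file name suggests a TensorFlow SavedModel, which is a directory (holding `saved_model.pb` plus a `variables/` subfolder), so loaders expect the directory rather than the file. Here is a minimal Python sketch of a check that maps one onto the other. `savedmodel_dir` is a hypothetical helper, not Loghi's own validation.

```python
from pathlib import Path


def savedmodel_dir(model_path: str) -> Path:
    """Map a path pointing at a SavedModel's saved_model.pb onto its directory."""
    path = Path(model_path)
    if path.name == "saved_model.pb":
        # The user pointed at the file inside the model; use the folder instead
        return path.parent
    return path
```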

@viragom
Author

viragom commented Nov 13, 2024

Okay, that looks simple (should I have known how to put the model in that line??)
No errors while running now (though the warning remains, which I can ignore).
There is a new folder "page" with a .png and a .xml file. As far as I can see, the xml only contains coordinates (Coords points) and baselines. Shouldn't there be text somewhere?

@rvankoert
Collaborator

I will add some documentation on how to add the model and some meaningful warnings when it is wrong.
The png can be ignored/deleted. It contains layout information which is now in the xml.
There should be TextRegions and also TextLines. Unless no lines were found of course.

It should look like this:

<?xml version="1.0" encoding="UTF-8"?><PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15/pagecontent.xsd">
  <Metadata>
    <Creator>Laypa</Creator>
    <Created>2024-10-17T13:57:08</Created>
    <LastChange>2024-10-25T20:44:30</LastChange>
  </Metadata>
  <Page imageFilename="NL-HaNA_1.01.02_2993_0222.jpg" imageWidth="5380" imageHeight="4260">
    <ReadingOrder>
      <OrderedGroup id="orderedgroup_51e737bc-f0eb-42c4-b59e-2c5ab0b343ba">
        <RegionRefIndexed index="0" regionRef="b19321c3-5270-463a-adfc-d4aae71eb24d"/>
        <RegionRefIndexed index="1" regionRef="region_7d83b144-f26d-45f2-a180-1d36b2fb6334"/>
        <RegionRefIndexed index="2" regionRef="b2bd2555-63de-406f-bb5d-617c1e9061d9"/>
        <RegionRefIndexed index="3" regionRef="region_1170b2d9-57b4-48aa-945b-0bc210da1cfb"/>
        <RegionRefIndexed index="4" regionRef="f4e80f6e-7681-461b-9aa1-33f745b6968f"/>
        <RegionRefIndexed index="5" regionRef="region_05e860d4-4bba-454d-bac7-43160d3ffdff"/>
        <RegionRefIndexed index="6" regionRef="region_9b81ab47-b05b-4a6a-b1ac-c0de5eb941d7"/>
        <RegionRefIndexed index="7" regionRef="b547f612-9ff5-4d26-8ace-873789e8a5ad"/>
        <RegionRefIndexed index="8" regionRef="b696868e-1549-49a6-8c21-c1503eb32f87"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="b19321c3-5270-463a-adfc-d4aae71eb24d" custom="readingOrder {index:0;} structure {type:Text;}" primaryLanguage="Dutch">
      <Coords points="1113,455 2695,455 2695,1329 1113,1329"/>
      <TextLine id="line_2d2e0478-ac80-45cd-be4d-5d566cfda2a3" custom="readingOrder {index:0;}" primaryLanguage="Dutch">
        <Coords points="1113,468 1267,464 1275,456 1307,455 1479,459 1543,523 1563,524 1643,524 1691,476 1707,472 1926,471 1942,484 2070,483 2090,500 2598,495 2638,523 2674,523 2695,510 2695,583 2678,566 2646,579 2625,599 2338,599 2322,586 2222,586 2214,578 1962,578 1910,611 1855,611 1839,627 1679,626 1639,598 1575,598 1552,575 1507,574 1463,595 1447,611 1299,615 1267,635 1223,647 1113,647"/>
        <Baseline points="1149,549 1199,551 1299,549 1749,553 2149,553 2599,559 2649,566 2660,565"/>
        <Word id="word_03c58622-1c4f-4027-bd89-a70ac7595646">
          <Coords points="1150,514 1200,516 1298,514 1538,516 1538,561 1299,559 1199,561 1149,559"/>
          <TextEquiv>
            <PlainText>Ontfangen</PlainText>
            <Unicode>Ontfangen</Unicode>
          </TextEquiv>
        </Word>
        <Word id="word_484233b0-b340-4cfb-a8a2-ab1f7ce96892">
          <Coords points="1581,517 1711,518 1711,563 1581,562"/>
          <TextEquiv>
            <PlainText>een</PlainText>
            <Unicode>een</Unicode>
          </TextEquiv>
        </Word>

If it doesn't: send the log again.
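
To tell quickly whether a PAGE XML file contains only layout (Coords and Baseline) or also recognized text, here is a minimal Python sketch that counts TextLine elements and how many of them contain non-empty Unicode text, either directly or inside Word elements. It matches tags by local name, assuming that newer PAGE schema versions use the same element names as the 2013-07-15 example above. `summarize_page` and the script name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the namespace: '{http://...}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def summarize_page(root: ET.Element) -> dict:
    """Count TextLine elements and how many of them carry non-empty Unicode text."""
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]
    transcribed = 0
    for line in lines:
        # Text may sit in a TextEquiv directly under the line or under its Words
        if any(
            local_name(el.tag) == "Unicode" and el.text and el.text.strip()
            for el in line.iter()
        ):
            transcribed += 1
    return {"text_lines": len(lines), "transcribed_lines": transcribed}


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_page.py page/*.xml
    for path in sys.argv[1:]:
        print(path, summarize_page(ET.parse(path).getroot()))
```

A result like `{'text_lines': 40, 'transcribed_lines': 0}` would mean the layout step found lines but no text was written into the file.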

@viragom
Author

viragom commented Nov 13, 2024

@rvankoert My xml doesn't contain any of the Word tags from your example xml. Mine only has the Coords points and baselines. See log2.txt. I feel that we're almost there, though...
log2.txt

knaw-huc deleted a comment from viragom on Nov 13, 2024