Custom Font #7

Open
Bengeljo opened this issue Feb 23, 2024 · 1 comment

Comments

@Bengeljo

I always get an error when I try to use a custom font. The font is installed, Windows can find it, and looking it up works perfectly. When I run split_training_text.py, I get the following error:
Fontconfig error: Cannot load default config file: No such file: (null)
Fontconfig error: Cannot load default config file: No such file: (null)
Could not find font named 'Quadrant'.
Pango suggested font 'Cascadia Code'.
Please correct --font arg.

I want to train the model on Quadrat-Serial-Regular.ttf, but it just won't recognize it. I tried to look it up but can't find it. Modifying the font flag doesn't help: it wants a font name, but it can't find the font even though it is there. To be honest, I don't know where it searches for fonts.

The folder is located on the SSD (E:) and the operating system is on C:, but Tesseract and Python are both on the PATH on C:, so they should have access to it. Please help.

@Antonio-Serrat

Hi @Bengeljo, I had the same issue and finally got the script to use my font, but I am on Linux. Still, I think this may give you an idea about the error. For now I don't know what the underlying problem is.

I rewrote the Python script as follows, and this works for me.

I made a fontconfig file (fonts.conf) "locally", only for the script, but before that you need to install the font properly.

After installing it, you can check with the command fc-list | grep "fontname", which should show the font and the directory it is placed in.

Then put that path in the <dir> element of the custom fonts.conf.
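The lookup step above can also be done from Python. This is only a minimal sketch, assuming fc-list is on the PATH and prints lines shaped like "file path: family:style=...". The helper names font_dirs_from_fc_list and installed_font_dirs are hypothetical, not part of the original script.

```python
import os
import subprocess


def font_dirs_from_fc_list(fc_list_output: str, font_name: str) -> list:
    """Return the directories of fonts whose fc-list entry mentions font_name."""
    dirs = []
    for line in fc_list_output.splitlines():
        # Assumed line shape: "<file path>: <family>:style=<style>"
        path, sep, description = line.partition(": ")
        if not sep:
            continue  # not a font entry
        if font_name.lower() in description.lower():
            directory = os.path.dirname(path.strip())
            if directory not in dirs:
                dirs.append(directory)
    return dirs


def installed_font_dirs(font_name: str) -> list:
    """Run fc-list and return the directories that contain font_name."""
    result = subprocess.run(
        ["fc-list"], capture_output=True, text=True, check=True
    )
    return font_dirs_from_fc_list(result.stdout, font_name)
```

Calling installed_font_dirs("YOURFONT") should then return the directory to put in <dir>. An empty list would suggest that fontconfig does not see the font at all.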

import os
import random
import pathlib
import subprocess
import tempfile

# Create the fontconfig file into temp dir
with tempfile.TemporaryDirectory() as tempdir:
    fontconfig_dir = os.path.join(tempdir, 'fontconfig')
    os.makedirs(fontconfig_dir)

    fontconfig_content = """<?xml version="1.0"?>
    <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
    <fontconfig>
      <dir>HERE/YOUR/FONT/PATH</dir>
      <cachedir>YOUR/CACHE/DIR</cachedir>
      <config>
        <match target="scan">
          <test name="family">
            <string>YOURFONT</string>
          </test>
          <edit name="family" mode="assign">
            <string>YOURFONT</string>
          </edit>
        </match>
      </config>
    </fontconfig>
    """

    fontconfig_file_path = os.path.join(fontconfig_dir, 'fonts.conf')
    with open(fontconfig_file_path, 'w') as f:
        f.write(fontconfig_content)

    # Add the specifics env variables only for use with this script
    os.environ['FONTCONFIG_PATH'] = fontconfig_dir
    os.environ['FONTCONFIG_FILE'] = fontconfig_file_path

    # Update Fontconfig cache
    subprocess.run(['fc-cache', '-fv'], check=True)

    training_text_file = 'YOUR/LANG/TRAINING/DATA'
    lines = []

    with open(training_text_file, 'r') as input_file:
        for line in input_file.readlines():
            lines.append(line.strip())

    output_directory = 'WHERE_YOU/WANT_TO/OUTPUT_DATA'

    # makedirs also creates missing parent directories of a nested output path
    os.makedirs(output_directory, exist_ok=True)

    random.shuffle(lines)
    count = 20000
    lines = lines[:count]

    line_count = 0
    for line in lines:
        training_text_file_name = pathlib.Path(training_text_file).stem
        line_training_text = os.path.join(output_directory, f'{training_text_file_name}_{line_count}.gt.txt')
        with open(line_training_text, 'w') as output_file:
            output_file.writelines([line])

        file_base_name = f'LANG_{line_count}'

        subprocess.run([
            'text2image',
            '--font=YOURFONT',
            f'--text={line_training_text}',
            f'--outputbase={output_directory}/{file_base_name}',
            '--max_pages=1',
            '--strip_unrenderable_words',
            '--leading=32',
            '--xsize=3600',
            '--ysize=480',
            '--char_spacing=1.0',
            '--exposure=0',
            '--unicharset_file=langdata/eng.unicharset',
        ], check=True)

        line_count += 1
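One more thing that may be worth checking: the error in the original post names 'Quadrant', while the file is Quadrat-Serial-Regular.ttf. As far as I know, text2image matches --font against the font names Pango reports, not against the file name. The sketch below lists those names using the text2image flags --list_available_fonts and --fonts_dir. The helper names and the "index: name" output shape are my assumptions, so treat it as a starting point only.

```python
import subprocess


def list_fonts_command(fonts_dir: str) -> list:
    """Build the text2image call that prints the font names it can see."""
    return ["text2image", "--list_available_fonts", f"--fonts_dir={fonts_dir}"]


def parse_font_names(listing: str) -> list:
    """Extract names from lines shaped like '  3: Some Font' (assumed format)."""
    names = []
    for line in listing.splitlines():
        index, sep, name = line.partition(":")
        if sep and index.strip().isdigit() and name.strip():
            names.append(name.strip())
    return names


def available_fonts(fonts_dir: str) -> list:
    """Run text2image and return the font names found in fonts_dir."""
    result = subprocess.run(
        list_fonts_command(fonts_dir), capture_output=True, text=True
    )
    # The listing may be written to stderr, so read both streams.
    return parse_font_names(result.stdout + result.stderr)
```

If the font shows up in available_fonts("HERE/YOUR/FONT/PATH"), passing that exact name to --font together with the same --fonts_dir value should let text2image find it.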

I hope this helps!
