PDF24 Creator
Wednesday, 28 September 2022
20:38
Setting up PDC OCR
Hello,
For local mode, place a trainDataList.txt file in the tesseract folder of the installation directory. Unlike the online list, the local list contains no download URLs. Then download the individual language files named in the list into the tessdata folder (INSTALL_DIR/tesseract/tessdata). That is all there is to it. Please make sure to use language files that are compatible with Tesseract 4.0. The language files can be downloaded, for example, from the tesseract project on GitHub.
The list for the local variant can be downloaded from the URL given below. Copy this file into the folder described above, and remember to rename it from local-list.txt to trainDataList.txt. The list of downloadable language files is also given below; it contains the direct links to the language files.
You can shorten the list so that it names only the language files that should be shown in the software.
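Shortening the list could be scripted along the following lines. This is only a sketch: it assumes each entry starts with the language code as its first whitespace-separated field (as in the downloadable list further down), and the function name `trim_language_list` is made up for illustration. It also always keeps the `osd` entry, following the note below that the OSD file is always required.

```python
def trim_language_list(lines, keep_codes):
    """Return only the entries whose language code is in keep_codes.

    Assumption: the first whitespace-separated field of each line is
    the language code (e.g. "deu", "eng", "osd").
    """
    keep = set(keep_codes) | {"osd"}  # assumption: the OSD entry must stay
    trimmed = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in keep:
            trimmed.append(line)
    return trimmed


# Hypothetical sample entries in the URL-free local format.
sample = [
    "deu German",
    "eng English",
    "fra French",
    "osd OSD",
]
print(trim_language_list(sample, ["deu", "eng"]))
```

The result can then be written back to trainDataList.txt, one entry per line.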
List for the local language files:
https://creator.pdf24.org/tesseract/4.0/traindata/local-list.txt
List for the downloadable language files:
https://creator.pdf24.org/tesseract/4.0/traindata/list.txt
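The copy-and-rename step could look roughly like this. It is a minimal sketch, assuming local-list.txt has already been downloaded by hand; `install_local_list` and the directory names in the demo are hypothetical, and the real INSTALL_DIR depends on the installation.

```python
import shutil
import tempfile
from pathlib import Path


def install_local_list(downloaded_list, install_dir):
    """Copy the downloaded local-list.txt to INSTALL_DIR/tesseract/trainDataList.txt."""
    tesseract_dir = Path(install_dir) / "tesseract"
    # Make sure the tessdata folder for the language files exists as well.
    (tesseract_dir / "tessdata").mkdir(parents=True, exist_ok=True)
    target = tesseract_dir / "trainDataList.txt"
    shutil.copyfile(downloaded_list, target)
    return target


# Demo against a throwaway directory instead of a real installation.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "local-list.txt"
    src.write_text("deu German\nosd OSD\n", encoding="utf-8")
    installed = install_local_list(src, Path(tmp) / "PDF24")
    print(installed.name)
```

On a real system the installation directory may require administrator rights to write to.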
Stefan Ziegler Posted new comment 2019-07-11
The language file itself and the OSD file (last entry) are always required.
afr Afrikaans https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/afr.traineddata
amh Amharic https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/amh.traineddata
ara Arabic https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ara.traineddata
asm Assamese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/asm.traineddata
aze Azerbaijani https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/aze.traineddata
aze_cyrl Azerbaijani-Cyrillic https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/aze_cyrl.traineddata
bel Belarusian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/bel.traineddata
ben Bengali https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ben.traineddata
bod Tibetan https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/bod.traineddata
bos Bosnian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/bos.traineddata
bre Breton https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/bre.traineddata
bul Bulgarian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/bul.traineddata
cat Catalan https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/cat.traineddata
ceb Cebuano https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ceb.traineddata
ces Czech https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ces.traineddata
chi_sim Chinese-Simpl. https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/chi_sim.traineddata
chi_tra Chinese-Trad. https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/chi_tra.traineddata
chr Cherokee https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/chr.traineddata
cym Welsh https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/cym.traineddata
dan Danish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/dan.traineddata
deu German https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/deu.traineddata
dzo Dzongkha https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/dzo.traineddata
ell Greek https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ell.traineddata
eng English https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/eng.traineddata
epo Esperanto https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/epo.traineddata
est Estonian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/est.traineddata
eus Basque https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/eus.traineddata
fas Persian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/fas.traineddata
fin Finnish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/fin.traineddata
fra French https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/fra.traineddata
frk Frankish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/frk.traineddata
frm French-Middle https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/frm.traineddata
gle Irish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/gle.traineddata
glg Galician https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/glg.traineddata
grc Greek-Ancient https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/grc.traineddata
guj Gujarati https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/guj.traineddata
hat Haitian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/hat.traineddata
heb Hebrew https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/heb.traineddata
hin Hindi https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/hin.traineddata
hrv Croatian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/hrv.traineddata
hun Hungarian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/hun.traineddata
iku Inuktitut https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/iku.traineddata
ind Indonesian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ind.traineddata
isl Icelandic https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/isl.traineddata
ita Italian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ita.traineddata
ita_old Italian-Old https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ita_old.traineddata
jav Javanese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/jav.traineddata
jpn Japanese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/jpn.traineddata
kan Kannada https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kan.traineddata
kat Georgian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kat.traineddata
kat_old Georgian-Old https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kat_old.traineddata
kaz Kazakh https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kaz.traineddata
khm Khmer https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/khm.traineddata
kir Kirghiz https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kir.traineddata
kor Korean https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kor.traineddata
kor_vert Korean-vertical https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/kor_vert.traineddata
lao Lao https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/lao.traineddata
lat Latin https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/lat.traineddata
lav Latvian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/lav.traineddata
lit Lithuanian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/lit.traineddata
ltz Luxembourgish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ltz.traineddata
mal Malayalam https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mal.traineddata
mar Marathi https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mar.traineddata
mkd Macedonian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mkd.traineddata
mlt Maltese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mlt.traineddata
mon Mongolian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mon.traineddata
mri Maori https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mri.traineddata
msa Malay https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/msa.traineddata
mya Burmese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/mya.traineddata
nep Nepali https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/nep.traineddata
nld Dutch https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/nld.traineddata
nor Norwegian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/nor.traineddata
oci Occitan https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/oci.traineddata
ori Oriya https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ori.traineddata
pan Panjabi https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/pan.traineddata
pol Polish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/pol.traineddata
por Portuguese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/por.traineddata
pus Pushto https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/pus.traineddata
que Quechua https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/que.traineddata
ron Romanian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ron.traineddata
rus Russian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/rus.traineddata
san Sanskrit https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/san.traineddata
sin Sinhala https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/sin.traineddata
slk Slovak https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/slk.traineddata
slv Slovenian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/slv.traineddata
snd Sindhi https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/snd.traineddata
spa Spanish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/spa.traineddata
spa_old Spanish-Old https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/spa_old.traineddata
sqi Albanian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/sqi.traineddata
srp Serbian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/srp.traineddata
srp_latn Serbian-Latin https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/srp_latn.traineddata
sun Sundanese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/sun.traineddata
swa Swahili https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/swa.traineddata
swe Swedish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/swe.traineddata
syr Syriac https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/syr.traineddata
tam Tamil https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tam.traineddata
tat Tatar https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tat.traineddata
tel Telugu https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tel.traineddata
tgk Tajik https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tgk.traineddata
tha Thai https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tha.traineddata
tir Tigrinya https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tir.traineddata
ton Tonga https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ton.traineddata
tur Turkish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/tur.traineddata
uig Uighur https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/uig.traineddata
ukr Ukrainian https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/ukr.traineddata
urd Urdu https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/urd.traineddata
uzb Uzbek https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/uzb.traineddata
uzb_cyrl Uzbek-Cyrillic https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/uzb_cyrl.traineddata
vie Vietnamese https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/vie.traineddata
yid Yiddish https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/yid.traineddata
yor Yoruba https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/yor.traineddata
osd OSD https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/osd.traineddata
From <https://creator.pdf24.org/tesseract/4.0/traindata/list.txt>
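Fetching the needed files from the list above could be sketched as follows. The code assumes each entry has the form "code name URL" separated by whitespace, as shown above; `parse_entry`, `plan_downloads` and `download_all` are illustrative names, not part of PDF24 or Tesseract. The `osd` entry is always included, per the note above.

```python
from pathlib import Path
from urllib.request import urlretrieve


def parse_entry(line):
    """Split one list entry into (code, name, url)."""
    fields = line.split()
    code, url = fields[0], fields[-1]
    name = " ".join(fields[1:-1])
    return code, name, url


def plan_downloads(lines, wanted_codes, tessdata_dir):
    """Pair each wanted entry's URL with its target path in the tessdata folder."""
    wanted = set(wanted_codes) | {"osd"}  # the OSD file is always needed
    plan = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code, _name, url = parse_entry(line)
        if code in wanted:
            plan.append((url, Path(tessdata_dir) / f"{code}.traineddata"))
    return plan


def download_all(plan):
    """Download every (url, target) pair; requires network access."""
    for url, target in plan:
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))


entries = [
    "deu German https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/deu.traineddata",
    "fra French https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/fra.traineddata",
    "osd OSD https://github.com/tesseract-ocr/tessdata_fast/raw/4.0.0/osd.traineddata",
]
# Hypothetical relative path; substitute the real INSTALL_DIR.
for url, target in plan_downloads(entries, ["deu"], "INSTALL_DIR/tesseract/tessdata"):
    print(target.name, "<-", url)
```

Calling `download_all` on the resulting plan would then perform the actual downloads; it is left out of the demo so the example runs without network access.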
//EventLog.CreateEventSource("PDF24-OCR-Service", "Application");