December 23, 2014

Experimenting with a Samsung UE32H6200 Smart LED TV

Useful links

Connecting to the TV
  • Using the HDMI port 
    • Connecting via HDMI Port (using the HDMI cable) 
    • Connecting with the HDMI (DVI) Port (using a DVI to HDMI cable)
  • Samsung Link (log in) connects content to the TV through cloud based software
  • Home Network (DLNA, UPnP) connects content to the TV through virtual network file server
  • Screen Mirroring for Samsung Mobile Devices
  • Wi-Fi Direct 
  • Samsung Smart View 2.0 for PC, for Samsung Mobile Devices, for iPhone to share the screen
DLNA connection from Ubuntu
Screen Casting with Wi-Fi Direct from Ubuntu
Read this summary first.
Read this how-to second.
Unfortunately, the HP Compaq TC4400 does not support Wi-Fi Direct.
I found this out by following the how-to above: I installed "iw" through Synaptic and looked for "p2p" under "Supported interface modes" in the output of ~$ sudo iw list, and there was none.
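
The check can also be scripted. This is only a sketch: the function name supports_p2p is made up for illustration, and it assumes that iw list prints every supported mode as a bulleted line (for example " * P2P-client") directly under "Supported interface modes:".

```shell
# Hypothetical helper: reads the output of "iw list" on stdin and succeeds
# only if a P2P mode is listed under "Supported interface modes:".
supports_p2p() {
    awk '
        /Supported interface modes:/ { in_modes = 1; next }
        # Bulleted lines right after the header are the supported modes.
        in_modes && /^[[:space:]]*\*/ { if ($0 ~ /P2P/) found = 1; next }
        # Any other line ends the list of modes.
        in_modes { in_modes = 0 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   sudo iw list | supports_p2p && echo "Wi-Fi Direct modes found" || echo "no P2P support"
```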



October 6, 2014

How to extract a unique distinct list from a column in LibreOffice Calc with range filter criteria

Using this method it is possible to filter a range of data with the INDIRECT function.

You will have to replace both occurrences of the $A$2:$A$20 range in the formula below with the desired range given by INDIRECT.

=INDEX($A$2:$A$20, MATCH(0, COUNTIF($B$1:B1, $A$2:$A$20), 0))

Let's say you have a third column, C with a header row 1 and data from row 2.
This method works only if the data in column C is sorted, because the matching rows must form one contiguous block.

You type your filter criteria in cell $E$1.
  1. use MATCH to find the first occurrence of the filter criteria:
    MATCH($E$1,$C$1:$C$20,0)
  2. use COUNTIF to find out how many rows meet the criteria:
    COUNTIF($C$1:$C$20,$E$1)
  3. the range will start in the row given by MATCH($E$1,$C$1:$C$20,0)
    and end in the row given by (MATCH($E$1,$C$1:$C$20,0)+COUNTIF($C$1:$C$20,$E$1)-1)
  4. now prefix the row number with the letter of the data column, like "A"&
  5. your range starting cell will be INDIRECT("A"&MATCH($E$1,$C$1:$C$20,0))
    your range ending cell will be INDIRECT("A"&(MATCH($E$1,$C$1:$C$20,0)+COUNTIF($C$1:$C$20,$E$1)-1))
  6. the string you will paste in place of $A$2:$A$20 will be INDIRECT("A"&MATCH($E$1,$C$1:$C$20,0)):INDIRECT("A"&(MATCH($E$1,$C$1:$C$20,0)+COUNTIF($C$1:$C$20,$E$1)-1))

How to extract a unique distinct list from a column in LibreOffice Calc

Hi all,

to this problem: http://nabble.documentfoundation.org/Generate-Unique-List-from-Values-in-Column-td4077250.html

this is the solution in MS Excel,
http://www.get-digital-help.com/2009/03/30/how-to-extract-a-unique-list-and-the-duplicates-in-excel-from-one-column/

To do this in LibreOffice, the only difference is that you must not simply click and drag, but Ctrl+click and drag the cell contents down!

so, step by step:

you have a header: row 1.
your list starts in row 2 in column A.
your unique list will start in row 2 column B.

you type =INDEX($A$2:$A$20, MATCH(0, COUNTIF($B$1:B1, $A$2:$A$20), 0)) in cell B2 and hit CTRL+SHIFT+ENTER

this should result in you having one of the values of your A2:A20 list in B2.

now press CTRL, then click and drag down the bottom-right corner of cell B2.

if you drag down long enough, #N/A cell contents should show up on the bottom of your unique list.

May 27, 2014

HP Compaq TC4400 specification

The model was released in 2006. (source: Wikipedia)

This is a copy of the specification from the HP website.

Processor, Operating System and Memory
Operating system installed: Genuine Windows Vista® Business
Processor: Intel® Core™2 Duo Processor T5600
• 1.83 GHz, 2 MB L2 cache, 667 MHz FSB
Compatible Operating Systems: Genuine Windows 2000 drivers available on www.hp.com, FreeDOS
Chipset: Mobile™ Intel® 945GM Express Chipset
Standard memory: 1 GB (512 MB x 2)
Memory Upgrade: Upgradeable to 4096 MB maximum
Memory slots: 2 SODIMM slots
System features
Internal hard disk drive: 80 GB
Hard disk drive speed: 5400 RPM
CD-ROM and DVD: Via optional HP External MultiBay II (9.5-mm) or HP Advanced Docking Station
Multi-bay devices: Optional External Multibay II devices
Portability
Weight: Starting at 2.1 kg
Dimensions (w x d x h): 3 (at front) x 28.5 x 23.5 cm
Display size: 12.1 inches diagonal
Connectivity
Wireless Technologies: WLAN 802.11 a/b/g
Network interface: Broadcom NetXtreme Gigabit (10/100/1000 NIC) PCI Express Ethernet Controller, HP Smart Power NIC technology
Expandability
Expansion slots: Slots available for additional devices: 1 Type I/II PC Card slot supports both 32-bit CardBus and 16-bit cards, Secure digital slot
External I/O ports: 3 USB 2.0 ports, VGA, headphone, microphone, AC adapter, RJ-11, RJ-45, S-video TV out, Firewire (1394a)
Software
Software included: Additional software available from the Web: HP Client Manager Interface
Graphic / Audio
Graphic Subsystem Name: Intel® Graphics Media Accelerator 950
Graphic Subsystem Video Card Memory: Up to 224 MB shared video memory
Other information
Keyboard: Full-sized keyboard
Mouse/Pointing Device: Enhanced dual pointing devices (touchpad and pointstick) with scroll zone, digital eraser pen with tether and clip
Power Features: 6-cell Lithium-Ion battery internal
Power Requirements: External 65W Smart AC adapter
Docking solution: HP Docking Station, HP Advanced Docking Station, HP External MultiBay II, HP Monitor Stands (all sold separately)
Security Management: Kensington lock slot, optional Smart Card Reader (replaces PC card slot)
Operating Temperature Range: 0 to 35° C
Operating Humidity Range: 10 to 90% RH
Non-Operating Humidity: 5 to 95% RH
Warranty: 3 years carry-in (pick-up and return in some countries; upgrades available, sold separately), 1 year warranty on primary battery


EY608EA - HP Compaq tc4400 Tablet PC

Audio
Audio
HP Premier Sound™ High Definition Audio 24-bit DAC, headphone jack, stereo/mono microphone jack, integrated mono microphone
Internal audio
HP Premier Sound™ High Definition Audio 24-bit DAC, headphone jack, stereo/mono microphone jack, integrated mono microphone
Communications
Modem
56K modem
Connectivity
Wireless capability
Yes
Wireless technologies
Intel® Wireless LAN 802.11a/b/g mini-pci card, Bluetooth
Displays
Display size
12.1 inches diagonal
Display features
12.1" XGA WVA (1024 × 768), 160 degree, with digitizer with ambient light sensor; 12.1" XGA WVA (1024 × 768), 160 degree, outdoor viewable display with digitizer, with ambient light sensor
Expansion slots
Expansion slot
Slots available for additional devices: 1 Type I/II PC Card slot supports both 32-bit CardBus and 16-bit cards, Secure digital slot
Faxing
Fax/modem
56K modem
Input devices
Keyboard
Full-sized keyboard
Pointing device
Enhanced dual pointing devices (touchpad and pointstick) with scroll zone, digital eraser pen with tether and clip
Memory
Memory type
DDR2, 667 MHz, 512, 1024, 2048 MB
Memory slots
2 SODIMM slots
Memory upgrade
Upgradeable to 4096 MB maximum
Cache external
2 MB L2 cache
Networking
Network interface
Broadcom NetXtreme Gigabit (10/100/1000 NIC) PCI Express Ethernet Controller, HP Smart Power NIC technology
Ports
I/O port
3 USB 2.0 ports, VGA, headphone, microphone, AC adapter, RJ-11, RJ-45, S-video TV out
External I/O ports
3 USB 2.0 ports, VGA, headphone, microphone, AC adapter, RJ-11, RJ-45, S-video TV out, Firewire (1394a)
Power
Battery life
Up to 5h30m (up to 12h30m with HP Extended Life Battery, up to 16 hours with HP Ultra-capacity battery)
Power features
6-cell Lithium-Ion battery internal
Power features
6-cell Lithium-Ion battery, optional 8-Cell HP Extended Life Battery (sold separately), HP Fast Charge Technology (sold separately)
Power requirements
External 65W Smart AC adapter
Processor
Chipset
Mobile™ Intel® 945GM Express Chipset
Processor speed
1.83 GHz
Centrino technology
Intel® Centrino® processor technology
Software
Software included
Additional software available from the Web: HP Client Manager Interface
Operating system software 01
Genuine Windows XP Tablet PC Edition 2005
Pre-installed software
HP Wireless Assistant (on selected models), HP Mobile Printing Driver, HP Backup and Recovery Manager, HP ProtectTools Security Manager, Sonic Digital Media Plus (on selected models), Intervideo WinDVD 5, Adobe Acrobat Reader, HP Help and Support, HP One-Touch Buttons, Symantec Norton Internet Security, HP OpenView Radia Management Solutions, HP Qmenu Software, Tablet PC Tour, Microsoft Reader eBooks
Compatible operating systems
Genuine Windows 2000 drivers available on www.hp.com, FreeDOS
Compatible operating systems
FreeDOS, Windows 2000 drivers available on www.hp.com
Operating system installed
Genuine Windows XP Tablet PC Edition
Storage
Optical drives
Via optional HP External MultiBay II (9.5-mm) or HP Advanced Docking Station
Hard disk drive
Serial ATA 60, 80 or 100 GB (5400 rpm)
Internal hard disk drive
80 GB
Hard disk drive speed
5400 rpm
Multi-bay devices
Optional External Multibay II devices
System
Docking solution
HP Basic Docking Station, HP Advanced Docking Station, HP External MultiBay II, HP Monitor Stands (all sold separately)
Docking solution
HP Docking Station, HP Advanced Docking Station, HP External MultiBay II, HP Monitor Stands (all sold separately)
Security management
Kensington lock slot, optional Smart Card Reader (replaces PC card slot)
HP Protection tools
HP ProtectTools, HP Enhanced Drivelock, HP TPM Embedded Security Chip, HP Biometric Fingerprint Sensor
Upgradability
Windows Vista® capable
System bus
667 MHz FSB
Video
Graphic card 01
Intel® Graphics Media Accelerator 950
Graphic subsystem name
Intel® Graphics Media Accelerator 950
Graphic subsystem video card memory
Up to 224 MB shared video memory
Video resolutions description
1024 × 768 XGA WVA TFT (16 million colours)
Wireless
Wireless technologies
Intel or Broadcom Wireless LAN 802.11 and/or Bluetooth 2.0
 
 
Operating humidity range
10 to 90% RH
Operating temperature range
0 to 35° C
Weight
Starting at 2.1 kg
Dimensions (W x D x H)
3 (at front) x 28.5 x 23.5 cm
Dimensions, metric
3.43 (at front) x 28.5 x 23.5 cm
 

May 26, 2014

Mplayer for music

I would like to have something like this:

1.) nautilus script
right click on file/dir to copy full path of file/dir
2.) text file
to paste the full paths of to be played items or dirs
3.) utility to handle the text file as a playlist

Description of the utility:
- uses mplayer to play audio/video files
- reads the first line of the text file, starts to play it, then deletes it so the second line becomes the new first line (alternatively it moves the line to the end of another text file, so the user can go backward in the playlist)
- user can edit the text file on-the-fly between tracks (edit playlist while playing)
- utility should be fool-proof (in case of wrong input it should pause and wait for user to correct error in playlist)
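
A rough sketch of such a utility in bash follows. Everything in it is an assumption made for illustration: the function names pop_first_track and play_queue, the PLAYER variable (defaulting to mplayer), and the choice to move played lines to a history file. It waits for the user when the first line does not point to an existing file.

```shell
# Hypothetical helper: print the first line of the playlist, append it to
# the history file, and remove it from the playlist.
pop_first_track() {
    local playlist=$1 history=$2 track
    # Nothing to do when the playlist is missing or empty.
    [ -s "$playlist" ] || return 1
    IFS= read -r track < "$playlist" || [ -n "$track" ] || return 1
    printf '%s\n' "$track" >> "$history"
    # Rewrite the playlist without its first line.
    tail -n +2 "$playlist" > "$playlist.tmp" && mv "$playlist.tmp" "$playlist"
    printf '%s\n' "$track"
}

# Play tracks until the playlist is empty. The file is re-read before every
# track, so edits made between tracks take effect.
play_queue() {
    local playlist=$1 history=$2 track
    while [ -s "$playlist" ]; do
        IFS= read -r track < "$playlist"
        if [ ! -e "$track" ]; then
            echo "Cannot find: $track" >&2
            # Wait here so the user can correct the playlist, then retry.
            read -r -p "Fix the playlist, then press Enter to retry. " _
            continue
        fi
        pop_first_track "$playlist" "$history" > /dev/null
        "${PLAYER:-mplayer}" "$track"
    done
}

# Usage:
#   play_queue ~/playlist.txt ~/played.txt
```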


SD card failing...

I get this kind of error with my microSD cards when I try to write to them. One is a 32 GB SDHC, the other is a 2 GB SD.

in nautilus
Error creating file system: helper exited with exit code 1: Error calling fsync(2) on /dev/mmcblk0p1: Input/output error
with cp
cp: cannot create regular file `/media/Zene/test.mp3': Input/output error
with rsync
rsync: rename "/media/Zene/.test.mp3.FUG0AR" -> "test.mp3": Input/output error (5)
rsync: mkstemp "/media/Zene/.test.jpg.MMYg55" failed: Read-only file system (30)
sent 149488152 bytes  received 183 bytes  499126.33 bytes/sec
total size is 149469329  speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1060) [sender=3.0.7]

and after this the drive was empty once again :-/

Even in Windows 7, the SD card dismounts after the copy has finished and almost everything disappears from the drive. The SD slot did not show up on Win7.

As for the mp3 player I wanted to prepare for use with the SD card: the files have to be copied one after another to keep the order of the songs on the albums.
To copy directories sequentially, use {rsync -v -r source/. dest} (source/. being the local folder).
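
If rsync is not at hand, the same idea can be sketched with find, sort and cp. The function name copy_in_order is hypothetical, and whether a given player really follows the write order is an assumption I have not verified.

```shell
# Hypothetical helper: copy every file under src to dest one at a time,
# in sorted path order, recreating the directory structure.
copy_in_order() {
    local src=$1 dest=$2 rel
    mkdir -p "$dest"
    # List paths relative to src, sorted bytewise for a stable order.
    (cd "$src" && find . -type f | LC_ALL=C sort) | while IFS= read -r rel; do
        mkdir -p "$dest/$(dirname "$rel")"
        cp "$src/$rel" "$dest/$rel"
        echo "$rel"
    done
}

# Usage:
#   copy_in_order ~/Music/album /media/Zene/album
```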

~$ sudo fsck -v /dev/sdcard
prints interesting information, but it was no help with this problem.

May 21, 2014

Mplayer music playback with playlist

the command to use:
$ mplayer -playlist playlist.txt 

to make a playlist:
1. go to the directory that contains the files you want to make a playlist of.
2.
$ find . -type f -name "*.mp3" | sort > directory_playlist.txt
3. (optional) edit the text file so that each entry has the full path
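
To skip the manual editing in step 3, the playlist can be generated with full paths right away. A small sketch, where make_playlist is an illustrative name and not an mplayer feature:

```shell
# Hypothetical helper: print the absolute paths of all mp3 files under a
# directory, sorted, one per line.
make_playlist() {
    local dir
    # Resolve the directory to an absolute path first.
    dir=$(cd "$1" && pwd) || return 1
    find "$dir" -type f -iname '*.mp3' | LC_ALL=C sort
}

# Usage:
#   make_playlist ~/Music/album > playlist.txt
#   mplayer -playlist playlist.txt
```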

to do the same with a one-line command (if you do not need to edit the playlist):
$ mplayer -playlist <(find /directory/of/your/music/ -name "*.mp3" -type f | sort)

hit Enter for the next track
hit Ctrl+C to stop playback

A couple of useful mplayer options:
-shuffle : shuffle the playlist
-loop 0 : play the playlist repeatedly (can be combined with -shuffle)

April 8, 2014

Google Tasks: what it can and cannot do

As you can see on this page, Tasks are accessible from Gmail, from Calendar and from Mobile, through a web browser.

For what I use it for...

Let's begin with the useful parts:

  • it is possible to turn an e-mail into a task
  • it is possible to drag and drop due tasks in the calendar
  • it is possible to make separate lists for separate fields of tasks
Now, the limitations:
  • it is not possible to set a due date for e-mail tasks with one click.
    • for me the best would be if any e-mail I turn into a task became due today, because then I could drag and drop it in the calendar.
  • it is not possible to archive task items, it is only possible to delete them.
    • for this reason my task list is hundreds of items long.
  • it is not possible to select multiple tasks to move them around in your list or to add a due date to them
  • it is not possible to view different task lists simultaneously in the calendar
    • therefore I do not use different lists
If you happen to know a solution to the functions I miss, please leave a comment to this post and tell me about it!

March 16, 2014

DJVU extract images

A useful link:
http://askubuntu.com/questions/46233/converting-djvu-to-pdf

Another useful link: http://unix.stackexchange.com/questions/20592/extract-several-pages-from-a-djvu-file
http://kheyali.blogspot.hu/2011/08/in-order-to-convert-djvu-file-into.html

And another: http://en.wikisource.org/wiki/Help:DjVu_files#DjVu_to_Images


Extract images from DJVU:
DDJVU
The easiest way to do this is probably to extract to a multipage TIFF.
Then split the TIFF file with tiffsplit: http://manpages.ubuntu.com/manpages/hardy/man1/tiffsplit.1.html
Then convert to anything useful.
Except that did not work for the file I had...

Okay, so the second way is the script way. Copy the script, paste it to an empty text file. Give it the extension .sh and give it permission to execute as program. Put it in a directory with all the djvu-s you want to extract and run in terminal.

#!/bin/bash
# Written for Ubuntu 10.04 Lucid Lynx.

# Put this script file in a directory with all the djvu-s you want to extract and run it in terminal.
# Every page of every DJVU in this folder will be extracted to a PNM file.

### Select files to process
# Build the file list (mapfile keeps file names with spaces intact)
mapfile -t filelist < <(find . -iname '*.djvu' | sort)
# Count the files in the file list
element_count=${#filelist[@]}
echo "altogether $element_count djvu files will be processed"

# Process only 1 file at a time:
counter=1
echo "processing file:"
for i in "${filelist[@]}"
do
  # Show progress as current/total
  echo -ne "$counter/$element_count"\\r

  # Get the number of pages from the input file
  pagecount=$(djvused -e n "$i")
  # Loop over the pages (a separate variable, so the file list is not overwritten)
  for (( page=1; page<=pagecount; page+=1 ))
  do
    # Extract with DDJVU
    ddjvu -format=pnm -page="$page" "$i" "file_$counter.page_$page.pnm"
    # Or use DjvuPs to extract PS from DJVU the same way:
    #djvups -page="$page" "$i" "file_$counter.page_$page.ps"
  done
  counter=$((counter + 1))
done

echo DONE.

February 19, 2014

Modify PDF pages as images, then reintegrate

What just happened?!

I successfully reintegrated a modified PDF in the original file!

wow! that is definitely a level up in image-pdf editing for me.

pdftk input.pdf cat 2 output page2.pdf
pdfinfo page2.pdf
  Page size:      595 x 842 pts (A4)

pdfimages page2.pdf img
identify -verbose img-000.pbm
  Geometry: 2496x3440+0+0
  Resolution: 72x72
  Page geometry: 2496x3440+0+0

Modified the image in GIMP: Open, Edit, Save -> Export -> click OK whenever needed.

convert img-000.pbm -resample 300x300 -resize 2496x3440 img2.ps
identify -verbose img2.ps
  Geometry: 599x826+0+0
  Resolution: 72x72
  Page geometry: 599x826+0+0

ps2pdf12 -sPAPERSIZE=a4 -dFIXEDRESOLUTION img2.ps page2_v2.pdf
pdfinfo page2_v2.pdf
  Page size:      595 x 842 pts (A4)

pdftk A=input.pdf B=page2_v2.pdf cat A1 B A3-end output output.pdf
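
The geometry figures above are consistent with a simple conversion: a PostScript point is 1/72 inch, so an image placed at 300 dpi occupies pixels x 72 / 300 points. The helper below (px_to_pt is an illustrative name, not part of any of the tools used) reproduces the 599x826 reported for img2.ps, which is close to the 595 x 842 pts of an A4 page. This is one reading of the numbers, not something the tools state themselves.

```shell
# Hypothetical helper: convert a pixel count at a given dpi to PostScript
# points (1 pt = 1/72 inch), rounded to the nearest integer.
px_to_pt() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.0f\n", px * 72 / dpi }'
}

px_to_pt 2496 300   # prints 599
px_to_pt 3440 300   # prints 826
```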

February 12, 2014

Create a link file pointing to web URL

1. Create an empty text file.

2. Copy the following content in it:

[Desktop Entry]
Encoding=UTF-8
Name=Link to Ask Ubuntu
Type=Link
URL=http://www.askubuntu.com/
Icon=text-html

Edit Name and URL to desired name and url.

3. Save file
4. Close file
5. Rename file to have the extension ".desktop"

Done.
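
The same steps can be done from the terminal. A minimal sketch, assuming the file content shown above; the function name make_link_file is made up for illustration:

```shell
# Hypothetical helper: write a .desktop link file with the given name and URL.
make_link_file() {
    local name=$1 url=$2 out=$3
    cat > "$out" <<EOF
[Desktop Entry]
Encoding=UTF-8
Name=$name
Type=Link
URL=$url
Icon=text-html
EOF
}

# Usage:
#   make_link_file "Link to Ask Ubuntu" "http://www.askubuntu.com/" ~/Desktop/askubuntu.desktop
```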

January 28, 2014

Creating searchable PDFs on Ubuntu 2nd try

Requirements:

  • image layer over text layer
  • good character encoding for Hungarian ű and ő chars
  • good placement of words and lines
  • reasonably good recognition
  • handling multi-column layout
1st try was Tesseract hOCR output merged with a PNM image by hocr2pdf.
  • image layer over text layer - YES
  • good character encoding for Hungarian ű and ő chars - NO
  • good placement of words and lines - NO
  • reasonably good recognition - YES
  • handling multi-column layout - NO
The strangest thing is that hOCR editor does not handle the Tesseract hOCR output well. Actually, it does not handle it at all, showing HTML tags and everything where the editable text should be...

2nd try is: OCRopus 
I had no success installing and using OCRopus.
  • "recognize" does not handle languages, and/or I could not find a Hungarian data file for it.
    it works like: ocroscript recognize input.pnm > output.html
  • rec-tess-complete should recognize through tesseract, and import language files with the --tesslanguage=hun option, but instead I got this error:
    Unable to load unicharset file /usr/share/tesseract-ocr/tessdata/hun.unicharset
  • so I unpacked the hun.traineddata like this:
    combine_tessdata -u hun.traineddata hun.
  • and put the files to /usr/share/tesseract-ocr/tessdata/
  • however I got this error:
    Error: Illegal malloc request size!
    Fatal error: No error trap defined!
    Signal_termination_handler called with signal 2001
  • then I tried with --tesslanguage=eng and it gave me:
    ocroscript: /usr/share/ocropus/scripts//rec-tess-complete.lua:52: attempt to call global 'hardcoded_version_string' (a nil value)
  • so I searched and found a patch, and installed it like this:
    patch /usr/share/ocropus/scripts/rec-tess-complete.lua rec-tess-complete3_r1308.patch
  • and now it gives me (with "eng")
    ocroscript: /usr/share/ocropus/scripts//rec-tess-complete.lua:61: Leptonica is disabled, please compile with it or don't use it!
I already have the newest tesseract installed, but I did not manage to install the newest OCRopus. It had too many unknowns, with Python and all...

results with ocropus 0.3.1-2 recognize and merged with hocr2pdf:
  • image layer over text layer - YES
  • good character encoding for Hungarian ű and ő chars - NO
  • good placement of words and lines - NO (it makes the characters so large that I cannot even tell which line they belong to)
  • reasonably good recognition - NO (because of the English training data)
  • handling multi-column layout - DON'T KNOW (the text was too big, it was impossible to tell)
Maybe the big text was caused by the dpi of the image... I should check this to at least be able to judge the layout handling... nope, it did not help... at all.


...to be continued with:

3rd try is: Cuneiform
4th try: Adobe Acrobat XI on Windows

January 26, 2014

Creating searchable PDFs on Ubuntu 1st try

Process is the following:

  1. have a good-resolution image in a format that leptonica accepts, for use with tesseract: JPEG, PNG, TIFF, BMP, PNM, GIF or WEBP.
  2. produce hocr output like
    tesseract image.pbm textfile -l hun hocr
  3. merge image and hocr to searchable pdf in hocr2pdf: image layer on top of text layer like
    hocr2pdf -i input.pbm -o output.pdf < textfile.html
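
The two commands can be wrapped in one function. This is only a sketch under assumptions: make_searchable_pdf is an illustrative name, and it relies on this tesseract version writing its hOCR output to a file with the .html extension, as in the commands above.

```shell
# Hypothetical wrapper: OCR one image with tesseract, then merge the hOCR
# output and the image into a searchable PDF with hocr2pdf.
make_searchable_pdf() {
    local image=$1 lang=$2 output=$3
    # Output base name for tesseract: the image path without its extension.
    local base=${image%.*}
    tesseract "$image" "$base" -l "$lang" hocr || return 1
    # Assumes the hOCR file is named <base>.html.
    hocr2pdf -i "$image" -o "$output" < "$base.html"
}

# Usage:
#   make_searchable_pdf image.pbm hun output.pdf
```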


First impressions:

  • character encoding after hocr2pdf is off
    setting character encoding in html file header to ISO-8859-2 or Windows-1250 does not help.
    "ő" turns "Q" and "ű" turns "q" :-(
  • 2 columns were not recognized automatically by tesseract
    the psm option does not solve this
    this is probably impossible in the current version.
    I have to try another way to produce hocr, with OCR software that handles layout.
  • font sizes are chaotic
    I think this probably depends on the bbox size and therefore on the OCR software.
  • output file size is 159kb from a 818kb pbm, which is way too big.
    this cannot be helped unless I generate the pdf with my own methods...

Anyhow, this looks like a disaster :-( character encoding has to work.

Installing Tesseract

Thanks to THIS post I was able to install tesseract 3 on 10.04 Ubuntu. This is how:

Install Tesseract
Get the required packages available in the repositories:

sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
("sudo apt-get install zlibg-dev" is suggested in the Tesseract readme but isn't available; the package meant is probably zlib1g-dev. I found I didn't need this.)

As I picked up from a comment, you need to be able to compile and build the software, and Ubuntu needs some packages for that. Many of you may already have these installed, but it doesn't hurt to make sure.

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install automake


Download Leptonica, which can't be installed with apt-get:
http://www.leptonica.org/download.html version 1.70
unpack, navigate to the folder in terminal, and run:

./configure
make
sudo make install
sudo ldconfig


Now we can actually get and install Tesseract!

download tesseract: https://code.google.com/p/tesseract-ocr/downloads/list version 3.02.02
unpack, navigate to the folder in terminal, and run:

./configure
make
sudo make install
sudo ldconfig   (<-- this is very important)

Now for whatever reason the training data isn't installed with this.

Download the language data you need and unzip it to the /usr/local/share/tessdata folder (requires root permissions).
Also download the osd traineddata, for example from here.

To copy the files with root permissions, try:
sudo nautilus


OCR Hungarian

After six long years I gathered my courage to face the task of OCR-ing texts again.

Goals:

1.) searchable PDF and/or DJVU from image PDF or DJVU.
one guide for PDFs uses cuneiform to OCR, and hocr2pdf to embed the text in the PDF.
another option is pdfsandwich

2.) formatted text file preferably HTML from image.


Utilities:

Tesseract:
homepage: http://code.google.com/p/tesseract-ocr/
wikipedia: http://en.wikipedia.org/wiki/Tesseract_(software)
Hungarian training data: http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.hun.tar.gz&can=2&q=hun (which I lacked six years ago)
Other projects using Tesseract engine: http://code.google.com/p/tesseract-ocr/wiki/3rdParty

Cuneiform:
homepage: http://cognitiveforms.ru/products/cuneiform/
wikipedia: http://en.wikipedia.org/wiki/CuneiForm_(software)

hOcr2Pdf:
homepage: http://hocrtopdf.codeplex.com/

January 15, 2014

Burning DVDs and failing.

With my Samsung SE-208DB/TSBS Black.
It is connected to 2 USB slots, 1 directly and 1 with extension. 


GnomeBaker 
DVD+R, Speed: 8X, Mode: auto. 
1. Time: 71 min.
Output:

Writing an image:
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using 321030_2052795755951_117000.JPG;1 for  Photos_2012_2/jul/321030_2052795755951_1177642115_n.jpg (321030_2052795755951_1177642115_n (1).jpg)
Total translation table size: 0
Total rockridge attributes bytes: 121887
Total directory bytes: 243712
Path table size(bytes): 1104
Max brk space used 108000
2174849 extents written (4247 MB)
Executing 'builtin_dd if=/home/user/Data25.iso of=/dev/sr0 obs=32k seek=0'
/dev/sr0: "Current Write Speed" is 6.1x1352KBps.
:-( unable to WRITE@LBA=16f290h: Input/output error
:-( write failed: Input/output error
/dev/sr0: flushing cache
:-( unable to FLUSH CACHE: Input/output error
:-( unable to SYNCHRONOUS FLUSH CACHE: Input/output error
Also failed.
This is simply nonsense. I have a brand new DVD-RW drive, I have successfully burned 3 DVDs so far, and now it fails. This is unbelievable.




2. Time: 79 min.

Executing 'genisoimage -gui -V DiscLabel -A GnomeBaker -p à -iso-level 3 -l -r -hide-rr-moved -J -joliet-long -graft-points --path-list /tmp/GnomeBaker/gnomebaker-C0YY9W | builtin_dd of=/dev/sr0 obs=32k seek=0'
I: -input-charset not specified, using utf-8 (detected in locale settings)
/dev/sr0: "Current Write Speed" is 6.1x1352KBps.
Total translation table size: 0
Total rockridge attributes bytes: 224932
Total directory bytes: 428032
Path table size(bytes): 1850
Max brk space used 1e6000
2000215 extents written (3906 MB)
/dev/sr0: flushing cache
/dev/sr0: updating RMA

/dev/sr0: closing session





Here is a burn that failed twice:
Executing 'genisoimage -gui -V DataAngel_25 -A GnomeBaker -p A -iso-level 3 -l -r -hide-rr-moved -J -joliet-long -graft-points --path-list /tmp/GnomeBaker/gnomebaker-CC7X9W | builtin_dd of=/dev/sr0 obs=32k seek=0'
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using 321030_2052795755951_117000.JPG;1 for  Photos_2012_2/képek facebookról Julinak/321030_2052795755951_1177642115_n.jpg (321030_2052795755951_1177642115_n (1).jpg)
/dev/sr0: "Current Write Speed" is 6.1x1352KBps.
:-( unable to WRITE@LBA=7c9d0h: Input/output error
:-( write failed: Input/output error
/dev/sr0: flushing cache
:-( unable to FLUSH CACHE: Input/output error
:-( unable to SYNCHRONOUS FLUSH CACHE: Input/output error

January 5, 2014

MobiPocket Creator usage

Download Mobipocket Creator

Follow these instructions to install Publisher Version through WINE

To create e-books:
Follow instructions

Follow User Manual

Be prepared for continuously ignoring software errors while creating e-books.

What Works?

This simple process works okay, e-book is built:
  • Create new publication
  • Add Content:
    • Insert HTML file
    • Insert Image file(s)
  • Add Cover Image
  • Add Metadata
  • (Save publication)
  • Build e-book
What does not work?
Build fails with "error(htmlparser) no BODY tag found in content file"


KindleGen Usage

Download Kindlegen for Linux
Read publishing guidelines

Extract package anywhere

docs/english/Readme.txt content (relevant):

Creating Kindle ebooks - Advanced users:
-------------------------------------------
Advanced users can use the command line tool to convert EPUB/HTML to Kindle ebooks. This interface is available in Windows, Mac and Linux platform. This tool can be used for automated bulk conversions.

KindleGen for Linux 2.6 i386 :
1. Download the KindleGen tar.gz from www.amazon.com/kindleformat/kindlegen to a folder such as Kindlegen in home directory (~/KindleGen).
2. Extract the contents of the file to '~/KindleGen'. Open the terminal, move to folder containing the downloaded file using command "cd ~/KindleGen" and then use command "tar xvfz kindlegen_linux_2.6_i386_v2.tar.gz" to extract the contents.
3. Open the Terminal application and type ~/KindleGen/kindlegen. Instructions on how to run KindleGen are displayed.
4. Conversion Example: To convert a file called book.html, go to the directory where the book is located, such as cd desktop, and type ~/KindleGen/kindlegen book.html. If the conversion was successful, a new file called book.mobi displays on the desktop.
5. Please note: It is recommended to follow these steps to run KindleGen. Double-clicking the KindleGen icon does not launch this program. Run the above commands without quotes
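
The readme mentions bulk conversion but gives no example. A possible loop is sketched below; convert_all and the KINDLEGEN variable are names made up for illustration, and the default path follows the ~/KindleGen folder used in the steps above. The exit status of kindlegen is ignored here, the only check is whether a .mobi file appeared next to the input.

```shell
# Hypothetical helper: convert every .html and .epub file in a directory and
# report which conversions produced a .mobi file.
convert_all() {
    local dir=$1 f
    for f in "$dir"/*.html "$dir"/*.epub; do
        # Skip patterns that matched nothing.
        [ -e "$f" ] || continue
        "${KINDLEGEN:-$HOME/KindleGen/kindlegen}" "$f" > /dev/null
        if [ -e "${f%.*}.mobi" ]; then
            echo "ok: ${f%.*}.mobi"
        else
            echo "failed: $f"
        fi
    done
}

# Usage:
#   convert_all ~/books
```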

Instructions on how to run KindleGen:
Navigate in terminal to folder
type ./kindlegen for usage information:
*************************************************************
 Amazon kindlegen(Linux) V2.9 build 0730-890adc2
 A command line e-book compiler
 Copyright Amazon.com and its Affiliates 2013
*************************************************************
Usage : kindlegen [filename.opf/.htm/.html/.epub/.zip or directory] [-c0 or -c1 or c2] [-verbose] [-western] [-o <file name>]
Note:
   zip formats are supported for XMDF and FB2 sources
   directory formats are supported for XMDF sources
Options:
   -c0: no compression
   -c1: standard DOC compression
   -c2: Kindle huffdic compression
   -o <file name>: Specifies the output file name. Output file will be created in the same directory as that of input file. <file name> should not contain directory path.
   -verbose: provides more information during ebook conversion
   -western: force build of Windows-1252 book
   -releasenotes: display release notes
   -gif: images are converted to GIF format (no JPEG in the book)
   -locale : To display messages in selected language
      en: English
      de: German
      fr: French
      it: Italian
      es: Spanish
      zh: Chinese
      ja: Japanese
      pt: Portuguese
      ru: Russian
First impressions:
This program should be able to convert .html/.htm and .epub files...
It converts them to .mobi files that are at least double the size of the original (depending on images and compression).
Uploaded to Kindle, all the files seem to work fine. Text formatting is kept in some way - not perfect, but readable. Kindle shows Title and author for the epub, and title set for the html (not filename!)

Seems okay, but I do not see an easy way to generate a beautiful book like this... maybe converting from epub is a way to keep the book beautiful...

...or I should really read the publishing guidelines to learn the proper formatting.