@tbrowder
Created December 22, 2024 18:00
tbrowder-advent-2024

Santa's Print Shop

A lead-free area

Elf Nebecaneezer ('Neb') was welcoming some young new workers to his domain and doing what old folks like to do: pontificate. (His grandchildren politely, but behind his back, call it "bloviating.")

"In the old days, we used hand-set lead type, then gradually used ever-more-modern methods; now we prepare the content in PDF and send a copy to another company to print the content on real newsprint. It still takes a lot of paper, but much less ink and labor (as well as being safer [1])."

"Most young people are very familiar with html and online documents, but, unless your browser and printer are very fancy, it's not easy to create a paper copy of the thing you need to file so that it's very usable. One other advantage of hard-copy products is that they can be used without the internet or electricity. Remember, the US had a hugely-bad day recently when a major internet service provider had a problem!" [He continued to speak...]

Inspiration and perspiration

Now let's get down to 'brass tacks.' You all are supposed to be competent Raku users, so I will show you some interesting things you can do with PDF modules. But first, a little background on PDF (Adobe's Portable Document Format).

PDF was developed by the Adobe company in 1991 to replace the PostScript (PS) format. In fact, the original PS code is at the heart of PDF. Until 2008, Adobe defined and continued to develop PDF. Since 2008, the ISO has defined it in its 32000 standard. That standard is about a thousand pages long and can be obtained online at https://www.pdfa-inc.org at no cost.

One other important piece of the mix is the CLI 'ghostscript' interpreter which can be used, among other things, to compress PDF documents.

Before we continue, remember, we are a Debian shop, and any specific system software I mention is available in current versions.

Document creation

We are still improving Rakudoc (see the recent Raku Advent post by Richard; he's been a very good boy this year), but it can be used now for general text to produce PDF output. However, for almost any specific printed product needed, we can create the PDF either as a one-time job or as a specific design for reuse. A good example of that is the published module 'Name::Tags' which can be modified to use different layouts, colors, fonts, images, and so forth. Its unique output feature is its reversibility: when printed two-sided, the person's name shows regardless of whether the badge flips around or not. That module was used to create the name tags we are wearing on our lanyards, and I created that module! I'll be using parts of that module as an example as I continue.

Actually, I have a GitHub account under an alias, 'tbrowder', and I have published several useful modules to help my PDF work. I'll mention some later.

Note I always publish my finished modules via 'zef', the standard Raku package manager. Then they are easily available for installation, and they automatically get listed on https://Raku.land and can easily be found. Why do I do that, especially since no one else may care? (As '@raiph' or some other Raku user once said, modules are usually created for the author's use.) Because, if it helps someone else, as the Golden Rule says, it may be the right thing to do (as I believe 'Font::Utils' does).

Fonts

Fonts used to be very expensive and hard to set type with, so the modern binary font is a huge improvement! Binary fonts come in different formats, and their history is fascinating. This shop prefers OpenType fonts for two practical reasons: (1) they provide extensive Unicode glyph coverage for multi-language use (we operate world-wide as you well know) and (2) they provide kerning for more attractive typesetting. By using the HarfBuzz library, the final PDF will only contain the glyphs actually used by the text (the process is called 'subsetting'). If a user is not running Debian 12 or later, then he or she can additionally 'use' module 'Compress::PDF' which provides a subroutine which uses Ghostscript to remove unused glyphs. The module also provides a CLI program, 'pdf-compress', which does the same thing.

For Latin and most European languages we use the following font collections available as Debian packages:

Font            Debian package
GNU Free Fonts  fonts-freefont-otf
URW Base 35     fonts-urw-base35
EB Garamond     fonts-ebgaramond, fonts-ebgaramond-extra
Cantarell       fonts-cantarell

For other languages we rely on the vast glyph coverage of Google's Noto fonts (in TrueType format). Debian has many of those fonts, but we find it easier to find the needed fonts at Google and download them onto our server as zip archives. After unzipping, we move the desired files to '/usr/local/share/fonts/noto'. We currently have these 10 Noto fonts available (plus 175 other variants):

Font
NotoSerif-Regular.ttf
NotoSerif-Bold.ttf
NotoSerif-Italic.ttf
NotoSerif-BoldItalic.ttf
NotoSans-Regular.ttf
NotoSans-Bold.ttf
NotoSans-Italic.ttf
NotoSans-BoldItalic.ttf
NotoSansMono-Regular.ttf
NotoSansMono-Bold.ttf

Note the file names above are in the same family and style order as the Free Fonts in our own '$HOME' directory.

To get a feel for the span of glyphs in some of the fonts, the FreeSerif font has over 1,000 glyphs and can be used for many languages. We typically use that font, and the rest of that font family, for most of our work around Europe and the Americas. For the rest of the world, Google's Noto fonts should cover all but a tiny fraction of the population.

One of the many strengths of Raku is its handling of Unicode as a core entity. You can read about Unicode at its website. Of particular interest are the code charts at https://www.unicode.org/charts. If you select any chart you will see that the code points are shown in hexadecimal. In Raku the code points are natively in decimal. Fortunately, Raku has methods to convert between the two bases:

# convert decimal 178 to hexadecimal (base 16)
say 178.base(16); # OUTPUT: B2
# convert a hexadecimal 'A23' to decimal
say 'A23'.parse-base(16); # OUTPUT: 2595

Or we can look at Wikipedia where there are charts showing both hexadecimal and decimal code points (Unicode_chars).

Font files and utilities

  • module 'PDF::FontCollection'

Not released yet is my Raku module 'PDF::FontCollection' which will encapsulate useful font collections into a single reference list. The module has routines allowing the user to get a loaded font with a short code mnemonically associated with the font collection (a digit and a one- or two-letter code for its style). It has an installed binary to show its fonts by number, code, and name. However, my most useful module, 'Font::Utils', has made this module effectively obsolete.

  • module 'Font::Utils'

I now introduce my almost-released module 'Font::Utils' which uses fonts already installed and collects them into a file called 'font-files.list' which is then placed into the user's $HOME/.Font-Utils directory. (The git source for the module is at https://github.com/tbrowder/Font-Utils.) Since our servers already have the desired OpenType fonts installed, using 'Font::Utils' is actually more convenient since you can arrange the font list any way you want including: (1) creating your own mnemonic keys for easy reference, (2) deleting or adding data lines, and (3) reordering data lines.

Note that the actual OpenType font files are quite large, but a good design will ensure they are not loaded until specifically called for in the program. If they are already loaded, calling the routine is a no-op. The 'Font::Utils' module has one such routine.
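The load-once idea can be sketched in a few lines. This is only an illustration, not the actual 'Font::Utils' routine: the names '%loaded', 'load-font-once', and '&fake-loader' are made up for the example, and the stand-in loader just returns a string. In real use the loader would be something like the 'load-font' routine exported by 'PDF::Font::Loader'.

```raku
# Hypothetical cache, keyed by font file path (not the Font::Utils code).
my %loaded;

# Return the cached object for $path; &loader is called only on first use.
sub load-font-once(Str $path, &loader) {
    %loaded{$path} //= loader($path);
}

# Stand-in loader that counts how often it is really called.
my $calls = 0;
my &fake-loader = -> $path { $calls++; "font object for $path" };

my $file = '/usr/share/fonts/opentype/freefont/FreeSerif.otf';
load-font-once($file, &fake-loader);
load-font-once($file, &fake-loader); # already cached: the loader is not called
say $calls; # OUTPUT: 1
```

The '//=' operator only evaluates its right-hand side when the hash entry is still undefined, which is what makes the second call a no-op.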

Other non-Latin languages are covered in many freely available font collections, including right-to-left and other orientations along with layouts for users who need that capability (the Noto fonts are a good example). As noted, those can be easily added to your 'Font::Utils' collection.

Let's take a look at the first couple of lines in the default installation of my '$HOME/.Font-Utils/font-files.list':

# key  basename  path
 1 FreeSerif.otf                    /usr/share/fonts/opentype/freefont/FreeSerif.otf

We see one comment line and the first line of data, which consists of three fields. The first field is the unique key, which you may change. The second field is the font file's basename, and the last field is the font file's path. You may delete, reorder, or add font entries as you wish.
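To make the file format concrete, here is a minimal sketch of reading such a list into a hash. It is not the 'Font::Utils' implementation: the routine name 'parse-font-list' is hypothetical, and the sketch assumes the three fields are separated by whitespace and that paths contain no spaces.

```raku
# Hypothetical parser for the three-field list format shown above.
# Assumption: fields are whitespace-separated and paths contain no spaces.
sub parse-font-list(Str $text --> Hash) {
    my %fonts;
    for $text.lines -> $line {
        my $s = $line.trim;
        next if $s eq '' or $s.starts-with('#'); # skip blank and comment lines
        my ($key, $basename, $path) = $s.words;
        %fonts{$key} = %( :$basename, :$path );
    }
    %fonts;
}

my $sample = q:to/END/;
# key  basename  path
 1 FreeSerif.otf   /usr/share/fonts/opentype/freefont/FreeSerif.otf
END

my %f = parse-font-list($sample);
say %f<1><path>; # OUTPUT: /usr/share/fonts/opentype/freefont/FreeSerif.otf
```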

Now let's say you want to publish some text in Japanese. Oh, you say you don't know Japanese? And you don't have a keyboard to enter the characters? No problem! There is a way to do that. We first find from Wikipedia that some Unicode characters for the Japanese language are in the Hiragana block, which covers hexadecimal code points '3041' through '3096' and '3099' through '309F'. Then we create a space-separated string of the code points for each word. We'll use an arbitrary list of them:

my $jword1 = "3059 306A 306B 306C 305D";
my $jword2 = "3059-305D"; # a hyphen marks a range: 3059 305A 305B 305C 305D
my $jword3 = "306B-306F";

Note that we can use a hyphen to indicate a contiguous range of code points. (We could also use decimal code points, but that's more awkward: they need more digits, and the joining '-' is easily confused with subtraction.)
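Here is one way such a string could be turned into real characters. Treat it as a sketch under my own assumptions: the routine name 'hex-to-string' is hypothetical and is not taken from 'Font::Utils'. It uses only core Raku ('parse-base' and 'chr').

```raku
# Hypothetical helper: expand a space-separated string of hexadecimal code
# points, with optional 'LOW-HIGH' ranges, into a string of characters.
sub hex-to-string(Str $spec --> Str) {
    my @chars;
    for $spec.words -> $w {
        if $w ~~ / ^ (<xdigit>+) '-' (<xdigit>+) $ / {
            # a range: convert both ends to decimal, then take every code point
            my ($lo, $hi) = (~$0, ~$1).map(*.parse-base(16));
            @chars.append: ($lo .. $hi).map(*.chr);
        }
        else {
            # a single code point
            @chars.push: $w.parse-base(16).chr;
        }
    }
    @chars.join;
}

say hex-to-string("3059 306A 306B 306C 305D");   # five separate code points
say hex-to-string("306B-306F").chars;            # OUTPUT: 5
```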

Oops, what font shall we use? I couldn't find a suitable font with searches on Debian, so I went online to Google, searched for 'Hiragana', and found the font described as 'Noto Serif Japanese'. I selected it, downloaded it, and got the file 'Noto_Serif_JP.zip'. I created a directory named 'google-fonts' and moved the zip file there, where I unpacked it to get the directory 'Noto_Serif_JP' with these files:

README.txt
NotoSerifJP-VariableFont_wght.ttf
OFL.txt
static
static/NotoSerifJP-Bold.ttf
static/NotoSerifJP-SemiBold.ttf
static/NotoSerifJP-Medium.ttf
static/NotoSerifJP-Light.ttf
static/NotoSerifJP-ExtraLight.ttf
static/NotoSerifJP-ExtraBold.ttf
static/NotoSerifJP-Black.ttf
static/NotoSerifJP-Regular.ttf

The main font is a variable one, so I tried it to see the results.

Text

There are many ways to lay out text on a page. The most useful for general use is to create reusable text boxes.

Reusable text boxes

# define it
my PDF::Content::Text::Box $tb .= new(
   :$text, :$font, :$font-size, :$height,
   # style it
   :WordSpacing(5), # extra spacing for Eastern languages
);
#...clone it and modify it...
$tb.clone(:content-width(200));

Use the $page.text context to print a box:

my @bbox;
$page.text: {
    .text-position = $x, $y;
    # print the box and collect the resulting bounding box coordinates
    @bbox = .print: $tb;
}

Graphics and clipping

I won't go into it very much, but you can do almost anything with PDF. The aforementioned 'Name::Tags' module has many routines for drawing and clipping. Another of my modules on deck is 'Graphics::Utils' which will encompass many similar routines as well as the published 'PDF::Document' module.

Tricks and hints for module authors

Create a run alias

I was having trouble with the variable syntax needed for running test scripts as well as test modules. Raku user and author '@librasteve' suggested creating an alias to do that. The result, from my '.bash_aliases' file:

alias r='raku -I.'      # ad hoc Raku run command using modules found via '.'
alias rl='raku -I./lib' # the same, but using modules found in './lib'

Use a load test

I was having problems with modules once, and '@ugexe' suggested always using a load test to check that all 'lib' modules compile. Now I always create a test module similar to this:

# file: t/0-load-test.t
use Test;
my @modules = <
    Font::Utils
    Font::Utils::FaceFreeType
    Font::Utils::Misc
    Font::Utils::Subs
>;
plan @modules.elems;
for @modules {
    use-ok $_, "Module $_ can be used okay";
}

done-testing;

And run it like this, using the 'r' alias:

$ r t/0*t
# OUTPUT:
1..4
ok 1 - Module Font::Utils can be used okay
ok 2 - Module Font::Utils::FaceFreeType can be used okay
ok 3 - Module Font::Utils::Misc can be used okay
ok 4 - Module Font::Utils::Subs can be used okay

'=finish'

Use '=finish' to debug a LONG rakumod file. I use it when I'm in the process of adding a new sub or modifying one and commit some error that causes a panic failure without a clear message. I add the '=finish' after the first routine and see if it compiles. If it does, I move the '=finish' line to follow the next sub, and so on. When I do get a failure, I have a better idea of where the bad code is. Note I do sometimes have to reorder routines because of inter-module dependencies. It's also a good time to move completely independent routines to a lower-level module as described next.

Encapsulate small, independent routines

Sometimes, as in 'Font::Utils', I create a single, long file with routines that are dependent on other routines so that it is difficult to tell what is dependent upon what. Then I start creating another, lower-level module that has routines that are independent of higher-level modules. You can see that structure in the load test output shown above.

Use the BEGIN phaser

Module authors often need access to the user's $HOME directory, so use of the 'BEGIN' phaser as a block can make it easier to access it from the 'Build' module, as well as the base modules. Here is that code from the 'Font::Utils' module:

unit module Font::Utils;

use PDF::Font::Loader :load-font;
our %loaded-fonts is export;
our $HOME is export = 0;
our $user-font-list is export;
our %user-fonts     is export;
BEGIN {
    if %*ENV<HOME>:exists {
        $HOME = %*ENV<HOME>;
    }
    else {
        die "FATAL: The environment variable HOME is not defined";
    }
    if not $HOME.IO.d {
        die qq:to/HERE/;
        FATAL: \$HOME directory '$HOME' is not usable.
        HERE
    }
    my $fdir = "$HOME/.Font-Utils";
    mkdir $fdir;
    $user-font-list = "$fdir/font-files.list";
}
INIT {
    if not $user-font-list.IO.r {
        create-user-font-list-file;
    }
    create-user-fonts-hash $user-font-list;
    create-ignored-dec-codepoints-list;
}

That BEGIN block creates globally accessible variables and enables easy access for any build script to either create a new user fonts list or check an existing one. The INIT block then uses a routine to create the handy %user-fonts hash after the Raku compilation stage enables it.

As '@lizmat' and others on IRC #raku always warn about global variables: danger lurks from possible threaded use. But our 'use cases' should not trigger such a problem. However, other routines may, and we have used a fancy module to help the problem: 'OO::Monitors' by '@jnthn'. See it used to good effect in the class (monitor) 'Font::Utils::FaceFreeType'.

Experiment: Embedded links

Currently the PDF standard doesn't deal with active http links, but my Raku friend David Warring gave me a solution in an email. I think I've tried it, and I think it works, but YMMV.

David said "Hi Tom, Adding a link can be done. It's a bit lower level than I'd like and you need to know what to look for. Also needs PDF::API6, rather than PDF::Lite. I'm looking at the pdfmark documentation. It's a postscript operator and part of the Adobe SDK. It's not understood directly by the PDF standard, but it's just a set of data-structures, so a module could be written for it. I'll keep looking."

His code:

# Example:
use PDF::API6;
use PDF::Content::Color :&color, :ColorName;
use PDF::Annot::Link;
use PDF::Page;

my PDF::API6 $pdf .= new;
my PDF::Page $page = $pdf.add-page;

$page.graphics: {
    .FillColor = color Blue;
    .text: {
        .text-position = 377, 515;
        my @Border = 0,0,0; # disable border display
        my $uri = 'https://raku.org';
        my PDF::Action::URI $action = $pdf.action: :$uri;
        my PDF::Annot::Link $link = $pdf.annotation(
            :$page,
            :$action,      # action to follow the link
            :text("Raku homepage"), # display text (optional)
            :@Border,
         );
    }
}
$pdf.save-as: "link.pdf";

Use modules 'App::Mi6' and 'Mi6::Helper'

Because of the artistic nature of our work, we often need to create our own modules for new products. In that event, we use module App::Mi6 and its binary mi6 to ease the labor of managing recurring tasks with module development. By using a logical structure and a 'dist.ini' configuration file, the user can create Markdown files for display on GitHub or GitLab, test his or her test files in directories 't' and 'xt', and publish the module on 'Zef/Fez' with one command each.

By using my module 'Mi6::Helper', the author can almost instantly create a new module 'git' source repository with its structure ready for use with 'mi6' and with much boilerplate already complete.

I'm also working on a 'dlint' program to detect structural problems in the module's git repository. It is a linter to check the module repo for the following:

  1. Ensure file paths listed in the 'META6.json' file match those in the module's '/resources' directory.

  2. Ensure all 'use X' statements have matching 'depends' entries in the 'META6.json' file.

  3. Ensure all internal sub module names are correct.

The linter will not correct these problems, but if there is interest, that may be added in a future release.
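To give a feel for the second check, here is a rough sketch. It is not the 'dlint' code: the routine name 'undeclared-uses' and its default ignore list are hypothetical, and it assumes the 'depends' entries have already been read from 'META6.json' into a plain list of module names.

```raku
# Hypothetical sketch of check 2: report modules named in 'use' statements
# that are neither declared dependencies nor on an ignore list (core
# modules, pragmas, or the distribution's own modules).
sub undeclared-uses(Str $source, @depends, @ignore = <Test lib nqp>) {
    my $known = set(|@depends, |@ignore);
    my @used = $source.lines.map: {
        $_ ~~ / ^ \s* 'use' \s+ (<[\w:\-]>+) / ?? ~$0 !! Empty
    };
    @used.unique.grep({ !$known{$_} }).list;
}

# The sample source is only text here; none of it is executed.
my $source = q:to/END/;
use PDF::Lite;
use PDF::Font::Loader :load-font;
use Test;
END

my @depends = <PDF::Lite>;
say undeclared-uses($source, @depends); # OUTPUT: (PDF::Font::Loader)
```

A real linter would also have to cope with 'use' statements inside Pod or strings and with the hash form of 'depends', which this sketch ignores.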

Party time: finale

Neb concluded his presentation:

"Bottom line: Raku's many PDF modules provide fairly easy routines to define any pre-press content needed. Those modules continue to be developed in order to improve ease of use and efficiency. Graphic arts have always been fun to me, but now they are even 'funner!'".

Santa's Epilogue

Don't forget the "reason for the season:" ✝

As I always end these jottings, in the words of Charles Dickens' Tiny Tim, "may God bless Us, Every one!" [2]

Footnotes

  1. Old print shops: When I was in the seventh grade in my mother's home town of McIntyre, Georgia, where we lived while my Dad was in Japan, we had a field trip to the local print shop of the Wilkinson County News. There the printer had to set lead type by hand, install the heavy frame set of type for the full page on the small, rotary press. Then he hand fed fresh newsprint paper as the apparatus inked the type-frame and then pressed the paper against the inked type. Finally, he removed the inked sheet and replaced it with the next copy. One of the kids noted the sharp pin on the press plate to hold the paper and asked if that ever caused problems. The printer showed us his thumb with a big scar and said that was the mark of an old time press man and just an expected hazard of the trade.

  2. A Christmas Carol, a short story by Charles Dickens (1812-1870), a well-known and popular Victorian author whose many works include The Pickwick Papers, Oliver Twist, David Copperfield, Bleak House, Great Expectations, and A Tale of Two Cities.
