ORG 2 SHTM0 eazy

1 Intro

The motivation for this work is to reproduce and exercise the publishing features of org mode. The objective is to involve any kind of digital rich media presentation and to facilitate information architecture[1] within org mode. The tangible result is a static web site without javascript,[2] based on html5 and something that might be called CSS3.[3] The deployment is aimed at content delivery network hosting of static sites.

Due to the restricted functionality of a free wordpress account this is a straight how-to with very little background information. Even with all extensions involved, my ideas about structured references and supplementary material are too ambitious to deploy on a php/mysql system; it would involve too much extra work. As a matter of fact, that’s the reason I started to look for alternatives.

The research started with Sebastian Rose’s worg entry Publishing Org-Mode Files to HTML at orgmode.org and a comparison to Dennis Ogbe’s approach discussed in the blog entry Blogging using exclusively org-mode at ogbe.net. A lot of additional insight comes from the docstrings of the publishing functions, which are mainly part of 📂 ox-publish.el but are also spread over the individual export libraries, especially 📂 ox-html.el.

2 Org Publish

Org publishing is configured almost entirely by setting one variable, org-publish-project-alist. Its value is an association list, and it can be a pretty large one; see Ogbe’s blog entry cited in the Intro. The Manual’s section Project Alist offers two syntactic patterns for the elements of this list, and the section Simple Example contains the mandatory properties of these elements:

  • :base-directory – the folder containing publishing sources. All files meeting the criteria defined by the properties described in the Manual’s Section Selecting Files are published according to Section Publishing Action.
  • :publishing-directory – target folder or tramp syntax for published files.

3 Example Selection

Research about html5 and “something that might be called” css3 leads to w3schools, mozilla developer, or pure css3 layouts, like Punica. But they are about Web pages, not sites.

My first hit for Web site layouts is the supplementary zip file of a German html intro (→ Google translation), accompanied by a showcase (→ Google translation), at wiki.selfhtml.org; see Figure 1. This intro demonstrates new html5 and css3 features. The wiki also offers a whole bunch of additional layout material at the CSS – fertige Layouts (css complete layouts) wiki entry (→ Google translation). All these layouts are declared to be public domain, CC0.

Figure 1: From left to right: 📂 index.html, 📂 products.html, and a cut version of 📂 contact.html. Note the smaller font size of the Impressum headline in the right page.

I translate all organizational elements of the pages into English and rename the html files; the results are available in the bitbucket folder 📂 hBeg/. The body text stays in German and serves as lorem ipsum filler. Then I convert the html files of the zip file to org with pandoc, e.g.,

pandoc -f html -t org -o index.org index.html

I want to have all my pictures at one location only, my picture root. After exporting the org files and embedding them into the template the media will be collected by an xml crawler into the upload directory.

The final css file from the procedure below, in Section 7, will be tangled into the upload folder.

4 Publishing Experiments

The publishing property :body-only affects more than the <body> element. I developed a minimal set of publishing options, shown in the code below and available as bitbucket supplement 📂 shtm0.el; the process is revisited in Section 6.

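(require 'ox-publish)

(setq org-publish-project-alist
      '(("shtm0"
         ;; org sources and target folder of the exported pages
         :base-directory "~/myOrgRoot/pub/shtm0/"
         :publishing-directory "~/www/shtm0/"
         :publishing-function org-html-publish-to-html
         ;; export the page content only; the template supplies the rest
         :body-only t
         :html-doctype "html5"
         :html-container "section")))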

5 Website Template

The transferred and renamed semi-original html files are the source for identifying locations for placing the main content, the navigational elements, the footer, or a menu.

The pages are very similar, so the header, the navigation and the footer might be considered static. So I won’t make a big mistake if I decide to inject only the main content. I figure the template for the four pages to be

<!doctype html>
<html lang="de">
  <head> ... </head>
  <body>
    <header> ... </header>
    <nav> ... </nav>
<!-- here's where the main content should be injected -->
    <footer> ... </footer>
  </body>
</html>

I add it as 📂 tmplt.html to the bitbucket supplemental files. The parts above and below the main-content comment, without the comment itself, become 📂 tmplt1.html and 📂 tmplt2.html and act as a brace around the export. Before proceeding to the css I have to know what the export fills in.

6 Org HTML Export

Lines beginning with #+ in an org file are keywords; among other things they hold in-buffer settings and export settings. Apart from entries like #+title:, #+Subtitle:, #+Author:, or #+Email: my settings are

#+Language: de
#+Options: num:nil toc:nil
#+HTML_CONTAINER: section
#+HTML_DOCTYPE: html5

Later some of them are transferred into the project’s org-publish-project-alist entry, depending on the nature of each option. For example, #+Language: de is related to the individual file, so it should stay in the file. In contrast, #+Options: num:nil toc:nil depends on the Web site architecture, so these options might end up in the publishing alist as :section-numbers nil and :with-toc nil.
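
A minimal sketch of such a transfer, assuming the paths and the project name from Section 4. The properties :section-numbers and :with-toc are the publishing counterparts of num: and toc:; the combination shown here is only an illustration, not necessarily the content of the supplement file.

```elisp
;; Sketch only: site-wide export options moved into the project entry.
(require 'ox-publish)

(setq org-publish-project-alist
      '(("shtm0"
         :base-directory "~/myOrgRoot/pub/shtm0/"
         :publishing-directory "~/www/shtm0/"
         :publishing-function org-html-publish-to-html
         :body-only t
         ;; counterparts of "#+Options: num:nil toc:nil"
         :section-numbers nil
         :with-toc nil
         ;; counterparts of "#+HTML_DOCTYPE:" and "#+HTML_CONTAINER:"
         :html-doctype "html5"
         :html-container "section")))
```

File-specific keywords such as #+Language: de stay in the individual org files.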

The in-buffer setting #+HTML_CONTAINER: section corresponds to the elisp publishing property :html-container "section". We can derive this connection by looking up the in-buffer setting in the Manual’s section HTML Specific export settings and inspecting the docstring of the corresponding variable with C-h v org-html-container-element. Alternatively, look up the whole collection of property relations in the backend definition of 📂 ox-html.el, i.e., the :options-alist of org-export-define-backend.

The adaptation of the <aside> element is inspired by the Manual’s section HTML doctypes and related to the HTML_CONTAINER property.

In comparison to the original html pages the headline level of the re-exported html bodies is increased by 1.

Among several possible adaptations I choose the raw body-only export as the main content of the template and change the css accordingly. For this purpose I employ the org mode css procedure borrowed from Fabrice Niessen.

7 Org CSS Construction

This section shows how to reproduce the example’s css file 📂 formate.css by tangling designated parts of an org file 📂 shtm0.org into a style file called 📂 shtm0.css. The procedure is inspired by the file 📂 readtheorg.org or, respectively, by the whole github repository org-html-themes of Fabrice Niessen.

My utilization of org tangling begins with cutting the example’s css file 📂 formate.css into pieces, putting the pieces into css source code blocks and embedding them in the outline structure of an org file. For my short example I include :tangle ~/www/shtm0/css/shtm0.css in the property drawer of every section.

Adapting the css to the html exports involves three essential items:

  • Change the icon source in 📂 tmplt.html and 📂 tmplt1.html from icon.svg to hobel.png, and the backlink from the root "/" to "index.html".
    <a id="backlink" href="/"><img src="img/icon.svg"
    
  • Move the css properties of h2 elements to h3, and those of h1 to h2.
  • Redefine the #services css id selector as the class selector .services .outline-text-3, and in 📂 index.org add :HTML_CONTAINER_CLASS: services to the properties drawer of the headline Unsere Leistungen.

Then some design measures. I considered it a pleasure to write them into the css org file, hit C-c C-v t, refresh the browser and see the effect immediately.
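
The tangle step can also be triggered from outside the css org file. The helper below is only a sketch: the function name my/shtm0-rebuild and the location of 📂 shtm0.org are hypothetical, and it assumes that the "shtm0" project of Section 4 is already defined in org-publish-project-alist.

```elisp
;; Sketch only: function name and file location are hypothetical.
(require 'ob-tangle)
(require 'ox-publish)

(defun my/shtm0-rebuild ()
  "Tangle the css org file, then publish the shtm0 project."
  (interactive)
  ;; same effect as C-c C-v t inside the css org file
  (org-babel-tangle-file (expand-file-name "~/myOrgRoot/supp/shtm0.org"))
  ;; export the org sources that changed since the last run
  (org-publish "shtm0"))
```

After M-x my/shtm0-rebuild a browser refresh shows both css and content changes.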

The contact form would need a whole team of design engineers to get fixed, but because of its legal character I’ll just leave it untouched in its bureaucratic beauty. The same holds for the page’s unused <aside> element. With the hints from this “proof of concept” I consider it a matter of simultaneously changing the 📂 contact.org and the 📂 shtm0.css files, supported by the publishing and tangling features, respectively. The dear reader might choose this as an exercise. The Design 01 (→ Google translation) layout at selfhtml.org offers another version of the contacts page.

8 Glue HTML and Extract Images

Org publish provides org-publish-attachment for the handling of static files. The shtm0-static component below copies files and folders (:recursive t) from the base directory to the publishing directory without changing them; see Selecting Files and its preceding section in the org Manual. The corresponding :publishing-function property is set to org-publish-attachment.

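;; additional element of org-publish-project-alist
("shtm0-static"
 :base-directory "~/myOrgRoot/pub/shtm0/img"
 ;; regexp of the file extensions to copy
 :base-extension "css\\|js\\|png\\|jpg\\|gif\\|pdf\\|mp3\\|ogg\\|swf"
 :publishing-directory "~/www/shtm0/img"
 :recursive t
 ;; copy verbatim, no export
 :publishing-function org-publish-attachment)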
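
If both the page export and the static component are in use, they can be bundled with the :components property described in the Manual’s section Project Alist. The sketch below assumes that "shtm0" and "shtm0-static" are already elements of org-publish-project-alist; the name "shtm0-site" is hypothetical.

```elisp
;; Sketch only: "shtm0-site" is a hypothetical name for the bundle.
(require 'ox-publish)

(add-to-list 'org-publish-project-alist
             '("shtm0-site" :components ("shtm0" "shtm0-static")))

;; Publish both components in one go; the non-nil second argument
;; forces republishing of files that have not changed.
(org-publish "shtm0-site" t)
```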

In contrast to that procedure my idea is to collect the media from a central image folder. The tools are sketched in the four units below. They are implemented in core R, extended by Temple Lang’s XML package, and include some minor feedback and safety measures. Ahead of the collection the export files are combined with the template parts.

8.1 Paths and Files

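pub <- "~/myOrgRoot/pub/shtm0"   # publishing sources
www <- "~/www/shtm0"             # upload folder
setwd(www)                       # relative paths below refer to www
pic <- file.path(www, "img")     # media folder inside the upload folder
if (!dir.exists(pic)) dir.create(pic)
# the two template parts that brace each export
t <- c(file.path(pub, "tmplt1.html"), file.path(pub, "tmplt2.html"))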

8.2 Select Updated Pages

  • Collect the names of html files in 📂 ~/www/shtm0/ with list.files() and the regex pattern [.][hH][tT][mM][lL]?$, which could be reduced to [.]html$.
  • Look for html files that start with <!doctype and delete their names from the list; see append(), for(){}, length(), and scan().

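# html files below www; the paths are relative to the working directory www
listHtml <- list.files(path=www, recursive=TRUE,
                       pattern="[.][hH][tT][mM][lL]?$")
# TRUE for files that already start with a doctype, i.e. assembled pages
del <- logical(0)
for (i in seq_along(listHtml)) {
    first <- scan(listHtml[i], character(), 1, quiet=TRUE)
    # an empty file yields no token and counts as not assembled
    quest <- length(first) > 0 && tolower(first) == "<!doctype"
    del <- append(del, quest)
}
# keep only the fresh body-only exports
listHtml <- listHtml[!del]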

8.3 Page Assembly and Media Check

Repeat the steps below for the updated html body exports; see the R help for control structures, particularly for(){}.

  • Create a tempfile()[4]
  • file.append() template-cut-one, export, and template-cut-two to this file
  • htmlParse() the result
  • Read the src and data attributes in <img> and <object> elements, respectively, by applying getHTMLExternalFiles() and append()ing them to the img collection. The argument xpQuery of getHTMLExternalFiles() is an xpath expression and defaults to c("//img/@src", "//link/@href", "//a/@href", "//script/@href", "//embed/@src").
  • Overwrite the html extract source with the tempfile construct, using file.copy()

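img <- NULL
for (i in seq_along(listHtml)) {
    # assemble template part one, the export, and template part two
    x <- tempfile(fileext=".html")
    file.append(x, c(t[1], listHtml[i], t[2]))
    # collect the media references of the assembled page
    doc <- XML::htmlParse(x)
    xpq <- c("//img/@src", "//object/@data")
    img <- append(img, XML::getHTMLExternalFiles(doc=doc, xpQuery=xpq))
    # replace the body-only export by the complete page
    file.copy(x, listHtml[i], overwrite=TRUE)
}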

8.4 Collect Media

Thin out the image paths and copy the media to the 📂 ~/www/shtm0/img/ folder, with md5 check.

  • unique() the img collection
  • file.copy() the required media to the 📂 www/img/ folder. If the image file already exists, force copying if the md5sum() is different.

uImg <- unique(img)
for (j in seq_along(uImg)) {
    fPic <- file.path(www, uImg[j])   # target in the upload folder
    fPub <- file.path(pub, uImg[j])   # source in the publishing sources
    if (file.exists(fPic)) {
        # copy again only if the content differs
        if (tools::md5sum(fPic) != tools::md5sum(fPub)) {
            file.copy(fPub, pic, overwrite=TRUE)
        }
    } else {
        file.copy(fPub, pic)
    }
}

9 Netlify Drop

The content of 📂 ~/www/shtm0 dropped into netlify delivers the pages of the screenshot excerpt shown in Figure 2.

Figure 2: Org mode version of 📂 index.html, 📂 products.html, and a cut version of 📂 contact.html.

10 What’s more

The most intriguing fact about today’s blogs is their cloaking mechanism for links. They hide all their ingenuity behind clickable words. And the most the reader can expect from a whole site is to be challenged by a tag cloud or a list of categories. With the measures reported in this reduced blog entry I set out to programmatically reintroduce the indicators listed in the next section, which once were the main means of quickly getting an impression of the content.

In the section after that I’ll reason about a first step to organize the template production like the css approach.

10.1 Index Toc Tag Category Bibentry Link-List

Digital high performance machines could handle tables of contents, front matter, back matter, footnotes, captions, indices, and bibliographies. So why don’t they? To reintroduce these features in a cms we would have to browse through a jungle of extensions and fill a multitude of database tables. That’s why.

Apparently the org team is on a similar trip with the inclusion of citations. With org version 9.5 they introduced a new chapter on org citations with its own prefix oc- for elisp files. A dedicated prefix suggests that they are prepared for a lot of subissues.

One of my next steps is to harness the bibtex run of pdflatex combined with bibtex2html and the by-product of a supplemental pdf. The list below includes more hints about the next topics of my agenda:

  • employ :sitemap properties of org publish for different content views, beginning with a thin css top menu; see Section Generating a sitemap
  • the rss publishing feature of org, which is not covered by the manual, works with the :publishing-function (org-rss-publish-to-rss). Its docstring leads to 📂 ox-rss.el, so it’s basically designed as an export feature. See Blogging from GNU Emacs, Bastien Guerry, 2013-09-25, blog entry at bzg.fr. Compared to the sitemap extract it offers far more details which can be rendered into Web site modules.
  • compare 📂 reftex-mode.el, 📂 org-bibtex.el, the very new org-citation 📂 oc.el, and external machines like RefManageR, services like zotero, or formats like bibframe. How to combine links, tags and bibentries? Right now I’m fine with bibtex2html and have looked into the command line export features of jabref.
  • See the predefinitions in Macro Replacement for ideas about harnessing variables that have been defined already. For example the customized counters in numbered theorems or examples or exercises that can be produced by n(m,x).
  • Compare the effort of (1) setting up an nltk crawler for tagging and linked data methods for categorization; (2) macro tagging, capturing, and the many org hyperlink versions, which include archive, agenda, and the whole org-as-a-note-taking system; (3) the elaborate index mechanism of the texinfo export[5] and the special index property for the properties drawer of a headline: either render exported info files, use pandoc, or regex in the org file source; (4) the org publishing feature of Generating an index.
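
For the sitemap item at the top of this list, a first experiment could look like the sketch below. The properties :auto-sitemap, :sitemap-filename and :sitemap-title are documented in the Manual’s section Generating a sitemap; the title string is an assumption, and the rest repeats the entry from Section 4.

```elisp
;; Sketch only: sitemap properties added to the "shtm0" entry of Section 4.
(require 'ox-publish)

(setq org-publish-project-alist
      '(("shtm0"
         :base-directory "~/myOrgRoot/pub/shtm0/"
         :publishing-directory "~/www/shtm0/"
         :publishing-function org-html-publish-to-html
         :body-only t
         :html-doctype "html5"
         :html-container "section"
         ;; write sitemap.org into the base directory and publish it
         :auto-sitemap t
         :sitemap-filename "sitemap.org"
         :sitemap-title "Sitemap")))
```

The generated sitemap is an ordinary org file, so it is exported body-only like the other pages and can pass through the same template step.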

10.2 Glue HTML Expanded

The combination of exports and the template is part of the image collection code in Section 8, but there are other options for the expansion of the export files:

  1. idea: produce the templates by a combination of org mode’s noweb, tangling and export features, and add customized features with the favorite programming language.
  2. idea: design the template in html, cut it into pieces, and cat the pieces together with the export to form the Web site page.
  3. idea: utilize a cdn feature like netlify’s File-Based Configuration to build the Web page online. I think this also affects the netlify billing. But it might be covered by a fully fledged org empire.

The first idea is the most attractive approach. Just like Niessen’s css approach it offers a debuggable, self documenting environment for template development. Perhaps in another org construction file or as part of an all-embracing Web site constructor file. For an org mode noweb approach the main part of the home page

<h1>Willkommen ...</h1>
<p>Wir sind seit ....</p>
<section id="service">
<h2>Unsere Leistungen:</h2>
     ...
</section>

will be put into the template below at the place where <<main-content>> shows up. Compared to the layout considerations of Section 5 above, I just changed the comment line <!-- here's where... --> to a noweb reference referring to a #+Name:’d block or to multiple source code blocks with the same :noweb-ref header argument.

<!doctype html>
<html lang="de">
  <head> ... </head>
  <body>
    <header> ... </header>
    <nav> ... </nav>
<<main-content>>
    <footer> ... </footer>
  </body>
</html>

In an org file which will be responsible for all the tangling jobs this will be a source code block with the header below. The tangling process is triggered with C-c C-v t, aka org-babel-tangle.

#+header: :tangle "~/www/shtm0/indexTest.html"
#+begin_src html  :noweb yes

In the document which produces the blog you’re looking at the :noweb is set to no-export in order to show the <<main-content>> noweb syntax reference; see Section Noweb Reference Syntax in the org Manual. The source code block with the main content carries the header

#+Name: main-content
#+begin_src html :noeval

That’s a possible procedure to inject the main content into the template. At least it produces the intended result. And it is expandable to all other modules I can think of. But how do I programmatically put the exported file into the tangling org file? Hmm.

Read the export file with an R process in an R source code block and use the output of this process as noweb input? Hmm.

11 Appendix

11.1 Supplemental Material

Except for the first item all supplemental material is available from the public bitbucket folder 📂 shtm0/ of repository StPjotr. You can clone it with the command below the list, which produces a 📂 shtm0/ folder in your current directory.

  • The remodeled and renamed exercise html files from selfhtml.org (zip) are in the 📂 hBeg/ folder of the bitbucket repo.
  • Four org files which export to the main content of the Web pages are in the 📂 pub/ folder of bitbucket.
  • html template 📂 tmplt.html for the Web site → in the 📂 supp/ folder.
  • css producing org file 📂 shtm0.org → in 📂 supp/.
  • elisp file 📂 shtm0.el containing the org publish setting → in 📂 supp/.
  • My carpenter logo 📂 hobel.png → in 📂 supp/.

git clone https://StPjotr@bitbucket.org/StPjotr/shtm0.git

11.2 System Info

Linux 5.13.0-39-generic #44, 20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 • GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.14) of 2020-03-26, modified by Debian • Org mode version 9.3.1 • R version 3.6.3 (2020-02-29) utilizing the core base, core tools and Temple Lang’s XML packages.

Footnotes:

[1] See Information Architecture for the World Wide Web, Morville and Rosenfeld, 3rd ed., 2006, O’Reilly.

[2] At first I only aim at switching off all javascript that is rendered superfluous by new CSS mechanics. Afterwards, but not here, javascript and html apis will be brought back in.

[3] According to Meyer & Weyl’s CSS: The Definitive Guide (4th edition, 2018, p. 2), “it’s hard to speak of a single ‘css3 specification.’ There isn’t any such thing, nor can there be.”

[4] According to the help page tempfile() doesn’t create a file, but the directory shows an entry after the command.

[5] texinfo offers concept, function, variable, keystroke, program, data type and customized indices. org supports the predefined ones as settings like #+PINDEX or as entries in a headline’s property drawer; see Indices.
