CodeSyntaxHighlight MediaWiki Extension

From SwinBrain

What & Why

Powered by GeSHi

A MediaWiki extension to provide code syntax highlighting in Wiki pages using the GeSHi project by Nigel McNie (released under the GNU GPL).

This extension was written by Clinton Woodward for SwinBrain, and we are happy to make it available to anyone who would like to use it, but without guarantees of any kind whatsoever. This code is simple and should remain "free" for anyone to use as they wish. Feel free to let Clinton know if you are using the extension, or if you have suggested changes.

Note: The first version of the extension was written for earlier MediaWiki versions that did not pass the tag's attributes to the parser hook. Hence, a bracket syntax, <code>[...]...</code>, was used to set parameters for the language, line numbers and so on. Version 4 of the extension supports attribute parameters and still accepts the old bracket syntax for backward compatibility. A new installation could simply strip out the old syntax support.
Tip: See the CodeSyntaxHighlight Usage page for some more examples of how to use the extension in your own entries.
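
To make the old bracket syntax concrete, here is a minimal sketch of how a header such as [lang,line_no,start_at] can be split off the first line of the source. The function name cshParseBracketHeader is hypothetical and is not part of the extension; the sketch also leaves out the mark-range argument, and it only mirrors the rules described in the comments of the listing that follows.

```php
<?php
// Hypothetical helper (not part of the extension): reads an old-style
// "[lang,line_no,start_at]" header from the first line of the source.
// The mark-range argument, e.g. (3-5,7), is deliberately ignored here.
function cshParseBracketHeader($firstLine)
{
    $firstLine = trim($firstLine);
    $close = strpos($firstLine, ']');
    // No leading "[" or no closing "]": there is no header to parse.
    if (strpos($firstLine, '[') !== 0 || $close === false) {
        return null;
    }
    $parts = array_map('trim', explode(',', substr($firstLine, 1, $close - 1)));

    // Defaults for the old syntax: line numbers on, starting at 1.
    $result = array('lang' => $parts[0], 'numbers' => 'Y', 'startat' => 1);
    if (isset($parts[1])) {
        if (is_numeric($parts[1])) {
            // A numeric second value is a start line and implies numbering.
            $result['startat'] = intval($parts[1]);
        } else {
            $result['numbers'] = (strtoupper($parts[1]) == 'Y') ? 'Y' : 'N';
            if (isset($parts[2]) && is_numeric($parts[2])) {
                $result['startat'] = intval($parts[2]);
            }
        }
    }
    return $result;
}

// Old syntax:  [php,N] at the start of the tag content
// New syntax:  lang="php" numbers="N" as attributes on the tag
print_r(cshParseBracketHeader('[php,N] echo "hi";'));
```

With the attribute syntax none of this parsing is needed, because MediaWiki hands the attributes to the hook as an array.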

The Extension Code

We'll use the extension to make this look pretty.

<?php
 # Code syntax highlighting extension for MediaWiki using the GeSHi engine.
 # See http://qbnz.com/highlighter/ for more info about GeSHi.
 # Written by Clinton Woodward cwoodward@swin.edu.au for SwinBrain
 #
 # Usage in mediawiki
 #  <code>[lang,line_no,start_at,range(1-4,6-7)]...</ code>
 # Examples
 #  <code>[lang]...</ code> // line_no default=Y, start_at default=1
 #  <code>[lang,N]...</ code> // no line numbers
 #  <code>[lang,5]...</ code> // start at line 5 (with line numbers)
 #  <code>[lang,Y,5]...</ code> // deprecated - same as above
 #                             // ('Y' is implied by start_at)
 #  <code>[lang,1,(3-5,7)]...</ code> // mark lines 3,4,5 and 7
 #
 #  - lang is required, others are optional
 #  - line_no default is Y, start_at is default to 1
 #
 # version 1, started 25th August 2005
 # version 2, added 29th August, 2005
 # version 2.1, 19th October 2005
 #  - cleanup for SwinBrain release.
 #  - moved to the ByteClub wiki and added lots of languages
 # version 3, 2006-04-05
 #  - added changes from Jocelyn Fiat to allow single line code
 #  - altered args behaviour, deprecated implicit line_no='Y' when start_at given.
 #  - added support for the mark-range argument eg. (1,3,5-7)
 #  - removed case-sensitivity for 'Y'/'N' arg value
 # version 4, 2006-12-12
 #  - added attribute based parameters instead of old pre mediawiki 1.5 [...] method
 #  - <code lang="lang" numbers="Y|N" startat="#" range="1-3,6,9">...</ code>
 #  -       
 
$wgExtensionFunctions[] = 'wfCodeSyntaxHighlight';
 
$cshLanguages = array (
  'actionscript','ada','apache','asm','c','cpp','eiffel','ini','html','xhtml',
  'java','java5','css','js','javascript','vbnet','csharp','pascal','xml','php',
  'delphi','bash','perl','lisp','matlab','mpasm','objc','vb','smarty','vhdl',
  'ruby','sql','python','pseudocode','qbasic','scheme','oracle8','dos','text');
 
// See: http://qbnz.com/highlighter/geshi-doc.html
include_once('extensions/CodeSyntaxHighlight/geshi.php');
define('CSH_GESHI_PATH','extensions/CodeSyntaxHighlight/geshi');
 
 
function wfCodeSyntaxHighlight() {
  global $wgParser, $cshLanguages;
  # register the extension with the WikiText parser
  $wgParser->setHook('code', 'renderCodeSyntaxHighlight');
}
 
# The callback function for converting the input text to HTML output
function renderCodeSyntaxHighlight($source, $args=array()) 
{
  global $cshLanguages;
  
  // defaults
  $source = trim($source);  // remove any unwanted whitespace 
  $lang = '';
  $line_no = 'Y';
  $line_start = 1;
  $line_range = '';
 
  // are there args? use them
  if (isset($args['lang'])) {
    
    // find the matching language
    $lang = (in_array($args['lang'],$cshLanguages)) ? $args['lang'] : '';
    // check for the special langlist request
    if ($args['lang']=='langlist') {
      sort($cshLanguages); // sort so it's easy to read
      return 'Current enabled languages: <code class="csh">'.implode(', ',$cshLanguages).'.</ code>';
    }
    // show line numbers?
    $line_no = (isset($args['numbers']) && strtoupper($args['numbers'])=='Y') ? 'Y' : 'N';   
    // start at? implies numbers='Y'
    if (isset($args['startat']) && is_numeric($args['startat'])) {
      $line_no = 'Y';
      $line_start = intval($args['startat']); // store as an integer, as the [...] branch does
    }
    // range? implies range="..."
    if (isset($args['range'])) {
      $line_no = 'Y';
      $line_range = getCSHMarkRange(trim('('.$args['range'].')'));
    }
        
  }
  // use the old [...] syntax for parameters
  else {
    
    // extract the required args [...]
    $lines = explode("\n",$source);
    $args = trim($lines[0]);
    
    // Is there a [..] section?
    if (strpos($args,'[')===0 && strpos($args,']')!==false) 
    {
      
      // extract the args values, trim off the [] characters.
      $eofargspos = strpos($args, ']');
      $args = explode(',', substr($args, 1, $eofargspos - 1));
      // Language? Is it one of the one we have enabled?
      if (in_array($args[0],$cshLanguages)) {
          $lang = $args[0];
      }
      // Is this the special "list the languages" command?
      elseif ($args[0] == 'langlist') {
          sort($cshLanguages); // sort so it's easy to read
          return 'Current enabled languages: <code>'.implode(', ',$cshLanguages).'.</ code>';
      }
      else {
          $lang = '';
      }
      
      // Show line numbers/start line no?
      if (isset($args[1])) {
        if(is_numeric($args[1])) {
          $line_no = 'Y'; // implied by the presence of a start line number
          $line_start = intval($args[1]);
          if(isset($args[2])) {
              $line_range = getCSHMarkRange(trim($lines[0])); // give the whole arg string
          }
        }
        else {
          $line_no = (strtoupper($args[1]) == 'Y') ? 'Y' : 'N';
          // start line given? May be irrelevant if 'N' set :)
          if(isset($args[2]) && is_numeric($args[2])) {
              $line_start = intval($args[2]);
          }
        }
      }
 
      // Get rid of the now used first bit of [...] info, implode and trim
      $lines[0] = trim(substr($lines[0], $eofargspos + 1, strlen($lines[0])));
      $source = trim(implode("\n",$lines));
 
    }
  } // end old [...] syntax
  
  
  if ($lang !== '') 
  {
 
    // Remap any languages?
    if ($lang == 'pascal') { $lang = 'delphi'; } // looks better
    if ($lang == 'js') { $lang = 'javascript'; } // shortcut
    if ($lang == 'html') { $lang = 'html4strict'; } // shortcut
    if ($lang == 'xhtml') { $lang = 'html4strict'; } // remap
 
    // Create the GeSHi parser object, tell it what it needs...
    $geshi = new GeSHi($source, $lang, CSH_GESHI_PATH);
 
    // turn line numbers on?
    if ($line_no == 'Y') {
      // add to remove the extra line height we didn't like to see in mediawiki
      $norm = 'border: 0px solid green; margin: -1px; padding: 0;';
      $geshi->set_line_style($norm);
      $geshi->enable_line_numbers(GESHI_NORMAL_LINE_NUMBERS,0);
      if(is_array($line_range)) {
        $mark = 'background-color:#FFFFF0; color: red; font-weight: bold;';
        $geshi->set_highlight_lines_extra_style($mark);
        $geshi->highlight_lines_extra($line_range);
      }
    }
    // start line numbering at ...
    $geshi->start_line_numbers_at($line_start);
 
    // parse the output, hand it back.
    $output = $geshi->parse_code();
    return '<div class="csh">'.$output.'</div>';
 
  }
  else {
    // If it's not for us, catch and return with <code>...</ code> 
    return '<code>'.htmlentities($source).'</ code>';
  }
} // end renderCodeSyntaxHighlight()
 
function getCSHMarkRange($str)
{
  // get the (...) substring as an array of int values
  $start = strpos($str,'(')+1;
  $end = strpos($str,')');
  $str = substr($str,$start,$end-$start);
  // break into parts
  $tmp = explode(',',$str);
  foreach($tmp as $key=>$val) {
    // expand the 4-7 type ranges and replace in $tmp
    if(strpos($val,'-')) {
      list($start,$end) = explode('-',$val);
      for($i=$start; $i<=$end; $i++) $tmp[] = $i;
      unset($tmp[$key]);
    }
  }
  return $tmp;
}
 
?>

That's it. I hope the comments in the code help and that the coding style is not too offensive to all the hard-core PHP coders out there.

I was using the nowiki tags around the closing code tag to stop the MediaWiki parser from getting upset about the appearance of closing code tags in the code block, but that no longer works and I don't know why yet. So, all </ code> will need to be replaced with </code> (space removed). I am still looking for a fix for this.
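
The mark-range argument handled by getCSHMarkRange() in the listing above is perhaps easiest to follow in isolation. The sketch below is a hypothetical variant, not the extension's own function: it is named cshExpandMarkRange, takes the range text without the surrounding brackets, and additionally sorts and de-duplicates the result, which the original does not do.

```php
<?php
// Hypothetical variant of getCSHMarkRange(): takes the range text
// without the surrounding brackets, e.g. "3-5,7", and returns a sorted
// list of integer line numbers.
function cshExpandMarkRange($range)
{
    $lines = array();
    foreach (explode(',', $range) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // ignore empty items such as a trailing comma
        }
        if (strpos($part, '-') !== false) {
            // "3-5" style item: add every line from start to end inclusive.
            list($start, $end) = array_map('intval', explode('-', $part, 2));
            for ($i = $start; $i <= $end; $i++) {
                $lines[] = $i;
            }
        } elseif (is_numeric($part)) {
            $lines[] = intval($part);
        }
    }
    $lines = array_unique($lines);
    sort($lines); // sort() also re-indexes the array from 0
    return $lines;
}

echo implode(',', cshExpandMarkRange('1-3,6,9')), "\n"; // prints 1,2,3,6,9
```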

Extensions to the Extension?

Briefly, here are some other features I would like to add:

  • Make better use of the "fancy" line numbers so that sub-sets of the lines can be highlighted differently for the purposes of illustration/discussion. Done 2006-04-05. See the Help:Editing page for usage details.
  • Use static CSS files to reduce the now bloated inline (but perfectly valid) style that is being used.

Installation Files & Folders

This is how we have used the extension - you could easily do this other ways.

  • Went to the "extensions/" folder of our MediaWiki installation
    • Created a new php file for the extension code. We called ours "CodeSyntaxHighlight.php" and used the code listed above.
    • Created a new folder called "CodeSyntaxHighlight/" in "mediawiki/extensions/" to contain the related GeSHi code.
      • Copied the "geshi.php" file into the "CodeSyntaxHighlight/" folder.
      • Created a sub folder called "geshi" in "CodeSyntaxHighlight/"
      • Copied all the GeSHi language files (e.g. pascal.php, vbscript.php etc) that we wanted into the "geshi" folder.
  • Edited the MediaWiki configuration file "LocalSettings.php" (in the "mediawiki/" folder) to include our new extension file.

In the LocalSettings.php file, look for the ... "extensions/ExampleExt.php" (depending on your version - see the MediaWiki home page for more details) and add your own include for your new extension. So for us, something like:

...
# include("extensions/ExampleExt.php");
include("extensions/CodeSyntaxHighlight.php");
...

And for the entire install we now have something like:

 mediawiki/
 ...
 LocalSettings.php
 extensions/
   CodeSyntaxHighlight.php
   CodeSyntaxHighlight/
     geshi.php
     geshi/
       actionscript.php
       ada.php
       ...
       xml.php

We created CodeSyntaxHighlight.php and the CodeSyntaxHighlight/ and geshi/ folders ourselves; geshi.php and the language files are normal GeSHi files that we copied.
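
If you want to double-check a layout like the one above, a small script can report what is missing. This is only a sketch under the assumption that your install mirrors our folder names; the function name cshMissingInstallPaths is hypothetical and the list of paths is taken from the tree shown above.

```php
<?php
// Hypothetical helper: returns the expected files/folders that are
// missing below the given MediaWiki root folder.
function cshMissingInstallPaths($mediawikiRoot)
{
    // Paths taken from the layout described above (our own naming).
    $expected = array(
        'LocalSettings.php',
        'extensions/CodeSyntaxHighlight.php',
        'extensions/CodeSyntaxHighlight/geshi.php',
        'extensions/CodeSyntaxHighlight/geshi',
    );
    $missing = array();
    foreach ($expected as $relative) {
        if (!file_exists(rtrim($mediawikiRoot, '/') . '/' . $relative)) {
            $missing[] = $relative;
        }
    }
    return $missing;
}

// Example: check the current folder (assumed here to be "mediawiki/").
$missing = cshMissingInstallPaths('.');
echo $missing
    ? 'Missing: ' . implode(', ', $missing) . "\n"
    : "Layout looks complete.\n";
```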

Customising Code Keyword Links

Here in the Faculty of ICT we have local copies of various manuals etc. The GeSHi engine will, by default, link some language keywords to online documentation or Google search queries. For example, a PHP function or keyword can be a link to the PHP website (which will then show the online documentation - nice!).

We have updated some of our GeSHi language files to point to our own local copies of manuals. This is then faster and more convenient for our local use of the manuals.

The main CSS file was updated so that external links in syntax highlighted code sections do not have an external link icon next to them (the icons clutter the code and make it harder to read).

Here's one update example. In the geshi/php-brief.php file there is a section like

...
  'URLS' => array(
    1 => '',
    2 => '',
    3 => 'http://www.php.net/{FNAME}',
    4 => ''
  ),
...

We change this to

...
  'URLS' => array(
    1 => '',
    2 => '',
    3 => 'http://php.it.swin.edu.au/{FNAME}',
    4 => ''
  ),
...
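
To show what the {FNAME} placeholder amounts to, here is a small illustration of the substitution. It is not GeSHi's own code: the function name cshKeywordUrl is hypothetical, and treating an empty entry as "no link" is an assumption made for this sketch.

```php
<?php
// Illustration only: mimics the effect of the {FNAME} placeholder.
// This is not GeSHi's own code.
function cshKeywordUrl($template, $keyword)
{
    if ($template === '') {
        return ''; // assumed here: an empty entry means no link is made
    }
    return str_replace('{FNAME}', $keyword, $template);
}

// Same shape as the 'URLS' array in the language file shown above.
$urls = array(
    1 => '',
    2 => '',
    3 => 'http://php.it.swin.edu.au/{FNAME}',
    4 => '',
);

echo cshKeywordUrl($urls[3], 'strpos'), "\n"; // prints http://php.it.swin.edu.au/strpos
```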

What Languages are Supported?

The list shown below is dynamically generated by the extension (see the code) using the command <code>[langlist]</code>, so it's an easy way to check the available languages. Here are our currently available languages:

  • Current enabled languages: actionscript, ada, apache, asm, bash, c, cpp, csharp, css, delphi, dos, eiffel, html, ini, java, java5, javascript, js, lisp, matlab, mpasm, objc, oracle8, pascal, perl, php, pseudocode, python, qbasic, ruby, scheme, smarty, sql, text, vb, vbnet, vhdl, xhtml, xml.

You could easily alter the extension code to allow any of the available languages by default, but we decided on a more gradual inclusion of languages if/when we need them. (If you did make this kind of change to the extension, you would need to check to make sure only one of the real GeSHi language files was being included.)
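
One possible way to make that change, sketched under the assumption that every language is a single .php file in the "geshi" folder: build the list from the files that are actually there, and only accept a requested language if it is in that list. Both function names below are hypothetical, and aliases such as 'js' or 'html' would still need the remap step from the extension code.

```php
<?php
// Hypothetical helper: derives language names from the *.php files found
// in the GeSHi language folder, e.g. "geshi/pascal.php" gives "pascal".
function cshDiscoverLanguages($geshiDir)
{
    $files = glob(rtrim($geshiDir, '/') . '/*.php');
    if ($files === false) {
        $files = array(); // glob() returns false on error
    }
    $languages = array();
    foreach ($files as $file) {
        $languages[] = basename($file, '.php');
    }
    sort($languages);
    return $languages;
}

// Hypothetical check: only accept names that match a real language file,
// so arbitrary user input never selects a file to include.
function cshIsRealLanguage($lang, $geshiDir)
{
    return in_array($lang, cshDiscoverLanguages($geshiDir), true);
}

// Example, using the folder layout described earlier on this page.
echo implode(', ', cshDiscoverLanguages('extensions/CodeSyntaxHighlight/geshi')), "\n";
```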

Enjoy - it was fun to whip up and we hope it's useful for others.