Language forks bring new power to programming

From Hack to Cython, inventive forks are pushing popular programming languages in new directions

Computer languages are like their real-life counterparts: They constantly evolve. But unique to the evolution of programming languages is the ability to expressly fork them -- to publicly announce a desire to branch off and deviate from the lineage. Sometimes the forks are temporary, with the new branch rejoining and influencing its parent. Other times, a useful variation of an existing language arises and is sustained. Or the mutation takes off, and an entirely new language is born.

The desire to tinker and innovate is only one reason to change a computer language. Another major impetus is that any programming language will in time show its limits, whether in the language itself or in its implementation. Those evolutionary pressures drive users to either change it for the better or to leave it behind for another option.

Most language forks evolve in one of three ways:

  1. As an entirely new, potentially incompatible branch of the language
  2. As a new language that compiles down to the original
  3. As a superset or subset of the original language, with features added or removed

Here we explore some of the more vibrant examples of each approach.

A new language: PHP and Hack

PHP’s sheer popularity is both its blessing and its curse. The blessing: Applications developed in the language are all but guaranteed to run anywhere. The curse: PHP’s curious quirks and internal inconsistencies aren’t likely to be ironed out soon, lest the changes break backward compatibility with the large body of existing PHP code.

Enter Facebook’s Hack language. Hack is a variant of PHP, hatched from Facebook’s use of the language at massive scale. It’s designed to interoperate with PHP, but bristles with features that PHP doesn’t offer, such as type annotations, JavaScript-like lambdas, Java- and C#-like generics, and much more.

The changes Hack brought to PHP demonstrate why a language fork can be appealing. Major changes to the language can be implemented without having to wait for approval from a steering committee or governing body. A proposal to add type hinting to PHP recently passed, but it might be a while before it lands in the actual language, let alone gets used in production code. With Hack, those features can be used right now.

The downside of any fork is that it’s likely to be backward-incompatible, meaning existing code written in the original language might not work. Hack provides a partial solution to this limitation by running on a virtual machine, HHVM, which also supports PHP -- allowing both languages to be deployed side by side on the same runtime. In this way, an existing PHP codebase can be deployed alongside a newly minted Hack codebase, with the old code phased out over time in favor of the new.

A compiled-down language: JavaScript and the rest

How do you fork a language without forking the language itself? Create a new language that compiles down to the old one. The original language’s limitations, typically its syntax, can be kept at arm’s length from the programmer.

JavaScript and its derivatives are the most prominent examples. Rather than fork JavaScript by changing the language and creating a new interpreter or compiler for it, the new languages that use JavaScript at their core simply compile down to JavaScript and run on the existing engines for that language. CoffeeScript, TypeScript, and many other languages work this way.
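As a minimal sketch of the compile-down idea, the Python snippet below translates a made-up one-line function syntax into JavaScript source text. The toy syntax, the compile_to_js name, and the regular expression are all assumptions for illustration; real compilers such as CoffeeScript and TypeScript parse the full language rather than pattern-matching individual lines.

```python
import re

# Hypothetical toy syntax: "name = (args) -> expression"
ARROW_DEF = re.compile(r"^\s*(\w+)\s*=\s*\(([^)]*)\)\s*->\s*(.+?)\s*$")


def compile_to_js(source: str) -> str:
    """Translate each toy-language line into a JavaScript function declaration."""
    output = []
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = ARROW_DEF.match(line)
        if match is None:
            raise SyntaxError(f"cannot compile line: {line!r}")
        name, args, body = match.groups()
        # Normalize the parameter list: "a,b" and "a , b" both become "a, b".
        params = ", ".join(a.strip() for a in args.split(",") if a.strip())
        output.append(f"function {name}({params}) {{ return {body}; }}")
    return "\n".join(output)


if __name__ == "__main__":
    print(compile_to_js("square = (x) -> x * x"))
```

The output is ordinary JavaScript text, so it could be handed to any existing engine unchanged; nothing about the engine needs to know the toy syntax exists.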

Why do this? For one, it’s easier to leverage an existing language’s tool chain than it is to write an entirely new one. In JavaScript’s case, the speed of the existing engines is a huge boon; even the overhead of mapping constructs in the new language to constructs in JavaScript isn’t too burdensome.

JavaScript has even been used to fork itself, in a sense. Tools like Babel allow features from ECMAScript 6, the next version of the JavaScript standard, to be back-ported to ECMAScript 5. This way, a programmer can make use of those future language features right now, while browser-level support is still being rolled (or ironed) out.

One possible downside of using a transpiled version of a language is that debugging becomes harder, because the code that runs is not the code that was written. In JavaScript’s case, most transpilers will generate a source map that can be used to match the generated JavaScript to its original source. CoffeeScript, for instance, permits this. That said, this approach has limits -- for instance, there’s no way (yet) to look up original variable names when debugging, as opposed to their transpiled equivalents.
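The idea behind a source map can be sketched in a few lines of Python. This is a simplified illustration, not the actual source map file format, which encodes positions far more compactly; the transpile_with_map function and its toy comment-stripping "transpiler" are hypothetical.

```python
def transpile_with_map(source: str):
    """Drop blank and '#' comment lines; record where each output line came from.

    Returns (generated_text, line_map), where line_map[generated_line] is the
    1-based line number of the corresponding line in the original source.
    """
    generated = []
    line_map = {}
    for original_no, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # this line produces no output
        generated.append(stripped + ";")
        line_map[len(generated)] = original_no
    return "\n".join(generated), line_map


if __name__ == "__main__":
    src = "# setup\n\nx = 1\n# compute\ny = x + 1\n"
    code, mapping = transpile_with_map(src)
    print(code)
    # Look up which original line produced generated line 2.
    print(mapping[2])
```

A debugger that hits an error on a generated line can consult the map to report the position in the code the programmer actually wrote. Note that the map here tracks only line positions, which mirrors the limit described above: it says nothing about renamed variables.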

Subsets and supersets: Python

A third kind of language variant is a subset, a reduced version of the language designed to accelerate performance or address specific issues. Mozilla’s asm.js is one example: a subset of JavaScript into which C/C++ programs can be compiled. But the ever popular Python has spawned several subsets of its own.

Subsets of Python generally exist as a way to address Python performance -- a language with fewer features is easier to optimize. RPython, the language in which the PyPy Python implementation is written, is “a restricted subset of Python that is amenable to static analysis” and places stricter controls on what type a variable can hold at any given time. As a result, the RPython toolchain can translate such code into efficient native code far more readily than would be possible with unrestricted Python.
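To illustrate the kind of restriction involved, the sketch below uses Python’s standard ast module to flag variable names that are assigned literals of more than one type. It is a deliberately crude stand-in: the find_unstable_names helper is hypothetical, it ignores function scope, and it looks only at literal assignments. RPython’s real rules are enforced by its own toolchain, not by a check like this.

```python
import ast


def find_unstable_names(source: str) -> dict:
    """Map variable names to the literal types assigned to them, keeping
    only names that receive more than one type anywhere in the source."""
    seen = {}
    for node in ast.walk(ast.parse(source)):
        # Only handle simple "name = literal" assignments.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            type_name = type(node.value.value).__name__
            for target in node.targets:
                if isinstance(target, ast.Name):
                    seen.setdefault(target.id, set()).add(type_name)
    return {name: types for name, types in seen.items() if len(types) > 1}


EXAMPLE = """
def stable():
    count = 0
    count = 10
    return count

def unstable():
    value = 0
    value = "zero"   # same name, different type
    return value
"""

if __name__ == "__main__":
    # Reports 'value' (assigned both an int and a str) but not 'count'.
    print(find_unstable_names(EXAMPLE))
```

Ordinary Python accepts both functions without complaint. A restricted subset that rejects the second one gives an optimizer a guarantee it can rely on: a given variable always holds the same kind of value.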

Just as there are subsets, there are also supersets -- versions of a language that tack on features to broaden what can be done with it. Cython, another Python derivative, adds ways to generate C code directly from Python code, allowing a programmer to accelerate a Python program’s performance by way of C.

Rarely do the likes of Cython or RPython generate the same level of interest as the parent language. In Cython’s case, it appeals mainly to people combining C with Python. If you’re not doing that, there’s little incentive to use it.

Sometimes, with supersets and subsets alike, features will bubble up (or down) into the main language. Static typing is one example: There’s now a proposal in the works to add type hinting to Python 3, as a way to make code easier to check with static analysis tools -- and perhaps eventually as a way to accelerate its performance overall.
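For a sense of what such hints look like, here is a small sketch that follows the syntax described in the type hinting proposal, PEP 484. The total_length function is made up for illustration. The interpreter does not enforce the annotations at run time; they are metadata that external tools can inspect.

```python
from typing import List, get_type_hints


def total_length(words: List[str]) -> int:
    """Return the combined length of all strings in words."""
    return sum(len(word) for word in words)


if __name__ == "__main__":
    print(total_length(["fork", "language"]))
    # The annotations are stored on the function and can be read back by tools.
    print(get_type_hints(total_length))
```

The code runs exactly as it would without the hints, which is what keeps the feature backward-compatible: the annotations add information for checkers and editors without changing the language’s behavior.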

Future candidates for a fork

What other widely used languages might be destined for a fork in the near future?

One candidate is Google’s Go, aka Golang. High-profile projects such as Docker have been built with it, and the language has enjoyed attention and accolades. But several of its features and behaviors have as many detractors as they do adherents; Go’s error-handling mechanism, for instance, is one feature that’s been singled out for criticism. Lack of generics is another commonly cited shortcoming, and the Go development team has insisted that generics will not be added to the language. If Go’s designers are unwilling to reconsider their stance on such facets -- all signs point to that being the case -- a fork of the language might be the only way forward for the disgruntled.

Another possibility would be variations on Microsoft’s family of .Net languages, mainly C#, made more feasible by Microsoft’s new generation of open source compilation frameworks. This development would be distinct from projects like Mono, a separate open source implementation of C# and .Net. Rather, it would be an attempt to take C# in new directions, whether they were compatible with the original or not.

One final possibility, though it’s more of a fork of a specification than a language, is the next major version of HTML. In some ways this has already happened, as the WHATWG and HTML5 could be considered forks from the W3C and its version of the standard. There’s no guarantee such a fork would change the landscape, even if it came with a browser to run it, but that’s part of the risk -- and reward -- of forking in the first place.

This story, "Language forks bring new power to programming" was originally published by InfoWorld.