What’s in a Number?

Credit: Jorge Franganillo
Credit: Jorge Franganillo

If we program in a low-level language like C, we have access to all sorts of numeric types of various sizes from unsigned char to long double. In JavaScript we have just the one: the IEEE 754 double precision floating point number (double for short). If we are to be limited to just the one number type, this is a pretty good one: it can represent whole numbers, fractional numbers, tiny numbers, enormous numbers and infinity.

A double is a 64-bit value arranged thus:

  • 1 sign bit
  • 11 bits of exponent
  • 53 bits of significand

If you’re sharp, you’ll notice that this sums up to 65 bits. This is possible because the numbers are normalized. As to what “normalized” means, consider the difference between 1.25×103 and 125×101. They both represent the same number but the first is normalized while the second is a bit weird. In base 10, the first digit of a normalized number is between 1 and 9 inclusive. In binary it’s always1. Since it’s always 1, we can leave it out without losing information. Hence, we get 53 bits of precision despite only having 52 bits to play with. An 11-bit exponent gives a range of 0-2047. Since we need to be able to represent small numbers, the actual exponent is obtained by subtracting 1023 from this, giving a range of -1023 to 1024. As we’ll see, the maximum and minimum exponent values are reserved so the usable range is -1022 to 1023.

In C it’s easy to get at the internals of a double:

#include <stdint.h>
...
double pi = 3.141592653589793;
uint64_t *p = (uint64_t *) &pi;
/* Twiddle bits through *p */
...

In JavaScript, we can’t access the internals directly, but a little bit of trickery opens them right up. Let’s see whether this is possible. Tell me, computer, will you allow me to peek inside the internals of a double?

Assuming that the required runtime support is available, the following function will split a double into its constituent parts as human-readable strings:

window.showNum = function(n) {
    var buf = new ArrayBuffer(8),
        dv = new DataView(buf);
    dv.setFloat64(0, n, true);
    var m = dv.getInt32(0, true),
        i = 0, j = 1, p = "", ret = {};
    for (; i < 64; ++i, j <<= 1) {
        if (i === 32) {
            // Done with the first half
            j = 1;
            m = dv.getInt32(4, true);
        }
        else if (i === 52) {
            // Built up the significand
            ret.m = p;
            p = "";
        }
        else if (i === 63) {
            // Built up the exponent
            ret.e = p;
            p = "";
        }
        if (m & j) {
            p = "1" + p;
        }
        else {
            p = "0" + p;
        }
    }
    // Set the sign bit
    ret.s = p;

    // Calculate the represented value as a
    // base-2 exponential value
    var sig = 1, f = 0.5, e = -1023;
    for (i = 0; i < 52; ++i, f /= 2) {
        if ('1' === ret.m.charAt(i)) {
            sig += f;
        }
    }
    j = 1;
    for (i = 10; i >= 0; --i, j <<= 1) {
        if ('1' === ret.e.charAt(i)) {
            e += j;
        }
    }
    ret.t = sig + "e" + e;
    if ('1' === ret.s) {
        ret.t = '-' + ret.t;
    }

    return ret;
};

Here’s how it works. An ArrayBuffer is a low-level generic data container that behaves roughly like a byte array. We get at the internals of the buffer by overlaying either a typed array (such as Int32Array) or a DataView. I chose a DataView because then I don’t have to worry about endianness. The true arguments to setFloat64 and getInt32 means “little-endian, please”. We stick the double into the buffer and then access it as two 32-bit integers. We may not be able to get at the bits in a double from JavaScript but we can do bitwise operations on an int. Had we used false as the endianness flag, we’d have to read out the ints in reverse order. Without a DataView, I would have to do some additional work to determine the endianness of your system. When we’re done calculating the bit strings, we build up a base-2 exponential representation that’s a bit more human-readable. The bits of the significand are all fractional with the first bit representing ½, the second bit ¼ and so on. The exponent is calculated working right-to-left along the bit string, remembering to take the bias of 1023 into account. The return property for the significand is “m” which stands for mantissa. Technically, this is wrong because the strict definition of “mantissa” is the fractional part of a logarithm. However, “s” is already used for the sign so “m” it is.

Let’s try it out. Tell me, computer, what does 1 look like internally?

That’s not a very interesting number apart from the exponent pattern which shows the 1023 bias. Tell me, computer, what does -√2 look like internally?

Limits

Let’s have a look at the limits of what is representable. Tell me, computer, what does the largest absolute value that you can handle look like internally?

In base 10, this is 1.7976931348623157×10308. What about the smallest?

That last one may be surprising if you recall that I said the minimum usable exponent was -1022. This value is denormalized and is, in fact, the smallest denormalized value. This may (or may not) be familiar to C++ programmers as std::numeric_limits::denorm_min. An exponent of -1023 signfies a subnormal number and the signficand must be interpreted as 0.f rather than 1.f. Therefore, the significand of the smallest number represents 2-51. When you add the exponents together, you get a value of 2-1074 which is the absolute limit of representability that is not zero. Using bits from the significand as additional exponent bits allows for gradual underflow at the cost of precision – at the limit, there is only a single bit. In base 10, it translates (very approximately) to 5×10-324.

The more familiar value to C/C++ programmers is DBL_MIN. Tell me, computer, what does DBL_MIN look like internally?

In base 10, this is 2.2250738585072014×10-308.

Infinity and NaN

A value with all 1s in its exponent signifies an abnormal condition. The first condition is overflow. Tell me, computer, what does the product of 10155 and 10154 look like internally?

This value is commonly represented as Infinity. The sign bit can be set and this represents minus infinity. The characteristic of the two infinities is that all exponent bits are 1 and all significand bits are 0.

The second condition is where the numeric representation of a value is undefined. Tell me, computer, what does 0/0 look like internally?

This value is commonly represented as NaN (aka, not a number). The characteristic of NaN is that all exponent bits are set along with 1 or more significand bits.

Zero

If the significand in a double is to be interpreted as 1.f, there is no possible representation of zero since there is no y such that 1.f×2y = 0. The solution is to reserve a pattern to represent 0. From the representation of the smallest denormalized number above, you’d be correct in guessing that this pattern is all exponent bits set to 0 along with all significand bits. However, the sign bit can be set or unset which means that there are two zero values: 0 and -0. -0 will be the result, for example, of truncating a value between 0 and -1.

Epsilon

Although we can represent minuscule numbers by reducing the exponent, there is a hard limit to the fractional values that can be represented by adjusting the significand. Since a double has 52 fraction bits, this value may trivially (and precisely) be calculated as Math.pow(2, -52). This value is known as machine epsilon and is denoted by 𝜀. It is the smallest value for which the following inequality is true:
1 + \epsilon > 1
Assuming that your browser supports it, the machine epsilon value is in Number.EPSILON

Largest integer values

The width of the significand also determines the limits on the maximum size of a whole number that can be represented without loss of precision. The largest values have all significand bits set to 1 and the exponent set to 10000110011b (that is, 52). In other words, ±253 – 1. Note the qualifier “without loss of precision”. Larger integer values can be represented but not uniquely. To illustrate:

253 - 2 → 9007199254740990
253 - 1 → 9007199254740991
253     → 9007199254740992
253 + 1 → 9007199254740992

Assuming browser support, the largest integers that can be represented in a double are given by the MAX_SAFE_INTEGER and MIN_SAFE_INTEGER properties of Number.

Playing

Assuming you saw the answer “Of course” to the first question I asked the computer, you can type numbers into the following input and the component parts will be displayed in the box below. Large and small numbers can be entered using ‘e’ notation, for example, 1.23e-50. You can also type properties of Math and Number, such as PI, E, SQRT2, MAX_SAFE_INTEGER, EPSILON, etc.


Putting the Reporting Into Perl

Credit: Adrian
Credit: Adrian

One of many backonyms for Perl is “Practical Extraction and Reporting Language”. The ability to create reports complete with headers, pagination and justified output is a feature built into the language yet I’m going to hazard a guess that few Perl programmers have ever used it. While you can go a long way with sprintf, producing justified output, in particular, is non-trivial. If you need it, you may as well let Perl do the grunt work for you.

Declaring and using formats

Formats are declared with the format keyword and formatted output is produced using the write function (note: not the same function as the system call; that’s syswrite). A format
declaration takes the following form:

format NAME = 
@field @field ...           "picture" line
$variable, $variable ...    argument line
# Comment line
.

There can be any number of picture and argument lines. Each pair defines the pattern for a line of output. The @field is the ‘@’ character followed by by one or more ‘<‘, ‘>’, ‘|’ or ‘#’ characters. Details (and examples) are:

  • @<<<<<
    Six characters, left justified
  • @>>>>
    Five characters, right justified
  • @|||
    Four characters, centre justified
  • @###.##
    Decimal number with four digits in front of the decimal point and two after.

Note that the ‘@’ character is included in the character count.

The second line is a list of variable names (or constants) that hold the field values. You may see examples that show these space-separated like the field declarations. Follow this example if you like deprecation warnings, otherwise comma separate them. The format declaration is terminated by a lone ‘.’ at the start of the line. Here’s a simple script that will print its input centre-justified:

#!/usr/bin/perl -w
use strict;
format STDOUT =
@|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
$_
.
write while (<>);

Note the name “STDOUT”; this is the default name for a format and may be omitted. To define a page header create a format with the name <NAME>_TOP, for example, “STDOUT_TOP”.

Full program

We’re going to create a program that will create a formatted report of the errno definitions for your system. Each line in the output will show the numeric value, right justified, the constant declaration and the error description.

errno definitions

These are found in /usr/include/errno.h:

$ cat /usr/include/errno.h
...License text
#include <sys/errno.h>

Seems like it’s not going to be that easy. sys/errno.h includes a bunch of other files each of which include other files and so on. We could do this the hard way or we could note that there is already a program that can handle files which include other files which include yet other files: the C preprocessor. If you are using gcc, the invocation that shows preprocessor output is gcc -E. If you are using the Microsoft compiler this will be cl /E:

$ echo '#include <errno.h>' | gcc -E -
...Some output
# 1 "/usr/include/errno.h" 1 3 4
# 23 "/usr/include/errno.h" 3 4
# 1 "/usr/include/sys/errno.h" 1 3 4
# 72 "/usr/include/sys/errno.h" 3 4
...More output

The interesting pattern is:

# <LINENO> "<FILENAME>" 1 ...

This is emitted when the preprocessor opens a file. Inside each include file, we’re looking for two types of line:

#define EAGAIN  35 /* Blah blah */
...
#define EWOULDBLOCK EAGAIN  /* Blah blah */

We now have enough information to write the bit of our program that gathers data:

#!/usr/bin/perl
# Produces formatted report of errno definitions

use strict;
use warnings qw/all/;

# errno.h is typically a whole bunch of other 
# includes; rather than trying to process
# it ourselves, let's get the C-preprocessor 
# to tell us where the includes are
my $child;
open($child, "-|", 
    "echo '#include <errno.h>' | gcc -E -") 
    // die("Can't exec gcc: $!");
my @includes;
while(<$child>) {
    if (/^# \d* "(\/[^"]*errno[^"]*)" 1/) {
        # For example:
        # 1 "/usr/include/errno.h" 1 3 4
        # 1 after the filename means "new file"
        push(@includes, $1);
    } 
}
close($child);
my @errno;
my %no = ();
my ($fh, $line);

# As we grind over the error definitions, let's 
# keep track of the maximum field length;
# this is for the purpose of constructing 
# formats later
my ($maxN, $maxC, $maxD) = (0, 0, 0);
my ($const, $val, $desc);

for my $inc (@includes) {
    open($fh, "<", $inc) or 
        die("Unable to read $inc: $!");
    while ($line = <$fh>) {
        if ($line =~ /^#define\s*([A-Z]*)\s*(\d*)\s*\/\*\s*([^\*]*)\*/) {
            # For example:
            # define EPERM 1 /* ... */
            ($const, $val, $desc) = 
                ($1, $2 + 0, $3);
            # Lose whitespace from the end 
            # of the description field
            $desc =~ s/\s*$//;
            $no{$const} = $val;
            push(@errno, { 
                const => $const, 
                errno => $val, 
                desc => $desc 
            });
        }
        elsif ($line =~ /^#define\s*([A-Z]*)\s*([A-Z]*)\s*\/\*\s*([^\*]*)\*/) {
            # For example:
            #define EWOULDBLOCK EAGAIN  /* ... */
            ($const, $val, $desc) = ($1, $2, $3);
            if (exists($no{$val})) {
                # Resolve the errno
                $val = $no{$val};
                $desc =~ s/\s*$//;
                push(@errno, { 
                    const => $const, 
                    errno => $val, 
                    desc => $desc 
                });
            }
        }
        else {
            next;
        }
        if (length($val) > $maxN) {
            $maxN = length($val);
        }
        if (length($const) > $maxC) {
            $maxC = length($const);
        }
        if (length($desc) > $maxD) {
            $maxD = length($desc);
        }
    }
    close($fh);
}
@errno = sort { 
    $a->{errno} <=> $b->{errno} 
} @errno;

Apologies for those regular expressions that pick out the #define statements. The principal cause of line noise is that we’re looking for C comments, /* ... */, which means four backslash escapes.

Creating the format definitions

Although we know our field widths, we face a problem: the picture lines in a format declaration require you to know the field widths ahead of time. Fortunately, we can get round this by building up the format declaration programmatically and then eval‘ing it:

# Build up formats; # field is right aligned
my $fmt = '@' . ('>' x ($maxN - 1)) . 
    "  @" . ('<' x ($maxC - 1)) . 
    "  @" . ('<' x ($maxD - 1)) .
    "\n\$val, \$const, \$desc\n.";
    
my $fmtHdr = '@' . ('>' x ($maxN - 1)) . 
    "  @" . ('<' x ($maxC - 1)) . 
    "  @" . ('<' x ($maxD - 1)) .
    "\n'#', '#define', 'Description'\n" . 
    ('-' x ($maxN + $maxC + $maxD + 4)) . 
    "\n.";

eval("format STDOUT =\n$fmt\n");
eval("format STDOUT_TOP =\n$fmtHdr\n");

What we’re eval‘ing looks something like this:

format STDOUT =
@>>  @<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<<<<<
$val, $const, $desc
.

The header format that we build up is similar with the addition of an underline.

Producing output

Perl paginates report output adding a page break and a header after every sixty lines of output unless told otherwise. For our purposes, this is not desirable behaviour. We want a header but not paginated output. We can get round this by changing the variable that controls the number of lines per page, $=:

$= = 2 + scalar(@errno);

We need to add 2 because the header is two lines. Now it’s just a matter of looping over our error information:

for my $errInf (@errno) {
    $val = $errInf->{errno};
    $const = $errInf->{const};
    $desc = $errInf->{desc};
    write();
}

Let’s try it out:

$ ./errno2fmt
  #  #define    Description
-------------------------------------------
  1  EPERM      Operation not permitted
  2  ENOENT     No such file or directory
...
106  ELAST      Must be equal largest errno

The output will be different depending on your operating system. You can download the complete script here.

Finding out more

This post has barely scratched the surface of Perl’s reporting capabilities. For the full lowdown, check out the perldoc.

Naive Implementations

Credit: The Guardian
Credit: The Guardian

With ES6, the JavaScript math library gained a level in badass with the addition of seventeen new functions. While I was polishing my UsefulJS library for publication I decided I’d spend a couple of fun evenings implementing shims for them since, if I want to calculate the hypotenuse, I do not want to have to do this:

var x = 3, y = 4,
    r = ("hypot" in Math) ? Math.hypot(x, y) :
        Math.sqrt(x*x + y*y);

The C++ documentation for hypot has an interesting statement in the notes:

POSIX specifies that underflow may only occur when both arguments are subnormal and the correct result is also subnormal (this forbids naive implementations).

By naive implementations, they mean something like this:

(function() {
    if (!("hypot" in Math)) {
        Math.hypot = function(x, y) {
            return Math.sqrt(x*x + y*y);
        };
    }
})();

As it happens, this wouldn’t be up to standard anyway since Math.hypot takes any number of arguments but that’s a detail irrelevant to this discussion. Since this is the formula for Pythagoras’s theorem, what’s naive about it? Tell me, computer, what is the hypotenuse when the two shorter sides are 10155 and 1?

That’s why it’s naive: even though the correct result (which is 10155) can be represented, there’s an overflow in the intermediate calculations that causes the wrong value to be returned. Moler and Morrison published a non-naive algorithm for calculating the hypotenuse in 1983. Not only does it avoid underflow and overflow in intermediate products, it also avoids taking a square root (which was presumably an attractive quality back then). Let’s give it a whirl:

var mm = function(x, y) {
    // Use abs values
    if (x < 0) {
        x = -x;
    }
    if (y < 0) {
        y = -y;
    }
    var tmp, r, s;
    // Ensure that x >= y
    if (x < y) {
        tmp = x;
        x = y;
        y = tmp;
    }
    do {
        tmp = y / x;
        r = tmp * tmp;
        s = r / (4 + r);
        x = x + 2 * s * x;
        y *= s;
    }
    // While y is significant
    while (1 + y > 1);
    return x;
};

Tell me, computer, what, according to the Moler-Morrison algorithm, is the hypotenuse when the two shorter sides are 3 and 4?

Damn and blast! Everyone knows that’s 5! Can we do better? Let’s rearrange the formula a little starting from the familiar form:
r = \sqrt{x^2+y^2}
The sum inside the square root can be expressed as:
x^2(1 + (y/x)^2)
Why should we care? Because we can take x2 outside the square root to give:
r = |x|\sqrt{1 + (y/x)^2}
As long as we ensure that x >= y, (y/x)2 will not overflow. It might underflow, but then the result is simply |x| which to all intents and purposes is correct. Let’s give it a try:

var hypot = function(x, y) {
    var tmp, r;
    if (x < 0) {
        x = -x;
    }
    if (y < 0) {
        y = -y;
    }
    if (x < y) {
        tmp = x;
        x = y;
        y = tmp;
    }
    if (!x) {
        // hypot(0, 0)
        return 0;
    }
    r = y / x;
    return x * Math.sqrt(1 + r * r);
};

The only gotcha here is to avoid the division 0/0 which would result in NaN, but otherwise it’s straightforward. Now tell me, computer, what is the hypotenuse when the two shorter sides are:

  • 3 and 4
  • 0 and 0
  • 10155 and 10155

Works like a charm! You may well ask whether there's any point in being able to calculate the hypotenuse of extremely large numbers. I can't see myself ever needing to do it yet I went to the trouble of making it possible. The reason is simple: pride. I do not want someone looking at my code and thinking "n00b!".

Several of the new Math functions, such as expm1 and log1p, can give the wrong result if they're not carefully implemented. Tell me, computer, does your Math library contain naive implementations?

If you're seeing "Dunno", your browser doesn't support ES6 Math functions. "Looks like it" means that the area hyperbolic sine (asinh) of 10-16 was given as 0. To machine precision, this should be 10-16. In fairness to the engineers, the deficiency probably lies in the underlying C libraries. "Not as far as I can tell" means that this particular Math function at least has been carefully implemented. The difference between 0 and 10-16 may seem trivial but it's a 100% loss of accuracy, an example of what is known as catastrophic cancellation.

As to how naive or not my own math code is, see for yourself! (Math code starts around line 1250).

I Know Where You Live

Credit: Hope Abrams
Credit: Hope Abrams

With a reasonable chance of success, I can obtain your location to a fair degree of accuracy. Let’s give it a go.

Obtaining your location...

Hopefully, you’ll see a table of scarily-accurate information rather than an error message. If the info is miles out (or even thousands of miles out) then you’re probably reading this at work.

IP addresses are not given out willy-nilly. They are assigned hierarchically: you are assigned an address by your ISP who buy blocks of addresses from a national registrar who has been given larger blocks of addresses by the big regional registrars (for example, ARIN in North America and RIPE NCC in Europe and most of Asia). These assignments are kept in a distributed database called the whois database. If you have a Mac OS X or Linux system, you can query the database yourself:

$ whois <IPADDRESS>

This gives you most of the information you see above. Other information may be obtained by various forms of data mining; for example, when you get a weather forecast for your home town, some organization now has a fair idea of which town is your home town.

Question: is the ability to locate you by your IP address a good thing, a bad thing or just a thing? I’d say the latter. It’s a necessary side-effect of how the Internet works. Data has to be routed from one host on the Internet to another. The problem of how to do this is only tractable because of the hierarchical block assignment of IP addresses. Connecting to a local exchange ties you down further to a fairly narrow radius. On the minus side, disreputable sites may make false promises of women in your area. On the plus side, an attempt to buy goods using your Amazon account from an IP address located in Russia will throw up a red flag.

Technical Details

I’m using a geolocation API provided by ip-api.com (yes, this is an endorsement – their service is excellent and their fair use terms are very generous). They provide a JSONP interface into their database. If you’re not familiar with JSONP, it’s a way to access third party webservices, taking advantage of the fact that a JSON-encoded object is valid JavaScript and that src attributes in script tags, unlike AJAX requests, are not subject to a same-origin policy. Typically, a JSONP URL will take a callback parameter in its query string and it’s output will be the encoded JSON object passed as a parameter to your callback function. Specifically, my code looks this:

<script>
window.geo = function(obj) {
    ...
};
</script>
<script src="http://ip-api.com/json/?callback=window.geo" defer></script>

Callback function details are as follows:

window.geo = function(obj) {
    var gEle = document.getElementById("geoinfo");
    gEle.innerHTML = "";
    // Map obj properties to human readable values
    var pM = ["IP address", "query",
        "Hostname", "reverse",
        "Country", "country", "Region", "regionName",
        "City", "city", "Postcode", "zip",
        "Latitude", "lat", "Longitude", "lon",
        "Organization", "org", "Error", "msg"],
        i, tbl = document.createElement("table"),
        tr, rowIdx = 0, td, cellIdx, n, prop;
    for (i = 0; i < pM.length; i += 2) {
        n = pM[i];
        prop = pM[i + 1];
        if (!obj[prop]) {
            continue;
        }
        tr = tbl.insertRow(rowIdx++);
        cellIdx = 0;
        td = tr.insertCell(cellIdx++);
        td.appendChild(document.createTextNode(n));
        td = tr.insertCell(cellIdx++);
        td.appendChild(
            document.createTextNode(obj[prop]));
    }
    gEle.appendChild(tbl);
    if ("success" === obj.status) {
        var mapCtr = document.getElementById("map");
        mapCtr.style.height = "250px";
        var ll = {lat: obj.lat, lng: obj.lon},
        map = new google.maps.Map(mapCtr, {
            center: ll,
            zoom: 13
        }),
        marker = new google.maps.Marker({
            position: ll,
            map: map
        });
    }
};

The latitude and longitude are fed into Google Maps (whose API is a whole lot easier to use than it is to sign up for) to visualize the data.

Obtaining location information from the browser

Depending on whether your browser supports the geolocation API, whether you click the “Get Location” button, whether you give permission for your location to be used and where you are, you should see accurate information below. This information is obtained from a variety of sources. If your device has a GPS chip, that will be used. Otherwise, something like Google’s map of wi-fi networks may be used to obtain your location by triangulation. If you’re not using wi-fi, then the accuracy will be less impressive.

The API is very easy to use. You basically define a success callback and an error callback and then pass them to navigator.geolocation.getCurrentPosition, like this:

gEle = document.getElementById("geoinfo2");
var pre = document.createElement("pre");
gEle.appendChild(pre);
if (!("geolocation" in navigator)) {
    pre.innerHTML = "Geolocation is unavailable";
    return;
}

var showGeo = function() {
    var locSuccess = function(pos) {
        var lat = pos.coords.latitude,
            lon = pos.coords.longitude,
            tbl, tr, td, rowIdx, cellIdx, mapCtr;
        gEle.innerHTML = "";
        tbl = document.createElement("table");
        rowIdx = 0;
        tr = tbl.insertRow(rowIdx++);
        cellIdx = 0;
        td = tr.insertCell(cellIdx++);
        td.appendChild(
            document.createTextNode("Latitude"));
        td = tr.insertCell(cellIdx++);
        td.appendChild(document.createTextNode(lat));
        tr = tbl.insertRow(rowIdx++);
        cellIdx = 0;
        td = tr.insertCell(cellIdx++);
        td.appendChild(
            document.createTextNode("Longitude"));
        td = tr.insertCell(cellIdx++);
        td.appendChild(document.createTextNode(lon));
        gEle.appendChild(tbl);
        mapCtr = document.getElementById("map2");
        mapCtr.style.height = "250px";
        showLocation(lat, lon, mapCtr);
    },
    locError = function(e) {
        pre.innerHTML = "";
        pre.appendChild(
            document.createTextNode(
            "Unable to obtain location information: " +
            e.message));
    };
    navigator.geolocation.getCurrentPosition(
        locSuccess, locError);
};
var btn = document.createElement("button");
btn.appendChild(document.createTextNode("Get Location"));
btn.onclick = showGeo;
pre.appendChild(btn);

The geolocation API can also give you periodic updates. See the
MDN documentation
for further information.



Hexadecimal English

Credit: Ninfaj on Flickr
Credit: Ninfaj on Flickr

3735928559 is a familiar number among programmers. “Really?”, you may be asking. Perhaps you’re more familiar with its hexadecimal representation, 0xDEADBEEF. It’s the sort of number you use to initialize a block of memory so that you recognize it in a debugger or somesuch. This made me wonder what other English words can be formed from hexadecimal digits:

$ grep '^[a-f]*$' /usr/share/dict/words

Meh. I quite like effaced and defaced but there’s nothing with eight characters (8 hex digits => 32-bit int). Let’s extend it a bit. We’ll allow the following numeric substitutions:

1 → i
5 → s
0 → o

i, s and o are not valid hex digits so we need to convert them to numbers. Also, everything will look more gratifying in uppercase. We can do this with standard command-line tools, no custom scripting required:

$ grep '^[a-fiso]*$' /usr/share/dict/words | sed 's/s/5/g;s/i/1/g;s/o/0/g' | tr 'a-z' 'A-Z'

That’s better. From now on, when I allocate a block of memory, I’ll initialize it to 0xBADA55ED and before freeing it, I’ll overwrite it with 0xDECEA5ED. My code has just become cooler.

Rounding Numbers: How Hard Can It Be?

Rules for rounding half up
Credit: tes.com

How hard? Hard enough, apparently, that your computer gets it badly wrong. Tell me, computer, what is 1.015 rounded to two decimal places?

If you see “1.01”, you are using a standards-compliant browser. An eleven year old child knows that “1.01” is wrong and when a kid can see that something’s wrong, it’s just not good enough! So how hard can it be when you put your mind to it? Let’s see.

Algorithms

Before implementing a precise rounding algorithm, we need to figure out what we mean by rounding. For most cases, it’s simple: you just round towards the nearest digit. The tricky case is what to do with those pesky halves. Turns out there’s more than one way to handle them.

Round half towards positive infinity

This is the method we learn at school: 1.5 rounds up to 2. But what do we do about negative numbers? If we apply the round-half-up rule consistently we have to round them up too so that -1.5 rounds up to -1.

Round half away from zero

This is what we get if we apply the round-half-up rule symmetrically: positive numbers are rounded up to become more positive, while negative numbers are rounded down and become more negative.

Round half to even

The problem with the round-half-up algorithms is that they introduce a bias. What this means is that if you round a set of random numbers and take the average, it will not be the same as the average of the originals. This matters a lot in some applications: if you’re a major bank those half pennies mount up! The solution to this bias is to round down as often as you round up. In the case of round half towards even (aka banker’s rounding), halves are rounded to the nearest even digit so that 1.5 and 2.5 both round to 2 while 0.5 and -0.5 both round to 0.

Other algorithms

There are other algorithms which are (almost) never used. Round-half-down or round half to zero simply makes your results look consistently wrong. Round half to odd has the disadvantage that it doesn’t round to 0. Alternate rounding requires that we keep track of the last direction while stochastic rounding (randomly up or down) requires a reliable source of randomness which as any fule kno is hard!

Internet Explorer

If you see “1.02” above, you are probably using Internet Explorer. Presumably, some Microsoft engineer looked at the output from the specification and thought “that’s not good enough” as well.

Implementation

When reading the code snippets in this section, assume the following module wrapper:

(function(window) {
    "use strict";
    ...
})(this);

Which algorithm to use? We’ll give the programmer a choice:

var _ralgs = {
    HALF_UP : 0,
    AWAY_FROM_0 : 1,
    TO_EVEN : 2
};

We can note that a naive implementation – scale up, round, scale back down – isn’t going to cut the mustard. The reason for this is the finite precision of your computer hardware. Tell me, computer, what do you get when you multiply 1.015 by 100?

We can also note that rounding to n decimal places is not an arithmetic operation, it is a presentation operation. Therefore, our rounding algorithm is going to work on the string representation of the number, a bit like humans do. Like all Objects, numbers in JavaScript have a toString operation. The only question is, is it fit for purpose? Tell me, computer, what is the stringified form of 0.0000001?

Tarnation! Admittedly, I don’t have to work with very small numbers very often, but an algorithm that didn’t work in the general case would bother the heck out of me! Looks as though we’re going to have to start by implementing toString ourselves and before we can do that we have to write a function that will zero-pad a string. This really should be a standard method on String, but it’s not hard to do:

// Padding function
var _pad = function(s, padTo, padWith, right) {
    s = s + '';
    var pad = "",
        toPad = Math.max(padTo, s.length) - s.length;
    while (toPad) {
        if (toPad & 1) {
            pad += padWith;
        }
        toPad >>>= 1;
        if (toPad) {
            padWith += padWith;
        }
    }
    return right ? s + pad : pad + s;
};

Don’t be too alarmed reading that while loop. String concatenation is relatively expensive (because strings are immutable) so adding one character at a time is sub-optimal. Instead of doing n concatenations we do, at most, log2 n. Now we can re-implement toString:

// Stringifies numbers, paying attention to very large and very small quantities
var _toString = function(n) {
    var parts = n.toExponential().split("e"),
        mag = parseInt(parts[1], 10),
        _n = parts[0];
    parts = _n.split("."),
    var iPart = parts[0],
        dPart = parts[1] || "",
        sig, ret;
    if (n < 0) {
        iPart = iPart.substring(1, iPart.length);
        sig = '-';
    }
    if (mag > 0) {
        ret = iPart + dPart;
        if (ret.length < mag + 1) {
            // Pad to the right with zeroes
            ret = _pad(ret, mag + 1, '0', true);
        }
        else {
            // Need to insert the decimal place at the
            // right point
            iPart = ret.substring(0, 1 + mag);
            dPart = ret.substring(1 + mag, ret.length);
            ret = iPart;
            if (dPart) {
                ret += '.' + dPart;
            }
        }
    }
    else if (mag < 0) {
        mag = -mag;
        // 0.0 followed by mag-1 x '0' followed
        // by iPart + dPart
        ret = _pad("0.", mag + 1, '0', true) +
            iPart + dPart;
    }
    else {
        return _n;
    }
    if (sig) {
        ret = sig + ret;
    }
    return ret;
};

toExponential gives us a consistent string representation of the number which we can then split to obtain the signficand and exponent. Then it’s just a matter of zero-padding the significand accordingly and sticking the decimal point in the right place. Taking a minus sign into account when padding is bothersome so we remove it and restore it afterwards.

The next bit we have to implement is the function where we “increment” a string, propagating any carry left:

// The numeric value of the '0' character
var _zero = '0'.charCodeAt(0),
// For suppressing -0
_likeZero = /^0(?:\.0*)?$/,
// Numerically increments a string
_incStr = function(s, pos) {
    if (pos >= 0) {
        var p1 = s.substring(0, pos),
            p2 = s.substring(pos, pos + 1);
        if ('.' === p2.charAt(0)) {
            // Skip past '.'
            return _incStr(p1, pos - 1) + '.';
        }
        var d = s.charCodeAt(pos) - _zero + 1;
        if (d >= 10) {
            d -= 10;
            // Carry 1
            p1 = _incStr(p1, pos - 1);
        }
        p2 = String.fromCharCode(d + _zero);
        return p1 + p2;
    }
    // We've incremented off the left edge of the string
    // so do the final carry
    return '1' + s;
};

That’s the boilerplate done. Now we can implement our rounding functions. All take a number and the number of desired fractional digits as parameters. First up, half-up:

// Implementation of round to plus infinity (exact)
var _round2PlusInfinity = function(n, fd) {
   var s = _toString(Math.abs(n)),
       pos = s.indexOf('.');
   // _toString gives x or x.yyy not .yyy
   if (pos >= 1) {
       var nd = s.length, rpos = pos + fd + 1;
       if (rpos < nd) {
           var d = s.charCodeAt(rpos) - _zero,
               up = d >= 5;
           if (d === 5 && n < 0) {
               up = false;
           }
           if (up) {
               // Round up
               s = _incStr(s.substring(0, rpos),
                   rpos - 1);
           }
           else {
               // Simply truncate
               s = s.substring(0, rpos);
           }
       }
   }
   if (n < 0 && !_likeZero.test(s)) {
       s = '-' + s;
   }
   return s;
};

Here we see what that mystery _likeZero regex was for. For negative numbers, we get rid of a bothersome minus sign by working with the abs value. If a value rounds to 0 and we stick the minus sign back on the front, we’ll end up with “-0” which just looks weird. Round half towards even is similar with a little bit of added complexity in deciding whether we round up or down:

// Implementation of round half to even (exact)
var _roundTowardsEven = function(n, fd) {
    var neg = false;
    if (n < 0) {
        // Round half to even is symmetric
        neg = true;
        n = -n;
    }
    var s = _toString(Math.abs(n));
    var pos = s.indexOf('.');
    if (pos >= 1) {
        var nd = s.length, rpos = pos + fd + 1;
        if (rpos < nd) {
            var d = s.charCodeAt(rpos) - _zero, 
                up = d > 5;
            if (d === 5) {
                // Check the previous digit
                var c, _rp = rpos, _d;
                do {
                    c = s.charAt(--_rp);
                }
                while (c === '.');
                _d = s.charCodeAt(_rp) - _zero + 1;
                // Up if the previous digit will be even
                up = (0 === _d % 2);
            }
            if (up) {
                // Round up
                s = _incStr(s.substring(0, rpos),
                    rpos - 1);
            }
            else {
                // Simply truncate
                s = s.substring(0, rpos);
            }
        }
    }
    if (neg && !_likeZero.test(s)) {
        s = '-' + s;
    }
    return s;
};

Implementation of the final algorithm is a doddle:

// Implementation of round half away from zero
var _roundAwayFromZero = function(n, fd) {
    if (n >= 0) {
        return _round2PlusInfinity(n, fd);
    }
    var ret = _round2PlusInfinity(-n, fd);
    if (!_likeZero.test(ret)) {
        ret = '-' + ret;
    }
    return ret;
};

Now we can join them together:

// Rounding function with algorithm specified
var _round = function(n, fd, alg) {
    fd >>>= 0;
    switch (alg) {
        case _ralgs.HALF_UP:
            return _round2PlusInfinity(n, fd);

        case _ralgs.AWAY_FROM_0:
            return _roundAwayFromZero(n, fd);

        default:
            return _roundTowardsEven(n, fd);
    }
};

There’s one last bit of boilerplate before implementing our public API. One thing I dislike about toFixed is that you get trailing zeroes in your output whether you want them or not. You asked for six decimal places in your output? Here you go! Our API will, by default, suppress useless zeroes but the caller can specify a minimum number as an optional argument. We need a function to make that possible:

// Applies minimum / maximum fractional digits
var _trimFraction = function(s, minFd, maxFd) {
    var splitPoint = s.indexOf('.'), i = s, j, f = "";
    if (splitPoint === -1 && minFd === 0) {
        return s;
    }
    if (null == minFd) {
        minFd = 0;
    }
    if (null == maxFd) {
        maxFd = 2;
    }
    if (splitPoint >= 0) {
        f = s.substring(splitPoint + 1, s.length);
        i = s.substring(0, splitPoint);
    }
    if (minFd) {
        // Add trailing zeroes
        for (j = f.length; j < minFd; j++) {
            f += '0';
        }
    }
    else {
        // Trim excess seroes
        f = f.substring(
            minFd, maxFd).replace(/0+$/, "");
    }
    if (f) {
        return i + '.' + f;
    }
    return i;
};

Our public API consists of some properties and a method added to Number.prototype:

// Algorithm specification
window.Rounding = {
    HALF_TO_PLUS_INFINITY : _ralgs.HALF_UP,
    HALF_AWAY_FROM_ZERO : _ralgs.AWAY_FROM_0,
    HALF_TO_EVEN : _ralgs.TO_EVEN,
    // Default rounding method is
    // ROUND_HALF_TO_EVEN
    algorithm : _ralgs.TO_EVEN
};

// Add a roundTo method to Number
var rt = function(maxFd, minFd) {
    if (!isFinite(this)) {
        return this + '';
    }
    // Require numbers >= 0
    maxFd = Number(maxFd);
    minFd = Number(minFd);
    if (!(isFinite(maxFd) && maxFd >= 0)) {
        maxFd = 0;
    }
    if (!(isFinite(minFd) && minFd >= 0)) {
        minFd = 0;
    }
    if (minFd > maxFd) {
        minFd = maxFd;
    }

    return _trimFraction(
        _round(this, maxFd, window.Rounding.algorithm), minFd, maxFd);
}, mName = "roundTo", obj = window.Number.prototype;
if (!(mName in obj)) {
    try {
        Object.defineProperty(obj, mName, {
             value : rt,
             enumerable : false
        });
    }
    catch(e) {
        // Do it the old fashioned way
        obj[mName] = rt;
    }
}

Because of the way we’ve implemented our rounding function, we don’t have to worry that someone might set Rounding.algorithm to “Dave”; this means that we don’t have to use a getter / setter for the property. Some Internet experts expend a lot of hot air on the evils of extending built-in types, but Number is absolutely the right place to add our method since we can now use it in place of toFixed and the ability to extend Objects is a central feature of JavaScript anyway. Do note, however, how we go about adding our method – this is the nice way to do it; stomping over someone else’s roundTo method is not polite!

Let’s try it out. Tell me, computer, what is the result of rounding 1.015 to two decimal places?

Hurrah! Remember that it’s generic enough to handle very small numbers? Tell me, computer, what is 1.25×10-26 to 27 decimal places?

Note, this is not wrong – it is the round half to even algorithm. If we set the algorithm to round half to plus infinity and ask again, we get a slightly different answer:

If you don’t want to use a different rounding function, there’s no reason why we can’t simply fix toFixed! Here’s a patch:

(function() {
    if ((0.1).toFixed(20) ===
        "0.10000000000000000000") {
        // Not broken
        return;
    }
    var m = function(fd) {
        return this.roundTo(fd, fd);
    }, obj = Number.prototype, mName = "toFixed";
    try {
        Object.defineProperty(obj, mName, {
            enumerable : false,
            value : m
        });
    }
    catch(e) {
        obj[mName] = m;
    }
})();

And the answer to the original question of how hard can it be? The answer is: harder than you think!

You can download the complete library or, if you prefer, a a minified version.