posted Jul 09, 2003
in Communication Ethics

Communication Ethics book part for An Unfortunate Dichotomy. (This is an automatically generated summary to avoid having huge posts on this page. Click through to read this post.)

Unfortunately, in computer science's zeal to explain how some of this stuff works to the public, computer scientists have made statements that were convenient at the time about machine language, specifically "Machine language only has meaning to the computer." This is incorrect, but I hope you now understand why this simplification was made, having seen was machine language looks like ("2192837463", or, as we computer folks prefer it, "0x82B40B57" in hexadecimal). It is difficult for a human to follow machine language, we prefer reading higher level languages, but for a competent human, it's only a matter of time to read machine language.

There are tools that can help, too. A disassembler can convert machine code back into assembly code reasonably accurately, which makes it somewhat easier to read. There are ever decompilers that try to convert machine code back into high-level languages, which doesn't always work very well for various technical reasons, but with other tool support, it is possible for a skilled computer user to learn how a program works in a reasonable amount of time, even if they start with just the machine code.

This is not just theory; many programs have been modified by people simply examining the machine language, often to remove copy protection. Game cheating codes like those used by the Game Genie are actually done by making tiny changes to the machine language of the video game. Snake-oil encryption techniques have been reverse engineered and subsequently defeated by examining the machine language alone. Reverse engineering is frequently used to get hardware running in operating systems that the manufacturor has not provided a driver for. Even the author has on a couple of occaisions dived down to the machine language level; in my case, an important executable file was scrambled on disk and I had to re-assemble the pieces in the correct order or I would have lost the file; without the clues from reading the assembly code, I could not have gotten the order right.

Because of incorrectly simplifying the idea of "machine code", people have gotten the idea that machine language is somehow different then those other high level languages. But it's not really that machine language is different, it's that it is a special language (or rather, set of languages) that the machine can execute directly. Despite the fact that we as humans perceive machine code, assembly code, and the various high-level languages very different, it is a difference of degree, not kind. Each is more abstract and human-usable then the last, but there is a smooth progression of languages and possible languages from machine code right up to some of the most abstract languages there are, like Javascript being used in a web page.

There has been work done on creating machines that can execute high-level languages directly, in particular one called LISP. As compilers improved, efforts in this direction ceased, but for those machines, machine language was a high-level, human readable language. We choose to have computers execute this rather obtuse language directly because it makes some things easy, but we are not forced by the nature of computers to do this. We could create a machine with a processor that executed Java directly, without even compiling it as a separate stage, it just wouldn't be as efficient for a number of reasons.

You may not have the tools and you certainly don't have the time, but you could learn absolutely everything about how Microsoft Word works simply by reading its code, because its code is just a list of instructions. Or you can learn how to efficiently sort things by reading code. All computer code, a.k.a. software, from machine code to Javascript and beyond, is communication.


Site Links


All Posts