Compilers and Interpreters

In the last chapter, we came to the conclusion that writing computer programs in machine language (machine code) is tedious, time-consuming, error-prone and difficult to understand.

Programmers mostly use a simpler way to write computer programs, often referred to as source code, using what are called programming languages. A few popular ones are Java, C#, Python and C++.

Source code is more human-friendly. The instructions are written line by line in the language-specific syntax using any kind of text editor, like Notepad on Windows or TextEdit on Mac, based on the programmer's personal preference.

Machine language is called a low-level language

As machine language deals closely with the CPU itself (i.e. it has almost no level of abstraction from the hardware), it falls in the category of low-level languages. It is also sometimes referred to as 1GL or first-generation language.

Programming languages are called high-level languages

Programming languages are more human-readable and have strong abstraction from the computer hardware. Hence, they fall in the category of high-level languages, often referred to as 3GL or third-generation languages.

The source code written in any high-level programming language needs to be converted to a low-level language, ultimately machine code, at some point.

Let's take a look at how it is done…

Language Processors

The CPU does not understand source code written in a high-level programming language. Source code written in any such language must first be translated into machine code using a language processor.

Language processing is typically done by special software, which can be categorized into three types:

Compiler
Assembler
Interpreter

Programming languages are often categorized into compiled languages, assembly languages and interpreted languages based on these language processors.

1. Compiler

A compiler, in the simplest terms, is a program that converts source code (human-readable code) into machine-readable code (object code) in a single compilation process. For some languages, such as Java and C#, this generated code is still not pure machine language. In that case, it is referred to as IL code or Intermediate Language code.

Compiled Languages: C, C++, Java, C#, Objective-C, Swift, etc.

NOTE Some languages, such as C and C++, compile directly to machine code.

The compiler takes the whole file containing the source code, goes through every instruction line by line, processes it and spits out a new file containing the translated code (machine code or intermediate language code). This new file that the compiler generates is often called an executable.
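
To make the "whole file at once" idea concrete, here is a minimal sketch in Python. It is only an illustration: Python's built-in compile() stands in for a real compiler (it produces an intermediate form, not machine code), and the names SOURCE, BROKEN_SOURCE, translate and run are made up for this example.

```python
# Sketch only: Python's built-in compile() plays the role of the compiler.
SOURCE = """\
total = 0
for n in range(1, 4):
    total += n
"""

BROKEN_SOURCE = """\
print("this line is fine")
total = = 0
"""


def translate(source: str):
    """Translate a whole source text in one pass; nothing is executed yet."""
    return compile(source, "<example>", "exec")


def run(code_object) -> dict:
    """Execute previously translated code and return the variables it set."""
    namespace: dict = {}
    exec(code_object, namespace)
    return namespace


code_object = translate(SOURCE)      # translation happens once
first = run(code_object)["total"]    # the result can then be run...
second = run(code_object)["total"]   # ...again without re-translating

try:
    translate(BROKEN_SOURCE)
    error_line = None
except SyntaxError as exc:
    # The mistake on line 2 is reported before line 1 ever gets to run.
    error_line = exc.lineno

print(first, second, error_line)
```

Notice that the broken program never prints anything. Because the whole text is processed before execution starts, the mistake on its second line is caught first.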

In Summary

Let's say I wrote a computer program in a programming language called C# using a simple text editor (Notepad) on my personal laptop.

I can then simply compile my source code using a C# language compiler, which generates an executable. The type of executable in the case of the C# compiler is generally a .exe file.

I can later on give the executable file to my friend Ted who can then run the executable on his laptop.

NOTE In the case of some languages like C#, a second compilation step is needed to create pure machine code. This step is done at execution time by the JIT compiler.
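
The "compile once, hand over the output" workflow from this summary can be sketched in Python as well. This is an assumption-heavy stand-in, not the C# toolchain: Python's py_compile module writes a compiled .pyc file instead of a .exe, and Ted would still need a matching Python runtime to run it. The file names greet.py and greet.pyc and the helpers build and run_compiled are invented for the example.

```python
import importlib.machinery
import importlib.util
import os
import py_compile
import tempfile

# Hypothetical one-line "program"; greet.py is an illustrative file name.
SOURCE = 'MESSAGE = "Hello, Ted!"\n'


def build(workdir: str) -> str:
    """Write the source to disk, compile it, and return the output path."""
    source_path = os.path.join(workdir, "greet.py")
    output_path = os.path.join(workdir, "greet.pyc")
    with open(source_path, "w", encoding="utf-8") as handle:
        handle.write(SOURCE)
    py_compile.compile(source_path, cfile=output_path, doraise=True)
    os.remove(source_path)  # Ted only receives the compiled file
    return output_path


def run_compiled(path: str):
    """Load and execute a compiled file without needing its source."""
    loader = importlib.machinery.SourcelessFileLoader("greet", path)
    spec = importlib.util.spec_from_loader("greet", loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as workdir:
    compiled_path = build(workdir)
    module = run_compiled(compiled_path)
    message = module.MESSAGE

print(message)
```

The source file is deleted before the program runs, so whatever executes here is coming from the compiled output alone.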

2. Assembler

An assembler is a program that converts assembly language code into machine code. Compilers often perform the task of the assembler themselves and generate machine code directly. The output of an assembler is called object code.

Assembly language is a low-level language which is sometimes called 2GL or second-generation language.

Assembly language is specific to a particular computer's architecture and sometimes to an operating system.
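
Here is a rough sketch of what an assembler does, written in Python. The instruction set is entirely made up for this example (the mnemonics LOAD, ADD, STORE, HALT and their opcode numbers do not belong to any real CPU), so treat it as a picture of the idea rather than a real assembler.

```python
# Hypothetical 8-bit instruction set, invented for this sketch only.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source: str) -> bytes:
    """Translate each mnemonic line into an opcode byte and operand bytes."""
    machine_code = bytearray()
    for line_number, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mnemonic, *operands = line.split()
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_number}: unknown mnemonic {mnemonic!r}")
        machine_code.append(OPCODES[mnemonic])
        # Operands are small integers, stored as one byte each.
        machine_code.extend(int(operand) & 0xFF for operand in operands)
    return bytes(machine_code)


PROGRAM = """
LOAD 7      ; put 7 in the accumulator
ADD 5       ; add 5 to it
STORE 16    ; save the result at address 16
HALT
"""

object_code = assemble(PROGRAM)
print(object_code.hex(" "))  # prints "01 07 02 05 03 10 ff"
```

Each readable mnemonic maps almost one-to-one onto a number, which is why assembly language sits so close to the machine.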

3. Interpreter

An interpreter is a program that translates and executes human-readable code one line at a time. Unlike a compiler, an interpreter has to repeat the translation each time you run the program. Most of the time no object code is generated, as each line is translated and executed directly.

Interpreted Languages: JavaScript, Python

Due to this line-by-line translation, which happens at execution time, interpreted programs generally run slower than compiled ones.

However, the development process is often faster with an interpreter than with a compiler when doing incremental development. The source code can be divided into smaller sections and run immediately, which makes running and functionality testing easier.
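
The line-by-line behaviour can be sketched with a toy interpreter in Python. The mini-language here ("set", "add", "print") and the function name interpret are invented purely for illustration; real interpreters are far more involved.

```python
# Hypothetical mini-language: "set NAME VALUE", "add NAME VALUE", "print NAME".
def interpret(source: str) -> tuple:
    """Translate and execute one line at a time.

    Returns the values printed so far and an error message (or None).
    """
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        command, *args = parts
        if command == "set":
            variables[args[0]] = int(args[1])
        elif command == "add":
            variables[args[0]] += int(args[1])
        elif command == "print":
            output.append(variables[args[0]])
        else:
            # Earlier lines have already run by the time this is noticed.
            return output, f"line {line_number}: unknown command {command!r}"
    return output, None


PROGRAM = """\
set x 2
add x 3
print x
shout x
print x
"""

output, error = interpret(PROGRAM)
print(output, error)
```

The first three lines run and produce a result before the bad command on line 4 is even looked at. A compiler would have rejected the whole program up front.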

In Summary

Let's say I wrote a computer program in an interpreted language called JavaScript using a simple text editor (Notepad) on my personal laptop.

This time, instead of compiling the source code and giving an executable file to my friend Ted, I have to give him the whole source code.

Instead of doing the machine-readable code conversion on my laptop, Ted needs to do the conversion himself on his own laptop.

Luckily, he does not need to install an interpreter separately, as interpreters often come bundled with a web browser or the operating system.

All Ted needs to do now is load the file in a web browser, and the language translation happens while the source code is being executed by the browser.

Key differences

A Compiler

often faster execution
platform-specific (an executable that works on a PC may not work on Mac or Linux)
takes in the whole source code at once
requires more memory (creates object code)
compilation needs to happen only once (the output is ready-to-run machine code)

An Interpreter

slower execution
cross-platform (as long as you have an interpreter, the operating system doesn't really matter)
takes a single line at a time
requires less memory (does not create object code)
translation needs to happen on every execution (an interpreter is required on each computer)

NOTE Some languages have been implemented using both compilers and interpreters, including BASIC, C, Lisp, Pascal, and Python.

The Just-In-Time (JIT) Compiler

Some modern programming languages like Java and C# make use of just-in-time compilation (on-the-fly compilation). JIT compilation happens after you have started the program, while it is running.

Unlike the conventional compilation process, where the source code is converted straight to machine code, a few modern programming languages are first compiled to intermediate language code. In the case of a language like Java, the IL code is called bytecode. The conversion to machine code finally happens only when the program is executed.
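
If you want to see what an intermediate form looks like, Python offers an easy peek, since its standard implementation (CPython) also compiles source code to bytecode before running it. This is a stand-in for Java bytecode, not the same thing, and CPython normally interprets its bytecode rather than JIT-compiling it. The function add is just a made-up example; the exact instruction names you see depend on your Python version.

```python
import dis


def add(a, b):
    return a + b


# Each entry is one bytecode instruction of the already-compiled function.
instructions = list(dis.get_instructions(add))
opnames = [instruction.opname for instruction in instructions]

for instruction in instructions:
    print(instruction.opname, instruction.argrepr)

# The raw bytecode is stored as bytes on the function's code object.
raw_bytecode = add.__code__.co_code
print(len(raw_bytecode), "bytes of bytecode")
```

The instructions are simpler than the source line but still not tied to any particular CPU, which is the whole point of an intermediate language.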

But what is the use of this dual compiling?

JIT compilers are highly advanced, performance-oriented compilers which have access to dynamic runtime information. They perform specialized tasks like monitoring and optimizing the code for the particular CPU where the program runs.

Some of their features are:

global code optimization
statistical data analysis of the program
CPU targeted compilation, analysis and recompilation
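
The "monitor first, compile later" idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the class ToyJit and its hot_threshold setting are invented for this example, and Python's compile() only produces bytecode, whereas a real JIT compiler emits native machine code and uses far richer runtime statistics.

```python
class ToyJit:
    """Interpret an expression at first; compile it once it becomes 'hot'."""

    def __init__(self, expression: str, hot_threshold: int = 3):
        self.expression = expression
        self.hot_threshold = hot_threshold  # hypothetical tuning knob
        self.calls = 0
        self.translations = 0               # how often the text was translated
        self.compiled = None

    def _translate(self):
        self.translations += 1
        return compile(self.expression, "<toy-jit>", "eval")

    def run(self, x):
        self.calls += 1
        if self.compiled is None and self.calls >= self.hot_threshold:
            # Runtime statistics say this code is hot: keep the translation.
            self.compiled = self._translate()
        # Cold code is re-translated on every call; hot code reuses the cache.
        code = self.compiled if self.compiled is not None else self._translate()
        return eval(code, {}, {"x": x})


square_plus_one = ToyJit("x * x + 1")
results = [square_plus_one.run(n) for n in range(6)]
print(results, square_plus_one.translations)
```

Six calls need only three translations here, because from the third call onward the kept translation is reused. A plain interpreter would have translated the expression all six times.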

What Now?

That was basically all you needed to know about the different ways in which source code written by a programmer is converted to CPU-readable machine code.

Compilers and interpreters have a lot more going on inside them than what is described above. But that is a chapter to open in the future, when we will write our own compiler.

As we move along the road to becoming a programmer, let's go through a few basic ideas behind every programming language.
