.NET Assembly – Random Thoughts

If you browse the internet, you will find an assembly defined as either an EXE (which you can execute, and which has an entry point) or a DLL (which you can reference, and which does not have an entry point).

If you go deeper and want to know exactly what an assembly is, then you have to know the concept of a Module.

When you create an assembly, you also create a Module. An assembly by default contains one module, but you can combine multiple modules inside the same assembly.

Because multiple files (modules) can be identified as one Assembly, we can say that an assembly is the logical grouping of files.

Assemblies are also the boundary for the internal access modifier: a class marked as internal can be used only within its own assembly.
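To make the internal boundary a bit more concrete, here is a minimal sketch. The type names (InternalHelper, PublicApi, AccessDemo) are made up for illustration. It uses the Type.IsVisible property to ask the runtime whether a type can be reached from outside its assembly.

```csharp
using System;

// Visible only to code compiled into the same assembly.
internal class InternalHelper
{
    public static string Describe() { return "internal helper"; }
}

// Visible to any assembly that references this one.
public class PublicApi
{
    // Code inside the assembly is free to use the internal type.
    public static string Describe() { return InternalHelper.Describe() + " used via public API"; }
}

public static class AccessDemo
{
    public static void Main()
    {
        Console.WriteLine(PublicApi.Describe());
        // Reflection reports which types are accessible outside the assembly.
        Console.WriteLine("InternalHelper visible outside: " + typeof(InternalHelper).IsVisible);
        Console.WriteLine("PublicApi visible outside: " + typeof(PublicApi).IsVisible);
    }
}
```

If another assembly referenced this one, it could call PublicApi.Describe() but the compiler would reject any direct use of InternalHelper.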

So an assembly consists of one or more modules plus a special block of metadata called the manifest, which is stored in one of the assembly's files.


Modules

A module is the core of the assembly. If you open an assembly file and disassemble it back to its intermediate language, you will see the definition of the module inside it.

Create a simple program called program.cs, open the Developer Command Prompt for VS2013, change to the project directory, and compile the code with the C# compiler by typing csc program.cs.
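The contents of program.cs do not matter much for this walkthrough. As a minimal sketch, assuming nothing more than a console program with a Main method (the Greeting helper is a hypothetical name added for illustration), it could look like this:

```csharp
// program.cs: a minimal console program (hypothetical contents).
using System;

public static class Program
{
    // Returns the greeting so it can be checked separately from printing it.
    public static string Greeting()
    {
        return "Hello from Program";
    }

    // The entry point that makes this file compile to an EXE.
    public static void Main()
    {
        Console.WriteLine(Greeting());
    }
}
```

Compiling this file with csc program.cs should produce program.exe in the same directory.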


Note: You can find the Developer Command Prompt for VS2013 under “C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\Tools\Shortcuts”.

Now use ILDASM to inspect the intermediate language (IL) of the Program.exe file by typing:

ildasm /out:myintermediate.txt Program.exe

Inside the IL of Program.exe, you can see that it contains a module that was created for you automatically.
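You can also see the module without leaving C#. The following is a small sketch that uses reflection (Assembly.GetModules and Assembly.ManifestModule) to list the modules of the running assembly. The class and method names are my own and are only for illustration.

```csharp
using System;
using System.Reflection;

public static class ModuleInspector
{
    // Returns the names of all modules that make up the given assembly.
    public static string[] ModuleNames(Assembly assembly)
    {
        Module[] modules = assembly.GetModules();
        string[] names = new string[modules.Length];
        for (int i = 0; i < modules.Length; i++)
        {
            names[i] = modules[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        Assembly current = typeof(ModuleInspector).Assembly;
        foreach (string name in ModuleNames(current))
        {
            Console.WriteLine("Module: " + name);
        }
        // The manifest module is the one that holds the assembly manifest.
        Console.WriteLine("Manifest module: " + current.ManifestModule.Name);
    }
}
```

For a typical single-file assembly this should print exactly one module, which is also the manifest module.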


You can also create a module file from a C# program like this:

csc /target:module /out:MyModuleFile.netmodule program.cs


The output file will be MyModuleFile.netmodule. You cannot run this file directly or even reference it in your Visual Studio project. So what can we do with it?

Create a multi-file Assembly

Open Visual Studio and create a program.cs that contains one class (Module1) with a public static method named Hello1(). Next, export this class as a module named Module1.netmodule.


Now rewrite the program to contain a class (Module2) with one static method, Hello2(). Export this class to a module named Module2.netmodule.

Now rewrite the program as a console application, and inside the Main method, call Module1.Hello1() and Module2.Hello2().

Finally, use (csc /addmodule:module1.netmodule,module2.netmodule program.cs) to create one assembly that contains these files:

  • Module1.netmodule  (contains Hello1() method)
  • Module2.netmodule (contains Hello2() method)
  • Program.EXE
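For reference, the three pieces of source code could look roughly like the sketch below. It is shown as one listing for brevity. The file names Module1.cs and Module2.cs and the return values are assumptions for illustration; the walkthrough itself reuses program.cs for each step.

```csharp
using System;

// Module1.cs (assumed file name), compiled with:
//   csc /target:module /out:Module1.netmodule Module1.cs
public class Module1
{
    public static string Hello1() { return "Hello from Module1"; }
}

// Module2.cs (assumed file name), compiled with:
//   csc /target:module /out:Module2.netmodule Module2.cs
public class Module2
{
    public static string Hello2() { return "Hello from Module2"; }
}

// program.cs, compiled with:
//   csc /addmodule:Module1.netmodule,Module2.netmodule program.cs
public class Program
{
    public static void Main()
    {
        // Both calls resolve to types that live in the added modules.
        Console.WriteLine(Module1.Hello1());
        Console.WriteLine(Module2.Hello2());
    }
}
```

When the classes are split into separate files, only program.cs needs the using System line. At run time the two .netmodule files have to sit next to the EXE, because they are part of the same assembly.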


The main thing to remember here is that .NET sees those three files as one logical unit (one assembly).

On the other hand, you can play around and create two assemblies from the three files (Module1.netmodule, Module2.netmodule, program.cs). To do that, first combine the two modules into a library:

csc /addmodule:module1.netmodule,module2.netmodule /target:library /out:myDLL.dll

You can now reference the myDLL.dll from anywhere in your project.

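Now, let us create an executable assembly from the program.cs file, referencing myDLL.dll so that the calls to Module1.Hello1() and Module2.Hello2() resolve:

csc /reference:myDLL.dll /out:myprogram.exe program.cs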

So now you have two assemblies: myDLL.dll and myprogram.exe. Instead of grouping the three files into one assembly, we made two assemblies out of them.

Assemblies vs Modules

As you know now, an assembly can contain one or more modules, and one module can be part of multiple assemblies. Thus an assembly is a logical grouping of files, and it is the unit of access boundaries when it comes to access modifiers. For example, classes marked with the internal access modifier can be accessed only within the same assembly.


DLL vs EXE Assembly

If you have a normal C# program and you compile it once to a DLL and once to an EXE file, and then use ILDASM to inspect the intermediate language, you will notice that the IL is almost the same except for one thing: the existence of a small line (.entrypoint). This directive marks the method where execution of the program starts.


Aside from this one difference, .NET does not distinguish between DLL and EXE assembly files. They are the same in IL except for the .entrypoint directive. The file extension (.dll or .exe) only means something to Windows, not to .NET.

You can reference EXE files the same way you reference DLL files, but it is a strange way to design your project.
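The same distinction is visible through reflection. The sketch below, with hypothetical class and method names, reads the Assembly.EntryPoint property, which returns the entry point method for an EXE and null for an assembly without one.

```csharp
using System;
using System.Reflection;

public static class EntryPointInspector
{
    // Describes whether an assembly has an entry point (the .entrypoint method in IL).
    public static string Describe(Assembly assembly)
    {
        string name = assembly.GetName().Name;
        MethodInfo entry = assembly.EntryPoint;
        if (entry == null)
        {
            return name + ": no entry point (library)";
        }
        return name + ": entry point is " + entry.DeclaringType.FullName + "." + entry.Name;
    }

    public static void Main()
    {
        // This assembly is compiled as an EXE, so it reports its Main method.
        Console.WriteLine(Describe(typeof(EntryPointInspector).Assembly));
        // The core library is a DLL, so it is expected to have no entry point.
        Console.WriteLine(Describe(typeof(object).Assembly));
    }
}
```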


By Ammar Hasayen. Posted in .NET.

.NET – What is the CLR?

It is so interesting to break down how the .NET framework works and uncover its internal components and functionality. I love breaking things down so I can better understand what I am dealing with.

In a previous blog post, we talked about what an Assembly is. In this blog post, I will be sharing my thoughts about the .NET run time, or CLR.

The .NET Common Language Runtime (CLR) is a run-time environment and execution engine provided by the .NET framework. It provides robust application support with a very small memory footprint, and it performs its operations very quickly (15,000 managed method calls per second at 27.6 MHz, a figure quoted for the .NET Micro Framework).

You can think of the CLR as the interface between .NET applications and the operating system. This is why .NET applications are called managed code: they are managed by the CLR.


The CLR provides the following services for programming languages that target it:

  • Compiles Microsoft Intermediate Language (MSIL) into native code.
  • Handles garbage collection.
  • Handles exceptions.
  • Enforces code access security.
  • Handles verification.
  • Debugging services.
  • Verification of Type Safety.


For example, since the runtime uses exceptions to report errors, any language that targets the CLR also gets errors reported via exceptions. And since the CLR allows thread creation, any language that targets the runtime can create threads.

Because the runtime only operates on IL code (Intermediate Language), the runtime is not aware of the programming language you are using. This means that developers are free to pick the programming language they prefer without losing anything.

When you write your .NET application in any programming language that targets the CLR, your development environment (for example, Visual Studio) helps you check the syntax and analyze your source code. The compiler then transforms the source code into an assembly file. At runtime, the CLR Just In Time (JIT) compiler transforms the IL code in the assembly into machine language (native code) that the CPU can understand and execute.


Let us see what happens when you run an executable file for example:

  • Windows checks the EXE header to see if the application requires a 32-bit or 64-bit address space.
  • Windows loads the x86, x64, or ARM version of MSCorEE.dll into the process’s address space.
  • The process’s primary thread calls a method defined inside MSCorEE.dll.
  • This method initializes the CLR, loads the EXE assembly, and then calls its entry point method (Main).

One of the biggest benefits of compiling source code to IL is code security and verification. The CLR performs a process called “verification” while compiling the code from IL to native CPU instructions. It checks the IL code and ensures that everything is safe by verifying, for example, that each method is called with the right number of parameters and that each parameter passed to a method is of the correct type.

Another benefit is that you can run multiple managed applications in a single OS process. Why is this a big deal?

Well, usually in Windows, each process has its own virtual address space, because otherwise an application could read from or write to memory that belongs to another application. Verification ensures that managed code does not improperly access memory and cannot adversely affect another application’s code. This means that you can run multiple managed applications in a single Windows virtual address space. Since every extra process costs operating system resources and can hurt performance, running multiple managed applications under the same process is definitely a welcome thing.

This leads to another question: what is safe and unsafe code? In simple words, safe code does not access memory addresses directly or manipulate raw bytes, while unsafe code does. Safe code is verifiably safe, like the code managed by the CLR. Nevertheless, you can still write unsafe code in C#, but you have to mark the code that contains it with the “unsafe” keyword, and the C# compiler requires that you compile your source code with the /unsafe compiler switch.
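As a rough illustration of the difference, the sketch below sums an array twice: once with ordinary verifiable code and once through a raw pointer. The class and method names are hypothetical, and the file only compiles with the /unsafe switch (for example, csc /unsafe UnsafeDemo.cs, where the file name is assumed).

```csharp
using System;

public static class UnsafeDemo
{
    // Safe version: the runtime checks every array access.
    public static int SumSafe(int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    // Unsafe version: pins the array in memory and walks it with a raw pointer.
    public static unsafe int SumUnsafe(int[] values)
    {
        int total = 0;
        fixed (int* start = values)
        {
            for (int i = 0; i < values.Length; i++)
            {
                total += *(start + i);
            }
        }
        return total;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine("Safe sum: " + SumSafe(data));
        Console.WriteLine("Unsafe sum: " + SumUnsafe(data));
    }
}
```

Both methods return the same result here. The difference is that nothing stops the pointer version from reading past the end of the array if the loop bound is wrong, which is exactly what verification rules out for safe code.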


.NET – What is an Assembly?

I often need to do some coding in either PowerShell or .NET. For me, both are similar, as PowerShell is built on top of the .NET framework, and you can even call .NET helper classes from within your PowerShell code.

I think a good foundation in .NET is becoming more and more important for any IT professional. I recall that when I started to learn about the .NET framework, I had some difficulty understanding a few key concepts. For example, we always hear the word “Assembly”, but for me it is not enough to define it as a DLL you reference or an EXE that you execute, because what I really need to know is what an assembly actually contains and how the magic happens under the hood.

So I decided to share my thoughts about a couple of .NET framework concepts, and I hope they make sense to most of you.

Define an Assembly

When I think of an assembly and read the definitions on the web, it gets confusing to pin down what an assembly really is:

“An assembly is the logical boundary of functionality”

To understand what an assembly is, we have to explain how .NET code gets executed:

  • Step 1: You start by writing your code in any of the available .NET languages. For example, you open Visual Studio, type some code in C#, and end up with a file with the .CS extension. This .CS file is called the source code file, because it contains the original source code.
  • Step 2: Using Visual Studio, you compile your code, and the output is a DLL or EXE file. This output file is a Portable Executable (PE) file, also called an assembly file, and it contains code written in Microsoft Intermediate Language (MSIL, or IL for short).
  • Step 3: You run the code, and the .NET Common Language Runtime (CLR) inspects the output assembly and converts it to machine language (called native code) via the Just In Time (JIT) compiler. Native code is the low-level language that the CPU can understand and execute.


So the assembly is the output of your source code compilation, and it is written in MSIL (IL for short). IL is a CPU-independent machine language created by Microsoft, and it is still human readable once you disassemble it with a tool such as ILDASM.

It is important to know that two compilations happen. When you write your code in Visual Studio, for example, and hit Compile, your code is turned into MSIL in the form of a DLL or EXE (an assembly). MSIL is lower level than C#, but a human can still read it once it is disassembled. Later on, when you run that assembly (at run time), the CLR's Just In Time compiler inspects the MSIL code in your assembly and turns it into low-level machine language (native code) that a CPU can understand and execute.

Regardless of which programming language you write your code in, when your code compiles into an assembly file (DLL or EXE), the output is expressed in the same intermediate language.

For me to really appreciate what an assembly is, I started by answering the question “what are the benefits of an assembly?”. Answering the “why” helps me understand the “what”. So let us talk about some of an assembly's purposes in life:

  • Security Boundary: When you want to sign a piece of code with a strong name to ensure uniqueness, or use a digital certificate to identify the signer, the smallest unit you can do this for is the assembly.
  • Type Boundary: A type definition cannot span multiple assemblies, while two types with the same name can coexist if they are located in different assemblies.
  • Reference Info: Each assembly describes the types and resources inside it, and it also lists the other assemblies it references. So you can think of assemblies as units of functionality from this perspective.
  • Versioning: An assembly is the smallest unit of versioning.
  • Deployment: An assembly represents a unit of deployment.
  • Language Boundary: If you want to use multiple programming languages in your project, you have to know that a single compilation (and therefore a single module) contains code from only one language. In practice this usually means splitting your code into more than one assembly, one per language.
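Some of these points can be observed at run time. The following is a minimal sketch, with hypothetical class and method names, that uses Assembly.GetName and Assembly.GetReferencedAssemblies to print an assembly's identity, its version, and the assemblies it references.

```csharp
using System;
using System.Reflection;

public static class AssemblyInfoDemo
{
    // Builds a one-line summary of an assembly's identity.
    public static string Identity(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        return name.Name + ", version " + name.Version;
    }

    public static void Main()
    {
        Assembly current = typeof(AssemblyInfoDemo).Assembly;
        Console.WriteLine("This assembly: " + Identity(current));

        // Each assembly records which other assemblies it depends on.
        foreach (AssemblyName reference in current.GetReferencedAssemblies())
        {
            Console.WriteLine("References: " + reference.Name + ", version " + reference.Version);
        }
    }
}
```

The exact names and version numbers printed depend on the runtime and on how the assembly was built.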


Managed Module

There is another concept to understand here, which is the managed module. Managed modules and assemblies are related like this:

An assembly contains one or more managed modules and perhaps other resource files (jpeg, gif, html, etc.).

A manifest is created in the assembly to describe the set of files that make it up, which matters most when the assembly contains more than one file.

A managed module is like the internal structure of an assembly. You can have one or more of these modules inside an assembly, but in most cases the assembly file contains just one module. In fact, as far as I know, Visual Studio does not offer a way to produce an assembly with more than one module. If you want to do that, you need to use the command line.

Why are we interested in learning about managed modules? Because sometimes, in order to understand how .NET works, you may have to inspect the IL code itself, and that is easier if you know that a module exists inside an assembly. Each module contains a lot of data, but for me the most important pieces that make up a managed module are the IL code for your types and the metadata.


So let us see what the Managed Module contains:

  • PE32 or PE32+ header:
    • Time Stamp
    • CPU Support information
    • Type of file (GUI,CUI or DLL)
  • CLR  header:
    •  Version of CLR required
    • Flags
    • Info about the entry point method (Main Method)
    • Strong name
  • Metadata: contains two tables
    • Table for types and members defined inside this assembly.
    • Table for types and members referenced by this assembly.*
  • IL code of your types.

*Each module contains information about the referenced assemblies and their version number. This information is very important as the CLR can know the assembly’s immediate dependencies. This makes the assembly self-describing.


Metadata 

The most interesting part of the assembly file is the metadata. Every compiler targeting the CLR is required to emit metadata. Metadata provides the following benefits:

  • Microsoft Visual Studio uses metadata. The IntelliSense feature parses metadata to show you which methods, properties, events, and fields a type offers.
  • The CLR uses metadata to ensure that your code performs only “type-safe” operations.
  • Metadata enables object serialization (serializing the object into a memory block, moving it across the wire, and recreating it at the destination).
  • Metadata allows the garbage collector to track the lifetime of objects.
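You can read the same metadata yourself through reflection, which is roughly the kind of information a tool like IntelliSense works from. The sketch below is only an illustration: the Customer type and the MetadataDemo helper are made-up names, and it simply lists the public members that a type declares.

```csharp
using System;
using System.Reflection;

// A small sample type whose metadata we will inspect.
public class Customer
{
    public string Name { get; set; }
    public int Age;
    public void Rename(string newName) { Name = newName; }
}

public static class MetadataDemo
{
    // Returns the names of the public instance members declared directly on the type.
    public static string[] MemberNames(Type type)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        MemberInfo[] members = type.GetMembers(flags);
        string[] names = new string[members.Length];
        for (int i = 0; i < members.Length; i++)
        {
            names[i] = members[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MemberInfo member in typeof(Customer).GetMembers(flags))
        {
            // MemberType tells us whether this is a field, property, method, or constructor.
            Console.WriteLine(member.MemberType + ": " + member.Name);
        }
    }
}
```

Note that the property shows up together with its compiler-generated get_Name and set_Name accessor methods, which is a nice reminder that metadata describes what the compiler actually emitted.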

Metadata is the most important piece in the .NET story. The whole .NET framework is built around the idea of metadata. The more you know about metadata, the closer you get to understanding the secrets of the .NET framework.
