Thursday, August 10, 2017

JVM - JIT details and its optimisation techniques

Where does just-in-time compilation come into the picture?
Here is a simple diagram that I have prepared as per my understanding:


The HotSpot JVM is implemented mostly in C++. The JVM specification leaves the implementation details to the JVM's developers (for example, the developers of the Oracle JVM).
The just-in-time (JIT) compiler optimizes and compiles a piece of bytecode into machine code, and that machine code is what gets executed from then on.
The pieces of code to be compiled by the JIT are chosen based on run-time statistics (profiling information) collected by the interpreter. These frequently executed pieces are called hot spots.

JIT compilation runs in a background thread. It generates code tailored to the target architecture.

There are two JIT compilers:
C1, i.e. the client compiler: low compilation time but less optimized code -> applied first, and suited to fast startup
C2, i.e. the server compiler: higher compilation time but more optimized code -> applied to the hottest code in long-running applications

Immediately after starting, the VM uses the interpreter to execute the code (without optimizations) and profiles it to decide which optimizations to apply. It keeps track of the number of invocations of each method. When that count exceeds the C1 threshold, the method becomes eligible for optimization by the C1 compiler, so the VM queues it for compilation. The same applies to C2: once the number of invocations reaches a higher threshold, the method is eventually compiled by C2.
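The counting scheme described above can be sketched as a toy model in C. This is only an illustration of the idea, not how any real JVM is written: the names `method_profile_t` and `record_invocation` and both threshold values are assumptions made up for this sketch, and a real VM also counts loop iterations and compiles asynchronously.

```c
/* Hypothetical thresholds, chosen only for illustration. */
#define C1_THRESHOLD 200u
#define C2_THRESHOLD 5000u

typedef enum { TIER_INTERPRETED, TIER_C1, TIER_C2 } tier_t;

/* Per-method profiling data kept by the (imaginary) interpreter. */
typedef struct {
    const char *name;
    unsigned invocations;
    tier_t tier;
} method_profile_t;

/* Record one call and promote the method when a threshold is reached. */
tier_t record_invocation(method_profile_t *m)
{
    m->invocations++;
    if (m->tier == TIER_C1 && m->invocations >= C2_THRESHOLD) {
        m->tier = TIER_C2;          /* hot enough for the optimizing compiler */
    } else if (m->tier == TIER_INTERPRETED && m->invocations >= C1_THRESHOLD) {
        m->tier = TIER_C1;          /* warm: compile quickly with C1 */
    }
    return m->tier;
}
```

A method starts in `TIER_INTERPRETED`, moves to `TIER_C1` on the call that reaches the first threshold, and to `TIER_C2` on the call that reaches the second.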

Optimisations such as the following are applied:
1. Inlining methods
2. Synchronisation lock coarsening
3. Dead Code elimination


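On-stack replacement (OSR) is another technique: a method that is already running, typically inside a long loop, is switched from interpreted to compiled code by replacing its frame on the stack. This is a difficult task that the JIT handles as well.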

Conclusion:
My conclusion is not to waste time on local optimizations. In most cases, we can’t beat the JIT. A sensible (yet counter-intuitive) optimization is to split long methods into several methods that can be optimized individually by the JIT.

Focus on global optimizations instead. In most cases, they have much more impact.

Python code interpretation and execution stages


Python code is compiled in the following fashion.
If the bytecode is generated by CPython, it is Python bytecode and is executed by the Python virtual machine (PVM).
If the bytecode is generated by Jython (formerly JPython), it is Java bytecode and is executed by the Java virtual machine (JVM).

Tuesday, August 8, 2017

ARM architecture: the concept of bit-banding to solve the race-condition problem when updating register values

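On ARM Cortex-M3 and Cortex-M4 cores, two 1MB 'bit-band' regions, one in the peripheral memory area and one in the SRAM memory area, are each mapped to a 32MB virtual 'alias' region. Each bit in the bit-band region is mapped to a 32-bit word in the alias region.
The first bit in the 'bit-band' peripheral memory is mapped to the first word in the alias region, the second bit to the second word, and so on. Writing to an alias word sets or clears just the corresponding bit in a single operation, so there is no read-modify-write sequence for an interrupt to break into.
Code example using bit-banding: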
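The mapping above is plain arithmetic: each byte of the bit-band region takes up 32 bytes of alias space (8 bits times 4 bytes per word), and each bit takes up one 4-byte word. A minimal sketch that can be checked on any host machine follows; the function name `bitband_alias_addr` and its parameter names are assumptions for this illustration, and `bit` is counted from the given byte address.

```c
#include <stdint.h>

/* Base addresses of the SRAM bit-band region and its alias region. */
#define SRAM_BITBAND_BASE 0x20000000u
#define SRAM_ALIAS_BASE   0x22000000u

/* Return the address of the alias word that controls bit `bit`
   of the byte located at `byte_addr` inside the bit-band region. */
uint32_t bitband_alias_addr(uint32_t bitband_base, uint32_t alias_base,
                            uint32_t byte_addr, uint32_t bit)
{
    uint32_t byte_offset = byte_addr - bitband_base;
    return alias_base + byte_offset * 32u + bit * 4u;
}
```

For example, bit 7 of the byte at 0x20004000 maps to 0x22000000 + 0x4000 * 32 + 7 * 4 = 0x2208001C.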

// Define base address of bit-band
#define BITBAND_SRAM_BASE 0x20000000
// Define base address of alias band
#define ALIAS_SRAM_BASE 0x22000000
// Convert SRAM address to alias region
#define BITBAND_SRAM(a,b) (ALIAS_SRAM_BASE + ((a) - BITBAND_SRAM_BASE) * 32 + ((b) * 4))
 
// Define base address of peripheral bit-band
#define BITBAND_PERI_BASE 0x40000000
// Define base address of peripheral alias band
#define ALIAS_PERI_BASE 0x42000000
// Convert PERI address to alias region
#define BITBAND_PERI(a,b) (ALIAS_PERI_BASE + ((a) - BITBAND_PERI_BASE) * 32 + ((b) * 4))
 
//Define some memory address
#define MAILBOX 0x20004000
//Define a hardware register
#define TIMER 0x40004000
 
// Mailbox bit 0
#define MBX_B0 (*((volatile unsigned int *)(BITBAND_SRAM(MAILBOX,0))))
// Mailbox bit 7
#define MBX_B7 (*((volatile unsigned int *)(BITBAND_SRAM(MAILBOX,7))))
// Timer bit 0
#define TIMER_B0 (*((volatile unsigned char *)(BITBAND_PERI(TIMER,0))))
// Timer bit 7
#define TIMER_B7 (*((volatile unsigned char *)(BITBAND_PERI(TIMER,7))))
 
 
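int main(void){
    unsigned int temp = 0;
    MBX_B0 = 1;       // Word write: sets mailbox bit 0
    temp = MBX_B7;    // Word read: reads mailbox bit 7 as 0 or 1
    TIMER_B0 = temp;  // Byte write: copies that value to timer bit 0
    return TIMER_B7;  // Byte read: returns timer bit 7
}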

This is not the only solution to this problem. All common architectures have implemented mechanisms for atomically setting and clearing bits. ARM’s approach is elegant in that it can be exercised with ANSI C, while most other implementations require special C extensions or the use of assembly language.

PROFILE

India
Design Engineer ( IFM Engineering Private Limited )