Tuesday, December 22, 2015

Areas of RTOS

-----------------------------
RTOS concepts
-----------------------------
Why RTOS
Classification of RTOS
- hard real time
- firm real time
- soft real time

RTOS Architecture


  1. Kernel
    • what does it do?
      • process - ready, running, blocked, suspended (Scheduler)
      • process priority
      • process communication
      • memory management
      • provide interfaces for UserSpace applications
      • interrupt and event handling
      • device io and device driver
      • file system
      • POSIX
    • kernel Types
      • monolithic (Most OS like Ubuntu kernel)
      • microkernel (Most RTOS like QNX)
      • exokernel
  2. Scheduler
    • First in first out
    • Earliest deadline first
    • Pre-emptive (Fixed priority)
    • Round robin
  3. Task
    • states - Ready, Running, Blocked, Suspended
    • Task control block (ID, Priority, Status, register, PC, SP)
    • priority of task
    • task implementation - creation, deletion, suspend, resume
    • idle task
  4. Co-routines
    • states - Ready, Running, Blocked
  5. Task synchronization
    • Binary semaphore
    • Counting semaphore
    • Mutex (Mutual Exclusion Semaphore)
      • Priority Inversion problem
      • Solution - 
        • Priority Inheritance
        • Priority ceiling
    • Recursive mutex
    • Task notifications
  6. Intertask communication
    • Message queues (structured data exchange)
      • Mailbox - a message queue of length 1

    • Pipes (unstructured data exchange - byte stream)
    • Remote procedure calls (RPC) - one task invoking a procedure in another task on a remote computer. XMLRPC, as used in our project, is one kind of RPC.
  7. Memory Management
    • Stack management
    • Heap management
  8. Timer Management
    • Tick counting used for task delays
    • Task alert
  9. Interrupt and Event Handling
    • defining Interrupt handler
    • creation and deletion of ISR
    • Enabling and disabling ISR
    • changing and referencing of interrupt mask
  10. Device I/O Management
    • Provides APIs for using device drivers
  11. Selection of RTOS
    • Technical consideration
      • Scalability
      • Portability
      • Run-time facilities
      • Development tools
    • Commercial Considerations
      • Costs
      • License
      • Supplier stability/ longevity
      • Note: usually there is no separate thread concept in an RTOS; tasks are the only unit of execution.

How and why to use VOLATILE and CONST in embedded C programming ???

---------------------------------
use of volatile
---------------------------------
It's a type qualifier. This keyword is especially valuable when you are interacting with hardware peripheral registers and the like via memory-mapped I/O.
It is important to use volatile to declare all variables that are shared by asynchronous software entities, which is important in any kind of multithreaded programming.

int volatile g_flag_shared_with_isr;

uint8_t volatile * p_led_reg = (uint8_t *) 0x00080000;

The first example declares a global flag that can be shared between an ISR and some other part of the code (e.g., a background processing loop in main() or an RTOS task) without fear that the compiler will optimize (i.e., "delete") the code you write to check for asynchronous changes to the flag's value. (Remember, though, that access to global variables shared by tasks or with an ISR must always also be controlled via a mutex or interrupt disable, respectively.)

The second example declares a pointer to a hardware register at a known physical memory address (80000h), in this case to manipulate the state of one or more LEDs. Because the pointer to the hardware register is declared volatile, the compiler must always perform each individual write. Even if you write C code to turn an LED on followed immediately by code to turn the same LED off, you can trust that the hardware really will receive both instructions. Because of the sequence point restrictions, you are also guaranteed that the LED will be off after both lines of the C code have been executed. The volatile keyword should always be used when creating pointers to memory-mapped I/O such as this.

---------------------------------
use of const
---------------------------------
uint16_t const max_temp_in_c = 1000;

In C, this variable will exist in memory at run-time, but will typically be located, by the linker, in a non-volatile memory area such as ROM or flash.
Any reference to the const variable will read from that location.

Another use of const is to mark a hardware register as read-only. For example:

uint8_t const * p_latch_reg = (uint8_t const *) 0x10000000;

With the pointer declared this way, any attempt to write to that physical memory address through the pointer (e.g., *p_latch_reg = 0xFF;) will result in a compile-time error.

---------------------------------
use of volatile and const
---------------------------------

1. Constant Addresses of (Read or Write) Hardware Registers

uint8_t volatile * const p_led_reg = (uint8_t *) 0x00080000;

p_led_reg IS A constant pointer TO A volatile 8-bit unsigned integer. This guarantees that the address stored in p_led_reg never changes: it will always point to the same register. A typo elsewhere in the code cannot reassign the pointer and cause an undesired write to some different address.



2. Read-Only Shared-Memory Buffer

Another use for a combination of const and volatile is where you have two processors communicating via a shared memory area and you are coding the side of this communications that will only be reading from a shared memory buffer. In this case you could declare variables such as:

int const volatile comm_flag;

uint8_t const volatile comm_buffer[BUFFER_SIZE];


3. (Read-Only) Hardware Register

Sometimes you will run across a read-only hardware register. In addition to enforcing compile-time checking so that the software doesn’t try to overwrite the memory location, you also need to be sure that each and every requested read actually occurs. By declaring your variable IS A (constant) pointer TO A constant and volatile memory location you request all of the appropriate protections, as in:

uint8_t const volatile * const p_latch_reg = (uint8_t *) 0x10000000;

As you can see, declarations of variables that involve both the volatile and const qualifiers can quickly become complicated to read. But the technique of combining C's volatile and const keywords can be useful and even important. This is definitely something you should learn if you want to master embedded software engineering.

Parallel programming for CPU and GPU

------------------------------
CPU Parallel programming in C
------------------------------
POSIX - pthread in linux eg. $gcc -o go go.c -lpthread
OpenMP <omp.h> eg. $gcc -fopenmp -o go go.c
MPI - message passing interface <mpi.h> eg. $mpicc go_mpi.c -o go_mpi
   $mpirun -n 4 go_mpi
   $time mpirun -n 1 go OR $time mpirun -n 4 go

reference link: http://gribblelab.org/CBootcamp/A2_Parallel_Programming_in_C.html

------------------------------
GPU Parallel programming in C
------------------------------
OpenCL <cl.h>

Harvard and Von Neumann: Memory Architectures

--------------------

Harvard architecture

--------------------
1. Physical connection
ROM(Program memory) <---Addr and Data Buses------> CPU <---Addr and Data Buses------> RAM(Data memory)

2. Execution difference: because program and data memories have separate buses, the fetch of the next instruction can overlap with the execution of the current one.
3. Following are the instruction execution steps:
fetch -> decode -> execute -> store

Eg. Atmel AVR, Microchip PIC, Intel 8051



--------------------

Modified harvard architecture

--------------------
1.Physical connection
ROM(Program memory) & RAM(Data memory) <---Addr and Data Buses------> CPU

2. But the memory map is common for both.
3. Usually used in ARM processors
4. There is Instruction cache and Data cache inside the CPU. Even though CODE and DATA memories have separate buses, they are accessed by CPU from I and D cache, one after another. Hence single memory map.

Eg. ARM architecture based any processor and controller


--------------------

Von-neumann architecture

--------------------
1.Physical connection
ROM(Program memory) & RAM(Data memory) <---Addr and Data Buses------> CPU

2. Execution difference: here, instruction fetch and data access share the same buses, so they cannot overlap; the CPU performs them one after another.
3. Following are the instruction execution steps:
fetch -> decode -> evaluate address of operands -> get the operands from memory -> execute ->store

Eg. Texas Instruments MSP430





* Usually an external EEPROM is connected to the CPU using I2C (or some other communication protocol); the EEPROM is treated as a peripheral.
* NAND/NOR flash and RAM are connected to the CPU using a high-speed bus such as AMBA AHB. These buses run at nearly CPU speed.
* USB and Ethernet are also connected over AHB, as they are treated as high-speed peripherals.
* DMA comes into the picture for the bulk data transfers involved in USB/Ethernet traffic. It lets the CPU concentrate on other work instead of spending time moving data.
* The remaining peripherals are connected using the APB bus, which is a low-speed bus.

Mixed signal MCU means Digital and Analog circuits on the same die.

Some info on CACHE

When considering the Harvard architecture, as there are separate instruction and data memories, the caches are also separate:
-Instruction cache (i-cache)
-Data cache (d-cache)

Data cache has a hierarchy (L1, L2, L3 or L4)

In a multicore processor system, the L1 and L2 caches are private to each core. Usually L3 and L4 are shared between cores.
L1 and L2 caches are not shared because sharing would increase the wiring on silicon and eventually the size of the chip. Sharing the L1/L2 caches would even make processing slower, as the hit rate in the cache decreases.

EXAMPLE:
--------
one classic example is to iterate a multidimensional array "inside out":

/* cache-unfriendly: the inner loop walks down a column */
for (i = 0; i < size; i++)
    for (j = 0; j < size; j++)
        do_something(ary[j][i]);

The reason this is cache-inefficient is that when you access a single memory address, modern CPUs load a whole cache line of "nearby" addresses from main memory. Because the inner loop steps through the row index j, each access lands in a different row, so the cache line loaded around ary[j][i] is evicted and reloaded on almost every trip through the inner loop. If this is changed to the equivalent:

/* cache-friendly: the inner loop walks along a row, matching
   C's row-major memory layout */
for (i = 0; i < size; i++)
    for (j = 0; j < size; j++)
        do_something(ary[i][j]);

it will run much faster, because consecutive accesses fall within the same cache line.


MICROCODES and their significance...

Microcode is something that is present in the control unit section of the CPU. It resides in the control store, which is like a high-speed ROM inside the control unit.
The control unit decodes an instruction from the code and invokes the corresponding microcode routine.
Microinstructions in microcode are large, around 50 bits or more, because each bit generates one of the required control signals to the ALU and other units.

Microcode is mainly used in CISC-based architectures. It is used instead of a hardwired circuit because it reduces the job of designing the complex "control unit" circuit in CISC.

RISC uses a hardwired control unit circuit. Previously, the job of writing assembly code for a RISC processor was time consuming, as it required LOAD and STORE instructions for every simple job like adding two numbers.
RISC assembly code also takes more space in RAM compared to CISC.

But as the years passed, compilers improved so much that assembly-level programming is no longer mandatory. Hence, in RISC-based systems, more emphasis is given to the software side, i.e., to building intelligent compilers.

Thursday, December 17, 2015

"yield" function sample code

Following is a sample code showing the use of the yield function. A create_generator() function is defined which basically iterates over the range of numbers 1 to 4. Here the values a and b are summed, and both are incremented in every iteration.

In the following steps, when this same function create_generator() is used to provide the range of values for a loop (for value in create_generator()), the program flow stops at the yield statement on every pass and hands the value to the loop variable. This value is printed. The loop continues until the loop inside create_generator() completes.

It can also be seen in the following code where each step is printed using the next() functionality. The last print statement is commented out, because at that point there is no further value produced by the generator function and it raises an error (StopIteration).

Both outputs (with and without the commented part) are provided. You can try it out too ....


#__author__ = 'Sumant'
# use of yield functionality of python

def create_generator():
    a = 1    
    b = 10  
    for count in range(1, 5):
        yield a + b
        a += 1        
        b += 1
for value in create_generator():
    print value

print "going step by step ..."
xx = create_generator()
print xx.next()
print xx.next()
print xx.next()
print xx.next()
#print xx.next()

-------------------------------------------------
OUTPUT (with last line commented)
-------------------------------------------------
11
13
15
17

going step by step ...
11
13
15
17

Process finished with exit code 0


-------------------------------------------------
OUTPUT (uncomment last line)
-------------------------------------------------
11
13
15
17
going step by step ...
11
13
15
17
Traceback (most recent call last):
  File ".../demo_yield.py", line 22, in <module>
    print xx.next()
StopIteration

Process finished with exit code 1

How to use "lambda" in python ??

Few days ago, I came across a very interesting function used in one of the python test scripts. It's called "lambda".
lambda is like a callback function. It's an alternative to def in situations where

  • the function is just a one-liner and is only used once
  • a calling function needs a function as an input parameter
It is a keyword used to define an "anonymous function". It offers functionality similar to defining a function: you specify the parameters and what to do with them.

But by using lambda, you can also make use of other functions like "filter, map and reduce". I have provided a sample python script where a lambda function is defined, and it is shown how it can be used along with filter, map and reduce ...

#__author__ = 'Sumant'
# use of lambda function along with "filter, map & reduce"
xx = lambda a, b: a+b
print xx(2, 3)

lst = range(1, 6)
print "lst = " + str(lst)
print "filter functionality:"
print filter(lambda a: a+1 < 5, lst)
print "lst = " + str(lst)
print "maps functionality:"
print map(lambda a, b: a*b, lst, lst)
print "lst = " + str(lst)
print "reduce functionality:"
print reduce(lambda a, b: a * b, lst)

--------------------------------
OUTPUT
--------------------------------
5
lst = [1, 2, 3, 4, 5]
filter functionality:
[1, 2, 3]
lst = [1, 2, 3, 4, 5]
maps functionality:
[1, 4, 9, 16, 25]
lst = [1, 2, 3, 4, 5]
reduce functionality:
120

Process finished with exit code 0


Hope this link will be helpful ... :)

There are a few wonderful links where it is explained when and where to use the "lambda" function:
https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/

Followers