September 3, 2015

RPi VFPU : the denormal numbers terminator code

Filed under: Raspberry 3.14, linux. Posted by admin @ 9:19 am
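#include <stdint.h>

/* Set flush-to-zero (FZ, bit 24) and default-NaN (DN, bit 25) in the VFP's FPSCR. */
static void runfast(void) {
    uint32_t fpscr;
    __asm__ __volatile__ ("vmrs %0, fpscr" : "=r" (fpscr));   /* read FPSCR */
    fpscr |= 0x03000000;                                      /* FZ | DN */
    __asm__ __volatile__ ("vmsr fpscr, %0" : : "r" (fpscr));  /* write it back */
}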

Raspberry Pi model B+: once runfast() has been called, it affects all of the program's code at runtime (including all #included libraries).
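To see what kind of value the flush-to-zero bit gets rid of, here is a minimal, portable sketch that builds a denormal (subnormal) float and detects it. The helper names is_denormal() and half_of_smallest_normal() are made up for this illustration. On the Pi, with the VFP in flush-to-zero mode, I would expect the same computation to give 0 instead of a denormal, but that is an assumption, not something measured here.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if x is a denormal (subnormal) float, 0 otherwise. */
int is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halves the smallest normal float at runtime.
 * 'volatile' keeps the compiler from folding the multiply at compile time,
 * so the FPU really has to produce the result. */
float half_of_smallest_normal(void) {
    volatile float tiny = FLT_MIN;   /* smallest normal float */
    return tiny * 0.5f;              /* below FLT_MIN: denormal unless flushed to zero */
}
```

With default FPU settings, is_denormal(half_of_smallest_normal()) should return 1, while ordinary values such as 1.0f or 0.0f return 0.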


February 8, 2013

RasPi-B : Some speed tests using GPIOs

Filed under: Raspberry 3.14. Posted by admin @ 9:24 am

This post deals with speed tests run using the GPIOs of the RasPi.
My job deals with hardware and software design for industrial products, and one of my latest projects has to deal with floating-point multiplication.
To be clear, on a special device currently under development, I have to use a 512-byte input matrix that passes through a neural network with hidden layers.
Experiments have proved that I can’t use a PIC18 (for instance an 18F4550 @ 48 MHz) because it is too slow at the computations (512 times multiplying 2 floats, then adding the result to a running total; this operation is done twice, in the shortest time possible).
Speed of computation is a major concern for this project, because the device operates in ‘real time’ (the computation lag introduced by the device must not exceed 1 ms).

So the test results I’ll post here will always relate to a 512-iteration loop.

Test configuration :

  • using low-level memory access (direct access to the GPIO port in memory)
  • the program is compiled using gcc, with -mfloat-abi=hard
  • the program is executed as superuser (otherwise we won’t be able to access /dev/mem), under LXDE (forget the interrupt lags, we’ll deal with them later)
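For readers who have not done direct GPIO access before, the first point can be sketched roughly as below. This is not my exact test code: it assumes the original Pi's BCM2835 with its peripherals at physical address 0x20000000, and the function names (gpio_map, gpio_set_output, gpio_high, gpio_low) are made up for the illustration. The register offsets follow the BCM2835 datasheet (GPFSEL0 at 0x00, GPSET0 at 0x1C, GPCLR0 at 0x28).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define BCM2835_PERI_BASE 0x20000000u  /* assumption: original Pi (BCM2835) */
#define GPIO_BASE (BCM2835_PERI_BASE + 0x200000u)
#define GPIO_MAP_LEN 4096u

/* Register positions, expressed as 32-bit word indexes from GPIO_BASE. */
#define GPFSEL0 0   /* byte offset 0x00: function select */
#define GPSET0  7   /* byte offset 0x1C: set output high */
#define GPCLR0  10  /* byte offset 0x28: set output low  */

/* Map the GPIO registers through /dev/mem (needs root). Returns NULL on failure. */
volatile uint32_t *gpio_map(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, GPIO_MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, GPIO_BASE);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Configure 'pin' as an output: 3 bits per pin, 10 pins per GPFSEL register. */
void gpio_set_output(volatile uint32_t *gpio, unsigned pin) {
    unsigned reg = pin / 10, shift = (pin % 10) * 3;
    uint32_t v = gpio[GPFSEL0 + reg];
    v &= ~(7u << shift);  /* clear the 3 function bits */
    v |= 1u << shift;     /* 001 = output */
    gpio[GPFSEL0 + reg] = v;
}

/* Writing a 1 bit to GPSET/GPCLR changes only that pin. */
void gpio_high(volatile uint32_t *gpio, unsigned pin) {
    gpio[GPSET0 + pin / 32] = 1u << (pin % 32);
}

void gpio_low(volatile uint32_t *gpio, unsigned pin) {
    gpio[GPCLR0 + pin / 32] = 1u << (pin % 32);
}
```

In a test, one would call gpio_map() once, gpio_set_output() on the chosen pin, then gpio_high() just before the loop and gpio_low() just after it, and read the pulse width on the oscilloscope.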

TEST 1 : simple float variables :

the vars:

float t_result;
float t_a;
float t_b;

the test prog part :

// here we set one of the GPIO outputs to Hi
for (i=0;i<512;i++) { t_result+= t_a * t_b ; }
// here we set the GPIO output to Low

results : With this test configuration, I get 35 microseconds of computation time (GPIO lo/hi/lo toggle time).
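If no oscilloscope is at hand, a rough software-only version of TEST 1 could look like the sketch below. It swaps the GPIO toggle for clock_gettime(CLOCK_MONOTONIC), so it is not the method used for the figure above, and its numbers will include the cost of the two clock calls. The names mac_scalar, elapsed_us and time_mac_scalar_us are made up for the illustration. The operands are read through volatile copies, on the assumption that an optimising compiler might otherwise compute t_a * t_b only once.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* The TEST 1 loop: n multiply-accumulate steps on two scalar floats. */
float mac_scalar(float t_a, float t_b, int n) {
    volatile float a = t_a, b = t_b;  /* volatile: force a real multiply each pass */
    float t_result = 0.0f;
    for (int i = 0; i < n; i++)
        t_result += a * b;
    return t_result;
}

/* Difference between two timestamps, in microseconds. */
double elapsed_us(const struct timespec *start, const struct timespec *end) {
    return (end->tv_sec - start->tv_sec) * 1e6 +
           (end->tv_nsec - start->tv_nsec) / 1e3;
}

/* Runs the 512-iteration loop once, stores the sum in *out,
 * and returns the elapsed time in microseconds. */
double time_mac_scalar_us(float t_a, float t_b, float *out) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *out = mac_scalar(t_a, t_b, 512);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}
```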

TEST 2 : 2 Arrays of float variables

the vars :
float t_result;
float rx[512];
float mx[512];

the test prog part :

// here we set one of the GPIO outputs to Hi
for (i=0;i<512;i++) { t_result+=rx[i] * mx[i] ; }
// here we set the GPIO output to Low

results : With this test program, we get 60 microseconds of computation time (when interrupts happen while computing, that time can extend to 100 microseconds from time to time).
If we run the same test from the root console, we get 55 microseconds, and up to 85 microseconds if the loop is halted by an interrupt request while computing.
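The spread between the usual time and the worst case is what matters here, so a small sketch that repeats TEST 2 and keeps the minimum and maximum may help. As before, this uses clock_gettime() instead of the GPIO and oscilloscope, so it is only an approximation of the setup described above. The names dot512, struct jitter and measure_jitter are made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

#define DOT_LEN 512

/* The TEST 2 loop: multiply-accumulate over two float arrays. */
float dot512(const float *rx, const float *mx) {
    float t_result = 0.0f;
    for (size_t i = 0; i < DOT_LEN; i++)
        t_result += rx[i] * mx[i];
    return t_result;
}

struct jitter {
    double min_us;      /* fastest run */
    double max_us;      /* slowest run, e.g. one hit by an interrupt */
    float last_result;  /* sum from the last run, kept so the work is not optimised away */
};

/* Times 'runs' executions of dot512() and records the fastest and slowest. */
struct jitter measure_jitter(const float *rx, const float *mx, int runs) {
    struct jitter j = { 1e30, 0.0, 0.0f };
    for (int r = 0; r < runs; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        volatile float sink = dot512(rx, mx);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        if (us < j.min_us) j.min_us = us;
        if (us > j.max_us) j.max_us = us;
        j.last_result = sink;
    }
    return j;
}
```

Comparing max_us under LXDE and from the root console should show the same kind of difference as the figures above, though the absolute numbers will differ from the oscilloscope readings.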

Interrupts are an issue for time-critical applications.
My current hardware project involves reading a linear CCD, and using C code with SDL plus the wiringPi library (drogon.net) gives results that don’t satisfy me. The CCD exposure time must be the same from one reading to the next, so the CCD dump must be done at strictly periodic intervals, and that goal can’t be achieved in the normal environment because of the bunch of interrupts happening very often. The main problem comes from the USB thread, which takes a lot of time polling for USB events.
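For reference, the usual way to ask Linux for a strictly periodic loop is to sleep until absolute deadlines, as in the sketch below. This is not the code of my CCD program, and it only removes drift between periods: it does nothing about the interrupt latency described above, so it is shown only to make the 'periodic intervals' requirement concrete. The names timespec_add_ns and run_periodic, and the work callback (which would be the CCD read), are made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (0 <= ns < 1 s), keeping tv_nsec normalised. */
void timespec_add_ns(struct timespec *t, long ns) {
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Calls work() (hypothetical callback, may be NULL) once per period.
 * Deadlines are absolute, so a late wake-up does not shift the following ones.
 * Returns the number of cycles run. */
int run_periodic(long period_ns, int cycles, void (*work)(void)) {
    struct timespec next;
    int done = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    while (done < cycles) {
        timespec_add_ns(&next, period_ns);
        /* Sleep until the deadline; retry if a signal interrupts the sleep. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR) {
        }
        if (work != NULL)
            work();
        done++;
    }
    return done;
}
```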

Below is a picture of the oscilloscope probing the CCD output.

And a closer look at the problem :

One can easily see the main timing (bright lines), while the ‘phantom’-like trace is the CCD’s analog output when the timing has been stretched because of an interrupt occurring.

Because the interval was longer when the interrupt occurred, the CCD exposure time was a little higher than without interrupts, thus truncating the CCD’s output values.

The 100 microseconds between each CCD read is the time the C program takes to update the screen using SDL, which is initialised with a 0 value rather than SWSURFACE or HWSURFACE on screen init.

That way, SDL_Flip() is not needed.
The frame buffer is updated when you access it.
It creates flickering when used like this, but the main loop then doesn’t get stuck waiting for a vertical refresh and hanging in the CCD read sequence.

I could kill the USB thread before running my program, but I’ve chosen to try coding at bare-metal level instead, because I don’t need USB devices connected to the Pi for now.
Yes, this is ambitious.
But we have to believe and make. Don’t we?

