DUAL CORE ARCHITECTURE FOR CELLULAR HANDSETS

Author: iMarketing.es – IT and management consulting, technology and outsourcing services


02-2005



The exponential growth of the wireless communications industry has created a multitude of new products with advanced features that allow users to stay in touch with every aspect of their lives wherever they may be. These new products are quite diverse, demand more system performance without compromising power conservation, and have short product life cycles. Features such as video teleconferencing, global positioning and internet access require these systems to be flexible and capable of handling a variety of digital wireless standards currently defined by the USA, Europe, Asia-Pacific and Japan.

For example, there is a growing need for low-cost cellular baseband transceivers that accommodate GSM as well as CDMA standards. To accomplish this, a micro-architecture that couples easily to DSPs, ASICs, standard peripherals and memory devices is needed. This micro-architecture must be programmable in C or C++, supported by the most popular real-time operating systems, and offer a high degree of code reusability for rapid prototype development with a rich development tool set.

Motorola's M·CORE architecture is designed specifically for sophisticated, yet low power, applications. It is a fully static CMOS core that packs about 80,000 transistors into 2.2 mm² of silicon in a 0.36 micron process. The architecture implements logic within portions of the core execution and control blocks to minimize power and reduce EMI. In addition to providing mechanisms to power down the processor and system logic, the design focuses on minimizing dynamic power consumption when the system is active.

The M·CORE architecture utilizes a streamlined execution engine that provides many of the same performance enhancements as mainstream RISC architectures. It is implemented with a fixed 16-bit instruction length and a 32-bit internal data path, which meets the computational precision requirements of newer advanced products with the cost and power advantages previously available only with 16-bit architectures. The resulting increase in code density minimizes the overhead of memory system energy consumption.


A close examination of the M·CORE micro-RISC architecture, as illustrated in Figure 1, shows how it was designed for optimal performance and low power consumption. Key factors to consider are instruction set efficiency, memory utilization, special low power modes for static operation, power consumption during dynamic operation, and code density. Other important factors to consider during product design are the ease of interface to custom peripheral circuits and ASICs, the on-chip JTAG/OnCE™ emulation port and development tool support from third party vendors.

Instruction Set Efficiency

Optimal instruction set efficiency is accomplished in the M·CORE architecture by implementation of a universal load-store RISC engine. The core contains a 16 entry, 32-bit general purpose register file, and processes instructions using an efficient four-stage execution pipeline. All computational activity takes place within the internal registers, thus reducing external bus transients, which consume power.

The arithmetic unit contains a barrel shifter which provides fast multiply and signed or unsigned divides of integers, as well as special help in translation of incoming/outgoing data, such as single cycle bit reversal of a 32-bit word. Data movement is accomplished using loads/stores of single or multiple registers in one instruction. This facilitates fast and efficient register utilization when entering/exiting subroutines and during context switches between user and supervisor mode.

System-level power management

To provide optimal static power management for the overall system, the M·CORE architecture provides three instructions (stop, wait, and doze) that enable external logic to disable power to parts of the system. Execution of any of these instructions causes the processor to assert the LPMD1-0 output signals in the manner described in Table 1.
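
As a minimal sketch of how external logic might act on the LPMD1-0 encoding listed in Table 1, the Python model below decodes the two signal levels into a mode. The names `PowerMode`, `decode_lpmd`, `POWERED_BLOCKS` and `blocks_to_power` are hypothetical, and the policy of which blocks stay powered in each mode is an assumption made purely for illustration; in a real system that choice belongs to the system designer.

```python
from enum import Enum


class PowerMode(Enum):
    """Modes signalled on LPMD1-0, keyed by (LPMD1, LPMD0) as in Table 1."""
    STOP = (0, 0)
    WAIT = (0, 1)
    DOZE = (1, 0)
    NORMAL = (1, 1)


def decode_lpmd(lpmd1: int, lpmd0: int) -> PowerMode:
    """Return the mode encoded by the two LPMD signal levels (each 0 or 1)."""
    if lpmd1 not in (0, 1) or lpmd0 not in (0, 1):
        raise ValueError("LPMD signals must be 0 or 1")
    return PowerMode((lpmd1, lpmd0))


# Hypothetical policy (an assumption, not from the source): the system blocks
# that external logic leaves powered in each mode.
POWERED_BLOCKS = {
    PowerMode.NORMAL: {"core", "peripherals", "memory"},
    PowerMode.DOZE: {"peripherals", "memory"},
    PowerMode.WAIT: {"peripherals"},
    PowerMode.STOP: set(),
}


def blocks_to_power(lpmd1: int, lpmd0: int) -> set:
    """Look up which blocks the hypothetical policy keeps powered."""
    return POWERED_BLOCKS[decode_lpmd(lpmd1, lpmd0)]


if __name__ == "__main__":
    for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(bits, decode_lpmd(*bits).name, sorted(blocks_to_power(*bits)))
```

Only the four-row mapping in `PowerMode` comes from the table; everything else is scaffolding to show where a power-gating decision would hook in.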


Table 1: Low power mode signal encoding [1]

LPMD1  LPMD0  Mode
0      0      STOP
0      1      WAIT
1      0      DOZE
1      1      Normal

The external logic uses the LPMD1-0 signals to determine exactly which parts of the overall system logic should be placed in a low-power state. The external logic can also place the processor in a low power mode by forcing the CLK input high.

Dynamic power consumption

Although reducing a system's static power usage achieves the greatest overall reduction in power consumption, a true low power solution must also address dynamic power consumption, that is, the power required by the system when it is actually being used. The M·CORE architecture optimizes dynamic power consumption by minimizing both the power needed to execute an instruction and the number of bytes that need to be fetched to perform a given function.

Power Aware instruction pipeline

The low power instructions discussed earlier provide a mechanism to power down select parts of the system when not used. With processors themselves becoming more complex, a logical extension of this is to only power up the parts of a processor that are required to execute an instruction. The M·CORE architecture achieves this benefit through its advanced power aware pipeline. The instruction pipeline recognizes which processor functions are required to execute a particular instruction. This enables it to ensure that data only transitions through the processor blocks that are actually needed to implement the instruction. For example, an add instruction would cause data to transition through the adder but not through the barrel shifter. By eliminating unnecessary transitions, the M·CORE architecture prevents switching of gates, loads, and wires in unused blocks, all of which would otherwise consume additional power.

Code density

Compilers were developed in conjunction with the M·CORE architecture instruction set to maximize code density. Code density is a measure of how many bytes of code are required to implement an application or function. Code density affects power consumption both statically and dynamically. The M·CORE architecture's high code density results in a smaller executable image. This reduces an application's memory requirements, which in turn reduces system cost and system power consumption. There is also a second benefit to code density. Every time the processor fetches an instruction from memory, it must use a bus cycle, and bus cycles consume power. Since the M·CORE architecture's dense code allows it to perform equivalent functionality with fewer bytes of code, a program executing on an M·CORE processor will consume less power because it will fetch fewer bytes from memory.

Embedded and portable benchmarks were used to make design trade-offs in the architecture and the compiler. The Powerstone benchmarks, which include paging, automobile control, signal processing, imaging and fax applications, are detailed in Table 2.


Table 2: Powerstone Benchmark Suite [2]

Benchmark  Instr. Count  Description
auto       17374         Automobile control applications
bilv       21363         Shift, AND, OR operations
blit       72416         Graphics application
compress   322101        A Unix utility
crc        22805         Cyclic redundancy check
des        510814        Data Encryption Standard
dhry       612713        Dhrystone
engine     986326        Engine control application
fir_int    629166        Integer FIR filter
g3fax      2918109       Group three fax decode (single level image decompression)
g721       231706        Adaptive differential PCM for voice compression
jpeg       9973639       JPEG 24-bit image decompression standard
pocsag     131159        POCSAG communication protocol for paging application
servo      41132         Hard disc drive servo control
summin     3463087       Handwriting recognition
ucbqsort   674165        U.C.B. Quick Sort
v42bis     8155159       Modem encoding/decoding
whet       3028736       Whetstone

During initial analysis the M·CORE architecture instruction set was profiled by running the Powerstone benchmark suite on a cycle accurate C++ simulator. Table 3 shows the percentage of dynamic instructions utilizing the adder and barrel shifter, as well as the percentage of change of flow and load/store instructions.

Table 3: Dynamic Instruction Percentages [2]

Type                             Dynamic Instruction Percentage
Adder usage                      50.23%
Barrel shifter usage             9.68%
Change of flow instructions (a)  17.04%
Load/store instructions          22.46%
(a) 83.5% of change of flow instructions are taken.

Although the M·CORE architecture is a 32-bit architecture, it utilizes a 16-bit instruction set to achieve high code density. In addition to providing improved code density, the 16-bit instruction set provides a performance advantage over conventional RISC architectures in many low-cost applications. It is common for such applications to minimize cost through use of a 16-bit bus. Since conventional RISC architectures use 32-bit wide instructions, they have to perform two bus cycles to fetch an instruction, negatively impacting overall instruction throughput. In contrast, the M·CORE architecture requires only a single bus cycle to perform an instruction fetch, enabling it to run at full speed even with a 16-bit bus.
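
The fetch argument above can be put into numbers with a small sketch. The functions `fetch_bus_cycles` and `program_fetch_cost` and the instruction count of 1000 are hypothetical, chosen only for illustration. The model assumes that both encodings need the same number of instructions, which is a simplification, since real instruction counts differ between architectures.

```python
def fetch_bus_cycles(instr_bits: int, bus_bits: int) -> int:
    """Bus cycles needed to fetch one instruction (ceiling division)."""
    if instr_bits <= 0 or bus_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-instr_bits // bus_bits)


def program_fetch_cost(instr_count: int, instr_bits: int, bus_bits: int) -> dict:
    """Total fetch cycles and code bytes for a straight-line run of instructions."""
    return {
        "bus_cycles": instr_count * fetch_bus_cycles(instr_bits, bus_bits),
        "code_bytes": instr_count * instr_bits // 8,
    }


if __name__ == "__main__":
    n = 1000  # hypothetical instruction count, assumed equal for both encodings
    print(program_fetch_cost(n, 16, 16))  # {'bus_cycles': 1000, 'code_bytes': 2000}
    print(program_fetch_cost(n, 32, 16))  # {'bus_cycles': 2000, 'code_bytes': 4000}
```

Under these assumptions the 32-bit encoding needs twice the bus cycles and twice the code bytes on a 16-bit bus. In practice the gap depends on how many extra instructions a 16-bit encoding needs for the same work.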


A comparison to other popular architectures was made to evaluate instruction set efficiency, and favorable results were realized, as illustrated in Figure 2. Compiler efficiency played a key role in the code density comparisons, especially when evaluating function call stacking, interrupt handlers, variable manipulation and the handling of if-else conditional statements. The implementation of conditional move, increment, decrement, and clear operations supplemented traditional change of flow instructions and helped improve compiler optimization.

Figure 2: Code Density Comparison using Powerstone Benchmarks
(Bar chart not reproduced. It plots relative code size, with M·CORE at 1.00, for Thumb (compressed ARM), ARM7/ARM9/StrongARM, V830, V850, SH2 and SH3. The other architectures fall between 1.07 and 1.49, and each bar is annotated with the additional memory required compared with M·CORE. Chart notes: code density significantly affects power consumption, runtime performance, and system cost. Compiled C code optimized for code density. Compilers: Diab 4.1, ARM SDK 2.5, Thumb 1.04, Green Hills 1.8, Hitachi 3.0F.)

Rich register set

To further minimize bus activity, the M·CORE architecture reduces the need to read and write data to and from memory. It achieves this by providing a rich set of registers that enables a program to keep data variables in registers while they are live. The M·CORE architecture provides a total of 37 32-bit data registers that are available to system programmers: one set of 16 general purpose registers, an alternate register file with 16 registers, and 5 scratch registers.

The register file consumes 16% of total processor power and 42% of data path power due to the high utilization of the registers in the instruction set. Since loads and stores in a typical commercial RISC constitute approximately 23% of the dynamic instructions executed, the implementation of the alternate register file, coupled with the ability to load/store multiple registers, improved interrupt entry and exit latency and reduced memory accesses for instruction fetches and variable save/restore.

Support for multiple data sizes

Some commonly used data types such as chars or shorts have 8- or 16-bit, rather than 32-bit, representations. This provides an additional opportunity for the M·CORE architecture to reduce power consumption when fetching data from memory. For example, the M·CORE architecture would only toggle the 8 bits required to read or write a char, minimizing power
consumption by logic external to the processor core. To speed up memory copy and initialization operations, load multiple/load quadrant and store multiple/store quadrant instructions were added for block moves of registers to memory or memory to registers. This helped compiler resolution of variable alignment in memory.

Low Voltage

Since dynamic power consumption is proportional to the square of the supply voltage, lowering the voltage provides a disproportionately large boost to battery life. M·CORE processors are designed to require only 1.8 volts to operate, with future versions planned to use as little as 0.9 volts.

Figure 3: Cellular Handset Block Diagram [3]
(Block diagram not reproduced. Blocks shown: telephone front panel with LCD display and keypad; PA, RF/IF and audio codec; the M·CORE and DSP56600 cores linked by shared memory; SRAM, keypad, GPIO, protocol timer, QSPI, baseband and serial audio ports; FLASH, EEPROM, external memory interface, UART, SmartCard, timers and debug logic; I/O level translators.)

Processor Power Distribution

Analysis of the architectural implementation showed that clock and data paths consumed a large portion of the power. This led to a critical decision on whether to synthesize or custom design the data path. Research showed that synthesis required 60% more transistors, 175% more area and 40% more power. Thus the data path was custom designed to reduce power and area.

Further analysis showed that clock power was 36% of the total processor power consumption. The M·CORE processor uses a single global clock with local generation of dual phase non-overlapping clocks. Clock gating can be performed, which allows for complete or partial clock tree disabling. Clock gating permits specific data paths to be shut down during pipeline stalls, thus saving power. This is quite important since the data path consumes 36% of total power while the remaining 28% is consumed by control logic.

Interrupt latency was significantly improved by the use of a 32 channel programmable interrupt controller. The 16 alternate registers improved interrupt entry and exit latency by eliminating the need to perform memory accesses for saving/restoring processor state. The use of a Find First One (FF1) instruction eliminated the need for interrupt priority scanning routines. This combination of special circuits realized a
37% improvement over the ARM processor with respect to interrupt service handling when performing a virtual DMA benchmark.

Figure 4: DSP56652 Cellular Baseband Processor [5]
(Block diagram not reproduced. Memory blocks labelled in the diagram: Data RAM 13K x 16, Data ROM 18K x 16, PROM 48K x 24, PRAM 512 x 24, MDI 1024 x 16, RAM 512 x 32, ROM 4K x 32. The feature lists surrounding the diagram follow.)

CODEC ports for Baseband & Audio
·Full duplex
·Standard codec clock generation

M·CORE MCU
·20MHz @ 1.8V
·32 bit architecture, fixed 16-bit instr
·Architected for handheld applications
·Best-in-class Code Density
·Low Power, High Performance
·Dual 16 entry / 32-bit register files
·Efficient 4 stage pipeline
·Single cycle execution for most instr
·Byte, half-word, word access
·Fast interrupt support

Queued Serial Peripheral Interface
·SPI compatible Interface
·Variable queue size
·Full or half duplex

Smartcard Interface Module
·3V Smartcard interface
·ISO7816 standard

External MCU bus
·22-bit address
·16-bit data
·Glueless system integration

Keypad Port
·Up to 8x8 scan
·Or GPIO

Protocol Timer
·Radio Channel timing control
·Frame number & position
·Macro capability
·8 outputs, 4 QSPI triggers
·16 vectored DSP interrupts
·DSP wakeup
·Timing advance/retard
·Frame table restart/swap

Periodic Timer / Watchdog
·16-bit "set & forget" interrupts
·Countdown or freerun
·Watchdog hardware reset
·Watchdog timeouts: 0.5 to 32 sec

MCU General Purpose Timer
·8-bit prescaler
·Two 16-bit free-running counters
·3 output compare/2 input capture
·PWM capability for tone generation

DSP56600 DSP Core
·High performance: 60MHz @1.8V
·1x engine 60MHz = 60 MIPS
·16-bit data
·Efficient 24-bit instruction set
·16x16=40-bit multiply
·GSM bit-exact arithmetic support
·Fully static
·Ultra low power modes
·Special power management

DSP Memory
·On-chip DSP ROM
·ROM patch capability
·On-chip DSP RAM

Flexible Clock Generator
·16-60MHz PLL
·Two clock inputs: 10-20 MHz or 32KHz

MCU-DSP Interface (MDI)
·1024 x 16-bit dual access
·Polled or interrupt messaging

JTAG OnCE Debug Ports
·M·CORE & DSP56600 core debug
·Non-intrusive examine/modify
·Access via JTAG port

JTAG Test Access Port
·IEEE 1149.1 compliant
·For system diagnostics
·Access to M·CORE and DSP

UART Serial Communication Port
·Full Duplex
·7- or 8-bit operation
·Full 8 wire serial interface
·IrDA standard support
·Robust receiver sampling/filtering
·16-byte FIFOs
·Bit rates from 300 to 525Kbps
·Low power wake-up modes

DSP56652 Integrated Cellular Baseband Processor

Tremendous progress has been made in reducing the parts count of the baseband functions of a wireless handset. This has been accomplished to meet the cost, size, power and system performance requirements of the latest versions of cellular phones being marketed today. A key ingredient for the increase of battery life in a cellular phone is component count reduction. By integrating an M·CORE processor with an advanced 16-bit Digital Signal Processor (DSP) operating at 1.8V, TDMA applications based on the IS-136 protocol can be implemented with efficient battery power management, covering the baseband functions of a cellular phone excluding the front and backend analog blocks, as illustrated in Figure 3. System partitioning is illustrated in Figure 3, where all signal processing functions such as speech coding/decoding, error correction, channel coding/decoding, equalization, modulation and encryption are accomplished using a 60 MIPS DSP56600 core that executes one instruction every clock cycle.

In this application the M·CORE processor performs all microcontroller functions associated with the phone user interface as well as protocol processing. Communication between the two cores is accomplished via a sophisticated MCU-DSP interface (MDI) consisting of a 1K-word dual-access memory (with read/write access for both processors) and a messaging unit, which features independent messaging logic and provides status and messaging control. Development of a Call-Processing Engine algorithm is easily accomplished using ANSI-C with in-line assembly language interrupt handlers.

Each core has a set of Input/Output peripherals for interfacing to the analog and RF sections of
the phone. A key peripheral, the dedicated protocol timer, offloads the task of maintaining handset to base station communication for both cores. Once programmed by the M·CORE processor, the timer is capable of coordinating all radio operations, including activation of the receiver, transmitter and frequency synthesizers.

The main goal of the protocol timer is to off-load compute intensive tasks such as event scheduling associated with the TDMA protocol. Software only needs to reprogram the timer once per frame. It is capable of generating timing signals, trigger signals and interrupts to the M·CORE processor and to the DSP. Sophisticated sets of tables interact for control of receive and transmit channel time intervals and number of frames per channel. Macro tables are utilized to reduce the programming of events that have a fixed relationship to each other. [4]

The production version of the iDEN i1000™ phone utilizes the DSP56652 ROM version in a 0.31 micron, triple-layer metal static CMOS process. This device includes 8 Kbytes of ROM and 2 Kbytes of SRAM to support the M·CORE processor. The chip measures 7.4 mm on a side, or 55 sq. mm. The part is packaged in a 196-ball plastic ball grid array (PBGA) and was designed for 16.8 MHz performance at 1.8 V. When running out of internal SRAM, this device consumes on average less than 9 mA at 1.8 V, which translates into less than 16.2 mW at 16.8 MHz for the complete system. On average, this implementation of the M·CORE processor consumes 2.8 mA, which translates to a 0.30 mW/MHz rating. The part consumes less than 60 microamps in STOP mode.

DSP56652 Development Tools

To accelerate system level integration and also provide a means for production and field testing of new products, a Motorola standard OnCE™ block is available on the M·CORE processor as well as the DSP56600 processor. This block provides a dedicated emulation interface for rapid evaluation of the system hardware and software. Communication with the block is conducted via a 5 wire IEEE 1149.1 JTAG controller, which provides direct access to each of the processors' instruction registers so that opcodes may be fed directly to each instruction pipeline, bypassing external accesses to memory. This mechanism provides a true non-intrusive means for controlling the dual-core processor directly in the target system.

Software and hardware breakpoint registers are provided along with a First-In-First-Out program counter trace buffer which stores change of flow addresses. Single-stepping of opcodes with a 16-bit counter is available, and the OnCE™ registers are accessible while each core runs in real-time or is in reset. This interface is very useful for measuring static and dynamic power consumption and also allows analysis of code hot spots. Each core, when put in the debug mode of operation, will shut down clocks to the respective core as well as its peripherals. This allows distributed power analysis by shutting down one core and its peripherals while the other core remains running in real-time. Specific hot spots in the code of each processor may be analyzed with external power measuring tools that monitor current through the respective core's power pins. It should be noted that the power pins for each of the cores and their respective peripherals are isolated so they may be filtered and powered properly.

The M·CORE processor and DSP OnCE interface is currently supported by a Motorola Universal Command Converter (UCC), which communicates with a Software Development Systems (SDS) source level debugger. The SDS SingleStep debugger is tightly integrated into the Motorola Tool Suite through the UCC interface so that the dual-core system can be easily controlled using one common tool.

C/C++ as well as assembly language programs compiled using a Diab Data M·CORE architecture cross-compiler can be quickly evaluated in this environment. Motorola also includes a DSP GNU C compiler, debugger, simulator, linker, assembler and DSP56652 evaluation board.

Conclusion

As the wireless communications industry progresses at lightning speed with new product designs, the issue of high performance with low power consumption will present new challenges to wireless product designers. In order to design these new products in a timely manner, a complete solution is of utmost importance for rapid delivery. Motorola's new

 


http://www.imarketing.es/articulos 
