The exponential growth of the wireless communications industry has created a multitude of new products with advanced features that allow users to stay in touch with every aspect of their lives wherever they may be. These new products are quite diverse, demand more system performance without compromising power conservation, and have short product life cycles. Features such as video-teleconferencing, global positioning and internet access require these systems to be flexible and capable of understanding a variety of digital wireless standards currently defined by the USA, Europe, Asia-Pacific and Japan.

For example, there is a growing need for cellular baseband transceivers that accommodate GSM as well as CDMA standards at a low cost. In order to accomplish this, a micro-architecture that couples easily to DSPs, ASICs, standard peripherals and memory devices is needed. This micro-architecture must be programmable in C or C++, supported by the most popular real-time operating systems, and offer a high degree of code reusability for rapid prototype development with a rich development tool set.

Motorola's M·CORE architecture is designed specifically for sophisticated, yet low power, applications. It is a fully static CMOS core that packs about 80,000 transistors into 2.2 mm² of silicon in a 0.36 micron process. The architecture implements logic within portions of the core execution and control blocks to minimize power and reduce EMI. In addition to providing mechanisms to power down the processor and system logic, there is a focus on minimizing dynamic power consumption when the system is active.

The M·CORE architecture utilizes a streamlined execution engine that provides many of the same performance enhancements as mainstream RISC architectures. It is implemented with a fixed 16-bit instruction length and a 32-bit internal data path, which meets the computational precision requirements of newer advanced products with the cost and power advantages previously available only with 16-bit architectures. Thus, increased code density accomplishes the goal of minimizing the overhead of memory system energy consumption.
A close examination of the M·CORE micro-RISC architecture, as illustrated in Figure 1, shows how it was designed for optimal performance and low power consumption. Key factors to consider are instruction set efficiency, memory utilization, special low power modes for static operation, power consumption during dynamic operation, and code density. Other important factors to consider during product design are the ease of interface to custom peripheral circuits and ASICs, the on-chip JTAG/OnCE™ emulation port and development tool support from third party vendors.

Instruction Set Efficiency

Optimal instruction set efficiency is accomplished in the M·CORE architecture by implementation of a universal load-store RISC engine. The core contains a 16 entry, 32-bit general purpose register file, and processes instructions using an efficient four-stage execution pipeline. All computational activity takes place within the internal registers, thus reducing external bus transients which consume power.

The arithmetic unit contains a barrel shifter which provides fast multiply and signed or unsigned divides of integers, as well as special help in translation of incoming/outgoing data, such as single cycle bit reversal of a 32 bit word. Data movement is accomplished using load/stores of single or multiple registers in one instruction. This facilitates fast and efficient register utilization when entering/exiting subroutines and context switches between user and supervisor mode.

System-level power management

To provide optimal static power management for the overall system, the M·CORE architecture provides three instructions (stop, wait, and doze) that enable external logic to disable power to parts of the system. Execution of any of these instructions causes the processor to assert the LPMD1-0 output signals in the manner described in Table 1.
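
As a rough illustration of how external power-control logic might interpret the two LPMD outputs, the Python sketch below decodes the bit pair using the encoding this paper lists in Table 1. The function names and the BLOCKS_KEPT_ON policy are hypothetical and only model the idea: real external logic would be hardware, and which blocks stay powered in each mode is a system design choice that the paper does not specify.

```python
# Encoding of the LPMD1-0 outputs as listed in Table 1 of this paper.
LPMD_MODES = {
    (0, 0): "STOP",
    (0, 1): "WAIT",
    (1, 0): "DOZE",
    (1, 1): "NORMAL",
}

# Hypothetical policy (an assumption, not from the paper): the system
# blocks that external logic leaves powered in each mode.
BLOCKS_KEPT_ON = {
    "NORMAL": {"core", "peripherals", "oscillator"},
    "WAIT": {"peripherals", "oscillator"},
    "DOZE": {"oscillator"},
    "STOP": set(),
}


def decode_lpmd(lpmd1: int, lpmd0: int) -> str:
    """Map the two LPMD signal levels (0 or 1) to a mode name."""
    if lpmd1 not in (0, 1) or lpmd0 not in (0, 1):
        raise ValueError("LPMD1 and LPMD0 must each be 0 or 1")
    return LPMD_MODES[(lpmd1, lpmd0)]


def blocks_to_power_down(lpmd1: int, lpmd0: int) -> set:
    """Return the blocks the hypothetical policy switches off."""
    mode = decode_lpmd(lpmd1, lpmd0)
    return BLOCKS_KEPT_ON["NORMAL"] - BLOCKS_KEPT_ON[mode]


if __name__ == "__main__":
    for pins in [(1, 1), (0, 1), (1, 0), (0, 0)]:
        print(pins, decode_lpmd(*pins), sorted(blocks_to_power_down(*pins)))
```

Keeping the pin encoding separate from the power-down policy mirrors the split described in the text: the processor only reports the requested mode, and the surrounding system decides what to switch off.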
Table 1: Low power mode signal encoding [1]

LPMD1  LPMD0  Mode
0      0      STOP
0      1      WAIT
1      0      DOZE
1      1      normal

The external logic uses the LPMD1-0 signals to determine exactly which parts of the overall system logic should be placed in a low-power state. The external logic can also place the processor in a low power mode by forcing the CLK input high.

Dynamic power consumption

Although reducing a system's static power usage achieves the greatest overall reduction in power consumption, a true low power solution must address the issue of dynamic power consumption. By dynamic power consumption, we are referring to the power required by the system when it is actually being used. The M·CORE architecture optimizes dynamic power consumption by both minimizing the power needed to execute an instruction and minimizing the number of bytes that need to be fetched to perform a given function.

Power Aware instruction pipeline

The low power instructions discussed earlier provide a mechanism to power down select parts of the system when not used. With processors themselves becoming more complex, a logical extension of this is to only power up the parts of a processor that are required to execute an instruction. The M·CORE architecture achieves this benefit through its advanced power aware pipeline. The instruction pipeline recognizes which processor functions are required to execute a particular instruction. This enables it to ensure that data only transitions through the processor blocks that are actually needed to implement the instruction. For example, an add instruction would cause data to transition through the adder but not through the barrel shifter. By eliminating unnecessary transitions, the M·CORE architecture prevents switching of gates, loads, and wires in unused blocks, all of which would otherwise consume additional power.

Code density

Compilers were developed in conjunction with the M·CORE architecture instruction set to maximize code density. Code density is a measure of how many bytes of code are required to implement an application or function. Code density affects power consumption both statically and dynamically. The M·CORE architecture's high code density results in a smaller executable image. This reduces an application's memory requirements, which in turn reduces system cost and system power consumption. However, there is a second benefit to code density. Every time the processor fetches an instruction from memory, it must use a bus cycle. Bus cycles, of course, consume power. Since the M·CORE architecture's dense code allows it to perform equivalent functionality with fewer bytes of code, a program executing on an M·CORE processor will consume less power because it will fetch fewer bytes from memory.

Embedded and portable benchmarks were used to make design trade-offs in the architecture and the compiler. The Powerstone benchmarks, which include paging, automobile control, signal processing, imaging and fax applications, are detailed in Table 2.
Table 2: Powerstone Benchmark Suite [2]

Benchmark  Instr. Count  Description
auto       17374         Automobile control applications
bilv       21363         Shift, AND, OR operations
bilt       72416         Graphics application
compress   322101        A Unix utility
crc        22805         Cyclic redundancy check
des        510814        Data Encryption Standard
dhry       612713        Dhrystone
engine     986326        Engine control application
fir_int    629166        Integer FIR filter
g3fax      2918109       Group three fax decode (single level image decompression)
g721       231706        Adaptive differential PCM for voice compression
jpeg       9973639       JPEG 24-bit image decompression standard
pocsag     131159        POCSAG communication protocol for paging application
servo      41132         Hard disc drive servo control
summin     3463087       Handwriting recognition
ucbqsort   674165        U.C.B. Quick Sort
v42bits    8155159       Modem encoding/decoding
whet       3028736       Whetstone
During initial analysis the M·CORE architecture instruction set was profiled by running the Powerstone benchmark suites on a cycle accurate C++ simulator. Table 3 shows the percentage of dynamic instructions utilizing the adder and barrel shifter, as well as the percentage of change of flow and load/store instructions.

Table 3: Dynamic Instruction Percentages [2]

Type                              Dynamic Instruction Percentage
Adder usage                       50.23%
Barrel shifter usage              9.68%
Change of flow instructions (a)   17.04%
Load/store instructions           22.46%

(a) 83.5% of change of flow instructions are taken.

Although the M·CORE architecture is a 32-bit architecture, it utilizes a 16-bit instruction set to achieve high code density. In addition to providing improved code density, the 16-bit instruction set provides a performance advantage over conventional RISC architectures in many low-cost applications. It is common for such applications to minimize cost through use of a 16-bit bus. Since conventional RISC architectures use 32-bit wide instructions, they have to perform two bus cycles to fetch an instruction, negatively impacting overall instruction throughput. In contrast, the M·CORE architecture would only require a single bus cycle to perform an instruction fetch, enabling it to run at full speed even with a 16-bit bus.
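
The bus-cycle argument above can be checked with a few lines of arithmetic. The following Python sketch is only a simplified model: it assumes every instruction fetch goes over the external bus (no cache or prefetch buffer) and that instructions have a fixed width. The function names are illustrative and not taken from any Motorola tool.

```python
def bus_cycles_per_fetch(instr_bits: int, bus_bits: int) -> int:
    """Bus cycles needed to fetch one instruction (ceiling division)."""
    if instr_bits <= 0 or bus_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-instr_bits // bus_bits)


def fetch_cycles(instr_count: int, instr_bits: int, bus_bits: int) -> int:
    """Total fetch bus cycles for a run of fixed-width instructions."""
    return instr_count * bus_cycles_per_fetch(instr_bits, bus_bits)


if __name__ == "__main__":
    n = 1000  # arbitrary instruction count chosen for illustration
    print(fetch_cycles(n, 16, 16))  # 16-bit instructions on a 16-bit bus
    print(fetch_cycles(n, 32, 16))  # 32-bit instructions on a 16-bit bus
```

On a 16-bit bus the model gives one bus cycle per 16-bit instruction and two per 32-bit instruction. It deliberately ignores the fact that two instruction sets may need different instruction counts to implement the same program.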
A comparison to other popular architectures was made to evaluate instruction set efficiency and favorable results were realized as illustrated in Figure 2. Compiler efficiency played a key role in the code density comparisons, especially when evaluating function call stacking, interrupt handlers, variable manipulation and the handling of if-else conditional statements. The implementation of conditional move, increment, decrement, and clear operations supplemented traditional change of flow instructions and helped improve compiler optimization.

Figure 2: Code Density Comparison using Powerstone Benchmarks
[Bar chart of relative code size, normalized to M·CORE = 1.00, for M·CORE, Thumb (compressed ARM), ARM7/ARM9/StrongArm, V830, V850, SH2 and SH3. Legible bar values for the other architectures: 1.07, 1.41, 1.42, 1.46, 1.47 and 1.49, each annotated as requiring more memory than M·CORE. Chart note: code density significantly affects power consumption, runtime performance, and system cost. Compiled C code optimized for code density. Compilers: Diab 4.1, ARM SDK25 Thumb 1.04, Green Hills 1.8, Hitachi 3.0F.]

Rich register set

To further minimize bus activity, the M·CORE architecture reduces the need to read and write data to and from memory. It achieves this by providing a rich set of registers that enables a program to keep data variables in registers while they are live. The M·CORE architecture provides a total of 37 32-bit data registers that are available to system programmers: one set of 16 general purpose registers, an alternate register file with 16 registers, and 5 scratch registers.

The register file consumes 16% of total processor power and 42% of data path power due to the high utilization of the registers in the instruction set. Since loads and stores in a typical commercial RISC constitute approximately 23% of the dynamic instructions executed, the implementation of the alternate register file coupled with the ability to load/store multiple registers improved interrupt entry and exit latency and reduced memory accesses for instruction fetches and variable save/restore.

Support for multiple data sizes

Some commonly used data types such as chars or shorts have 8- or 16-bit, rather than 32-bit, representations. This provides an additional opportunity for the M·CORE architecture to reduce power consumption when fetching data from memory. For example, the M·CORE architecture would only toggle the 8 bits required to read or write a char, minimizing power
consumption by logic external to the processor core. To speed up memory copy and initialization operations, load multiple/load quadrant and store multiple/store quadrant instructions were added for block moves of registers to memory or memory to registers. This helped compiler resolution of variable alignment in memory.

Low Voltage

Since dynamic power consumption is proportional to the square of the supply voltage required, lowering the voltage provides a disproportionately large boost to battery life. M·CORE processors are designed to require only 1.8 volts to operate, with future versions planned to use as little as 0.9 volts.

Figure 3: Cellular Handset Block Diagram [3]
[Block diagram: an M·CORE core and a DSP56600 core linked by shared memory, with SRAM, FLASH, EEPROM, external memory interface, keypad, GPIO, protocol timer, QSPI, UART, smartcard, timers, baseband and audio serial ports, and debug logic. I/O level translators connect the device to the telephone keypad, LCD display, RF/IF section, audio codec, PA and SmartCard.]

Processor Power Distribution

Analysis of the architectural implementation showed that clock and data paths consumed a large portion of the power. This led to a critical decision on whether to synthesize or custom design the data path. Research showed that synthesis required 60% more transistors and 175% more area with an increase of 40% more power. Thus the data path was custom designed to reduce power and area.

Further analysis showed that clock power was 36% of the total processor power consumption. The M·CORE processor uses a single global clock with local generation of dual phase non-overlapping clocks. Clock gating can be performed, which allows for complete or partial clock tree disabling. Clock gating permits specific data paths to be shut down during pipeline stalls, thus saving power. This is quite important since the data path consumes 36% of total power while the remaining 28% is consumed by control logic.

Interrupt latency was significantly improved by the use of a 32 channel programmable interrupt controller. The 16 alternate registers improved interrupt entry and exit latency by eliminating the need to perform memory accesses for saving/restoring processor state. The use of a Find First One (FF1) instruction eliminated the need for interrupt priority scanning routines. This combination of special circuits realized a
37% improvement over the ARM processor with respect to interrupt service handling when performing a virtual DMA benchmark.

Figure 4: DSP56652 Cellular Baseband Processor [5]
[Block diagram annotated with the feature lists below. Memory blocks labeled in the diagram: Data RAM 13K x 16, Data ROM 18K x 16, PROM 48K x 24, PRAM 512 x 24, MDI 1024 x 16, RAM 512 x 32, ROM 4K x 32.]

DSP56600 DSP Core
·High performance: 60MHz @ 1.8V
·1x engine 60MHz = 60 MIPS
·16-bit data
·Efficient 24-bit instruction set
·16x16=40-bit multiply
·GSM bit-exact arithmetic support
·Fully static
·Ultra low power modes
·Special power management

M·CORE MCU
·20MHz @ 1.8V
·32 bit architecture, fixed 16-bit instr
·Architected for handheld applications
·Best-in-class Code Density
·Low Power, High Performance
·Dual 16 entry / 32-bit register files
·Efficient 4 stage pipeline
·Single cycle execution for most instr
·Byte, half-word, word access
·Fast interrupt support

56600 DSP Memory
·On-chip DSP ROM
·ROM patch capability
·On-chip DSP RAM

CODEC ports for Baseband & Audio
·Full duplex
·Standard codec clock generation

Flexible Clock Generator
·16-60MHz PLL
·Two clock inputs: 10-20 MHz or 32KHz

Queued Serial Peripheral Interface
·SPI compatible Interface
·Variable queue size
·Full or half duplex

MCU-DSP Interface (MDI)
·1024 x 16-bit dual access
·Polled or interrupt messaging

Smartcard Interface Module
·3V Smartcard interface
·ISO7816 standard

JTAG OnCE Debug Ports
·M·CORE & DSP56600 core debug
·Non-intrusive examine/modify
·Access via JTAG port

External MCU bus
·22-bit address
·16-bit data
·Glueless system integration

Keypad Port
·Up to 8x8 scan
·Or GPIO

JTAG Test Access Port
·IEEE 1149.1 compliant
·For system diagnostics
·Access to M·CORE and DSP

Protocol Timer
·Radio Channel timing control
·Frame number & position
·Macro capability
·8 outputs, 4 QSPI triggers
·16 vectored DSP interrupts
·DSP wakeup
·Timing advance/retard
·Frame table restart/swap

Periodic Timer / Watchdog
·16-bit "set & forget" interrupts
·Countdown or freerun
·Watchdog hardware reset
·Watchdog timeouts: 0.5 to 32 sec

MCU General Purpose Timer
·8-bit prescaler
·Two 16-bit free-running counters
·3 output compare/2 input capture
·PWM capability for tone generation

UART Serial Communication Port
·Full Duplex
·7- or 8-bit operation
·Full 8 wire serial interface
·IrDA standard support
·Robust receiver sampling/filtering
·16-byte FIFOs
·Bit rates from 300 to 525Kbps
·Low power wake-up modes

DSP56652 Integrated Cellular Baseband Processor

Tremendous progress has been made in reducing the parts count of the baseband functions of a wireless handset. This has been accomplished to meet cost, size, power and system performance requirements of the latest versions of cellular phones being marketed today. A key ingredient for the increase of battery life in a cellular phone is component count reduction. By integrating an M·CORE processor with an advanced 16-bit Digital Signal Processor (DSP), operating at 1.8V, TDMA applications based on the IS-136 protocol can be accomplished with efficient battery power management to accomplish the baseband functions of a cellular phone excluding the front and backend analog blocks as illustrated in Figure 3. System partitioning is illustrated in Figure 3 where all signal processing functions such as speech coding/decoding, error correction, channel coding/decoding, equalization, modulation and encryption are all accomplished using a 60 MIPS DSP56600 core that executes one instruction every clock-cycle.

In this application the M·CORE processor performs all microcontroller functions associated with the phone user interface as well as protocol processing. Communication between the two cores is accomplished via a sophisticated MCU-DSP interface (MDI) consisting of a 1K words dual-access memory (with read/write access for both processors) and a messaging unit, which features independent messaging logic and provides status and messaging control. Development of a Call-Processing Engine algorithm is easily accomplished using ANSI-C with in-line assembly language interrupt handlers.

Each core has a set of Input/Output peripherals for interfacing to the analog and RF sections of
the phone. A key peripheral, the dedicated protocol timer, offloads the task of maintaining handset to base station communication for both cores. Once programmed by the M·CORE processor, the timer is capable of coordinating all radio operations, including activation of the receiver, transmitter and frequency synthesizers.

The main goal of the protocol timer is to off-load compute intensive tasks such as event scheduling associated with the TDMA protocol. Software only needs to reprogram the timer once per frame. It is capable of generating timing signals, trigger signals and interrupts to the M·CORE processor and to the DSP. Sophisticated sets of tables interact for control of receive and transmit channel time intervals and number of frames per channel. Macro tables are utilized to reduce the programming of events that have a fixed relationship to each other. [4]

The production version of the iDEN i1000™ phone utilizes the DSP56652 ROM version in a 0.31 micron, triple-layer metal static CMOS process. This device consists of 8 Kbytes of ROM and 2 Kbytes of SRAM to support the M·CORE processor. The chip measures 7.4 mm on a side or 55 sq. mm. The part is packaged in a 196 plastic ball grid array (PBGA) and was designed for 16.8 MHz performance at 1.8 V. This device, when running out of internal SRAM, consumes on average less than 9 mA at 1.8 V, which translates into less than 16.2 mW at 16.8 MHz for the complete system. On average, this implementation of the M·CORE processor consumes 2.8 mA, which translates to a 0.30 mW/MHz rating. The part consumes less than 60 microamps in STOP mode.

DSP56652 Development Tools

To accelerate system level integration and also provide a means for production and field testing of new product, a Motorola standard OnCE™ block is available on the M·CORE processor as well as the DSP56600 processor. This block provides a dedicated emulation interface for rapid evaluation of the system hardware and software. Communication with the block is conducted via a 5 wire IEEE 1149 JTAG controller and provides direct access to each of the processors' instruction registers so that opcodes may be fed directly to each instruction pipeline, bypassing external accesses to memory. This mechanism provides a true non-intrusive means for controlling the dual-core processor directly in the target system.

Software and hardware breakpoint registers are provided along with a First-In-First-Out program counter trace buffer which stores change of flow addresses. Single-stepping opcodes with a 16 bit counter is available and the OnCE™ registers are accessible while each core runs in real-time or is in reset. This interface is very useful for measuring static and dynamic power consumption and also allows analysis of code hot spots. Each core, when put in the debug mode of operation, will shut down clocks to the respective core as well as its peripherals. This allows distributed power analysis by shutting down one core and its peripherals while the other core may remain running in real-time. Specific hot spots in the code of each processor may be analyzed with external power measuring tools that monitor current through the respective core's power pins. It should be noted that power pins for each of the specific cores as well as their respective peripherals are isolated so they may be filtered and powered properly.

The M·CORE processor and DSP OnCE interface is currently supported by a Motorola Universal Command Converter (UCC), which communicates with a Software Development Systems (SDS) source level debugger. The SDS SingleStep debugger is tightly integrated into the Motorola Tool Suite through the UCC interface so that the dual-core system can be easily controlled using one common tool.

C / C++ as well as assembly language programs compiled using a Diab Data M·CORE architecture cross-compiler can be quickly evaluated in this environment. Motorola also includes a DSP GNU C compiler, debugger, simulator, linker, assembler and DSP56652 evaluation board.

Conclusion

As the wireless communications industry progresses forward at lightning speed with new product designs, the issue of high performance with low-power consumption will present new challenges to wireless product designers. In order to design these new products in a timely manner a complete solution is of utmost importance for rapid delivery. Motorola's new