MODELING, SIMULATING AND PROTOTYPING CIRCUITS FOR M·CORE BASED SYSTEMS

Author: iMarketing.es – IT and management consulting, technology and outsourcing services

02-2005

Rapid development of embedded applications in short time frames has generated a
market for reusable micro-RISC architectures. These cores are offered in two
forms: custom gate-level macrocells and fully synthesizable RTL models. Custom gate-level
macrocells are ideal for lower power and smaller die size, but they are not easily
retargeted to different fabrication technologies without a new cell library.
Synthesizable RTL models are ideal for rapid deployment to different fabrication sites
and provide a rapid prototyping path using FPGAs, but they have a larger die size, consume
more power, and may not reach the speeds of custom macrocells. In both
cases, a complete product development cycle built around a reusable micro-RISC core
depends on a list of important bill-of-materials items.
Design teams who need a popular microcontroller architecture which fits their price,
power, performance and support requirements will be looking for a complete solution
to their product development plan. This includes access to libraries of special
peripheral circuits and device drivers, well-defined bus interfaces, a well-supported
suite of software and hardware co-simulation tools and rapid prototyping capability
using FPGAs.
The focus of this paper is to describe the M·CORE M210S fully synthesizable core
architecture, its design methodology and development tools which support rapid
prototyping for wireless, consumer and automotive applications. An architectural
overview of the M210S will include the programmer's model, a hardware local bus
interface and a peripheral bus interface standard. Emphasis will be placed on tools
which assist in transitioning from a hardware description language (HDL) modeling
environment to a rapid prototyping platform using FPGAs.
M·CORE Architecture
Let's start with an overview of the M·CORE architecture and how to design with it.
Motorola's M·CORE architecture is designed specifically for sophisticated, yet low-power,
applications. It is a fully static CMOS core that packs about 80,000 transistors into
2.2 mm² of silicon. To minimize power and reduce EMI, the architecture gates the clocks
to portions of the core execution and control blocks so that those blocks are clocked
only when active. In addition to providing mechanisms to power down the processor and
system logic, the design focuses on minimizing dynamic power consumption while the
system is active.
M·CORE is a streamlined execution engine that provides many of the same
performance enhancements as mainstream RISC architectures. It is implemented with a
fixed 16-bit instruction length and a 32-bit internal data path, which meets the
computational precision requirements of newer advanced products while retaining the cost and
power advantages previously available only with 16-bit architectures. The higher
code density of the 16-bit instructions in turn minimizes the energy consumed by the
memory system.
The Instruction Fetch, Instruction Decode/Register file Read, Execute, and Register
Writeback stages of the pipeline operate in an overlapped fashion, allowing single
clock instruction execution for most instructions.

The execution unit consists of a 32-bit Arithmetic/Logic Unit (ALU), a 32-bit Barrel shifter
(Shifter), a Find-First-One unit (FF1), result feed-forward hardware, and miscellaneous support
hardware for multiplication, division, and multiple register loads and stores, as
illustrated in Figure 1. Arithmetic and logical operations execute in a single cycle,
with two exceptions: multiply, which is implemented with a 2-bit-per-clock, overlapped-scan,
modified Booth algorithm with early-out capability to reduce execution time for
operations with small multiplier values; and the divide instructions, which also have
data-dependent timing. The Find-First-One unit operates in a single clock cycle.
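The multiply timing described above can be illustrated with a software model of radix-4 (modified) Booth recoding with early-out. This is a minimal sketch, not Motorola's RTL: it models only the arithmetic, not the overlapped-scan hardware, and treats each 2-bit step as roughly one clock. The names booth_result and booth_multiply are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t product;  /* low 32 bits of x * y */
    unsigned steps;    /* 2-bit Booth steps taken (roughly one clock each) */
} booth_result;

booth_result booth_multiply(uint32_t x, uint32_t y)
{
    booth_result r = {0u, 0u};
    unsigned prev = 0u;                       /* implicit bit y[-1] = 0 */
    for (unsigned i = 0; i < 16u; i++) {
        unsigned b0 = (y >> (2u * i)) & 1u;
        unsigned b1 = (y >> (2u * i + 1u)) & 1u;
        /* Radix-4 Booth digit in {-2, -1, 0, 1, 2} */
        int digit = -2 * (int)b1 + (int)b0 + (int)prev;
        uint32_t term = x << (2u * i);        /* x * 4^i, modulo 2^32 */
        r.product += (uint32_t)digit * term;  /* negative digits wrap modulo 2^32 */
        prev = b1;
        r.steps++;
        /* Early out: if every remaining multiplier bit equals prev,
         * all remaining Booth digits are zero and can be skipped. */
        unsigned consumed = 2u * (i + 1u);
        if (consumed < 32u) {
            uint32_t rest = y >> consumed;
            uint32_t all_ones = 0xFFFFFFFFu >> consumed;
            if ((prev == 0u && rest == 0u) || (prev == 1u && rest == all_ones))
                break;
        }
    }
    return r;
}
```

For a multiplier of 6, only two steps are needed before the remaining bits are all zero, which is the early-out case the text describes for small multiplier values; small negative multipliers terminate early in the same way once the remaining bits are all ones.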
The Program Counter Unit has a PC incrementer and a dedicated Branch Address
Adder to minimize delays during change of flow operations. Branch target addresses
are calculated in parallel with branch instruction decode, with a single pipeline bubble
for taken branches and jumps, resulting in an execution time of two clocks. Conditional
branches which are not taken execute in a single clock.

Memory load and store operations are provided for byte, halfword, and word (32-bit)
data, with automatic zero extension of byte and halfword load data. These instructions
can execute in two clock cycles. Load and store multiple register instructions allow low-overhead
context save and restore operations; these instructions can execute in (N+1)
clock cycles, where N is the number of registers transferred.
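Using the clock counts stated in this section, a rough best-case cycle estimate for a code sequence can be sketched as follows. It assumes zero-wait-state memory and ignores pipeline interlocks as well as multiply and divide, whose timing is data dependent; the instruction classes and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical instruction classes for a rough best-case cycle estimate. */
typedef enum {
    OP_ALU,           /* single-cycle arithmetic/logical operation */
    OP_BRANCH_TAKEN,  /* taken branch or jump: one pipeline bubble */
    OP_BRANCH_NOT,    /* conditional branch not taken */
    OP_LOAD_STORE,    /* byte, halfword, or word load/store */
    OP_LDM_STM        /* load/store multiple of nregs registers */
} op_class;

typedef struct {
    op_class cls;
    unsigned nregs;   /* used only for OP_LDM_STM */
} op;

static unsigned op_cycles(op o)
{
    switch (o.cls) {
    case OP_ALU:          return 1u;
    case OP_BRANCH_TAKEN: return 2u;
    case OP_BRANCH_NOT:   return 1u;
    case OP_LOAD_STORE:   return 2u;
    case OP_LDM_STM:      return o.nregs + 1u;  /* N + 1 clocks */
    }
    return 0u;
}

/* Sum the best-case clock count of a straight-line instruction sequence. */
unsigned estimate_cycles(const op *ops, size_t n)
{
    unsigned total = 0u;
    for (size_t i = 0; i < n; i++)
        total += op_cycles(ops[i]);
    return total;
}
```

A sequence of two ALU operations, a load, a taken branch, and a four-register store multiple comes to 1 + 1 + 2 + 2 + 5 = 11 clocks under these assumptions.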
A single Condition Code/Carry (C) bit is provided for condition testing and for use in
implementing arithmetic and logical operations wider than 32 bits. Typically, the C bit
is set only by explicit test/comparison operations, not as a side-effect of normal
instruction operation. Exceptions to this rule occur for specialized operations where it is
desirable to combine condition setting with actual computation.
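Because the C bit carries state between operations, arithmetic wider than 32 bits can be built from 32-bit steps. A minimal C sketch of a 64-bit add, assuming an add-with-carry operation that adds C as the carry-in and writes the carry-out back to C; add_with_carry and add64 are hypothetical names, not M·CORE mnemonics.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit add-with-carry: result = a + b + C,
 * and C receives the carry out of bit 31. */
static uint32_t add_with_carry(uint32_t a, uint32_t b, unsigned *c)
{
    uint64_t sum = (uint64_t)a + b + (*c & 1u);
    *c = (unsigned)(sum >> 32);
    return (uint32_t)sum;
}

/* 64-bit add from two 32-bit halves: low words first, then high words
 * with the carry propagated through C. */
void add64(uint32_t *hi, uint32_t *lo, uint32_t bhi, uint32_t blo)
{
    unsigned c = 0u;                       /* clear C before the low word */
    *lo = add_with_carry(*lo, blo, &c);
    *hi = add_with_carry(*hi, bhi, &c);
}
```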
M·CORE Programmer's Model
The M210S programming model is defined separately for two privilege modes: supervisor
and user. Certain operations are not available in user mode. Programs access registers
based on the indicated mode. User programs can only access registers specific to the
user mode; system software executing in the supervisor mode can access all registers,
using the control registers to perform supervisory functions. User programs are thus
restricted from accessing privileged information, and the operating system performs
management and service tasks for the user programs by coordinating their activities.
The instruction set is tailored to support high-level languages and is optimized for those
instructions most commonly executed. A standard set of arithmetic and logical
instructions is provided as well as instruction support for bit operations, byte extraction,
data movement, control flow modification, and a small set of conditionally executed
instructions which can be useful in eliminating short conditional branches.
Most instructions execute in either mode, but some instructions that have important
system effects are privileged and can only execute in the supervisor mode. For instance,
user programs cannot execute the stop, doze, or wait instructions. To prevent a user
program from entering the supervisor mode except in a controlled manner, instructions
that can alter the S-bit in the program status register (PSR) are privileged. The trap #n
instructions provide controlled access to operating system services for user programs.
Access to special control registers is also precluded in user mode.
When the S-bit in the PSR is set, the processor executes instructions in the supervisor
mode. Bus cycles associated with an instruction indicate either supervisor or user access
depending on the mode.
The processor utilizes the user programming model when it is in normal user mode
processing. During exception processing, the processor changes from user to supervisor
mode. Exception processing saves the current value of the PSR in the EPSR or FPSR
shadow control register and then sets the S bit in the PSR, forcing the processor into the
supervisor mode. To return to the previous operating mode, a system routine may
execute the rte (Return from Exception) or rfi (Return from Fast Interrupt) instructions as
appropriate, causing the instruction pipeline to be flushed and refilled from the
appropriate address space.
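The mode switch on exception entry and return can be modeled as a few register transfers. This sketch assumes the S bit is bit 31 of the PSR (the text states only that C is bit 0), and it omits the saved program counter and the other PSR control bits; cpu_state and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: S is bit 31 of the PSR; the text only places C at bit 0. */
#define PSR_S (1u << 31)
#define PSR_C (1u << 0)

typedef struct {
    uint32_t psr;
    uint32_t epsr;   /* shadow PSR for normal exceptions */
    uint32_t fpsr;   /* shadow PSR for fast interrupts */
} cpu_state;

/* Exception entry: save the PSR in the matching shadow register,
 * then set S to force supervisor mode. */
void enter_exception(cpu_state *cpu, bool fast)
{
    if (fast)
        cpu->fpsr = cpu->psr;
    else
        cpu->epsr = cpu->psr;
    cpu->psr |= PSR_S;
}

/* rte (fast == false) or rfi (fast == true): restore the saved PSR,
 * which returns the processor to its previous mode. */
void return_from_exception(cpu_state *cpu, bool fast)
{
    cpu->psr = fast ? cpu->fpsr : cpu->epsr;
}

bool in_supervisor(const cpu_state *cpu)
{
    return (cpu->psr & PSR_S) != 0u;
}
```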

The Programmer's model registers depicted in Figure 2 provide operand storage and
control. The registers are partitioned into two levels of privilege: user and supervisor. The
user programming model consists of 16 general-purpose 32-bit registers, the 32-bit
Program Counter (PC) and the Condition/Carry (C) bit. The C bit is implemented as bit 0
of the PSR. This is the only portion of the PSR accessible by the user. The supervisor
programming model consists of an additional 16 general-purpose 32-bit registers (the
alternate file), as well as a set of status/control registers, and scratch registers. By
convention, register R15 serves as the Link register for subroutine calls, and register R0 is
typically used as the current Stack Pointer.

Access to the alternate file is made available via a control bit in the PSR; access to the
status/control/scratch registers is made available via the Move from Control
Register (MFCR) and Move to Control Register (MTCR) instructions. When the alternate file
is selected via the AF bit in the PSR, general-purpose operands are accessed from it;
when the AF bit is cleared, operands are accessed from the normal file. This alternate
file is provided to allow very low overhead context switching for real-time
event handling.
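The low-overhead context switch comes from selecting a whole register bank with one PSR bit instead of saving and restoring 16 registers. A minimal sketch, assuming AF is bit 1 of the PSR; regfile and gpr are hypothetical names.

```c
#include <stdint.h>

/* Assumption: AF is bit 1 of the PSR (the text only names "the AF bit"). */
#define PSR_AF (1u << 1)

typedef struct {
    uint32_t normal[16];     /* normal general-purpose file */
    uint32_t alternate[16];  /* supervisor alternate file */
    uint32_t psr;
} regfile;

/* Return general-purpose register n from whichever file AF selects.
 * Switching files is a single PSR bit change; no registers are copied. */
uint32_t *gpr(regfile *rf, unsigned n)
{
    uint32_t *file = (rf->psr & PSR_AF) ? rf->alternate : rf->normal;
    return &file[n & 15u];
}
```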
Data Organization in Registers and Memory
The operand data formats supported by the integer unit are standard two's-complement
data formats. The operand size for each instruction is either explicitly
encoded in the instruction (load/store instructions) or implicitly defined by the instruction
operation (index operations, byte extraction). Typically, instructions operate on all 32
bits of the source operand(s) and generate a 32-bit result.

Memory is viewed from a Big-Endian byte ordering perspective. The most significant
byte (byte 0) of word 0 is located at address 0. Bits are numbered within a word starting
with bit 31 as the most significant bit.
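The Big-Endian convention, together with the zero extension of halfword loads mentioned earlier, can be sketched in C as follows. The helpers are hypothetical and operate on a plain byte array standing in for memory.

```c
#include <stdint.h>

/* Store a 32-bit word Big-Endian: most significant byte at the lowest address. */
void store_word_be(uint8_t *mem, uint32_t addr, uint32_t value)
{
    mem[addr + 0] = (uint8_t)(value >> 24);  /* byte 0: bits 31..24 */
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;          /* byte 3: bits 7..0 */
}

uint32_t load_word_be(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* Halfword load, zero-extended to 32 bits (upper 16 bits are always 0). */
uint32_t load_half_be(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 8) | (uint32_t)mem[addr + 1];
}
```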
Operand Addressing Capabilities
The M210 accesses all memory operands through load and store instructions,
transferring data between the general purpose registers and memory. Register + 4-bit
scaled displacement addressing mode is used for the Load and Store instructions to
address byte, halfword, or word (32-bit) data.
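The register + scaled displacement calculation can be sketched as follows, assuming the unsigned 4-bit displacement field is multiplied by the access size (1, 2, or 4 bytes), which lets a word access reach up to 60 bytes beyond the base register. The names are hypothetical.

```c
#include <stdint.h>

typedef enum { SIZE_BYTE = 1, SIZE_HALF = 2, SIZE_WORD = 4 } access_size;

/* Effective address for register + 4-bit scaled displacement addressing.
 * Assumption: the unsigned 4-bit field is scaled by the access size. */
uint32_t effective_address(uint32_t base, unsigned disp4, access_size size)
{
    return base + (uint32_t)(disp4 & 0xFu) * (uint32_t)size;
}
```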
Load and Store Multiple instructions allow a subset of the 16 GPRs to
be transferred to/from a base address pointed to by register R0 (the default stack
pointer by convention). Load and Store Register Quadrant instructions use register
indirect addressing to transfer a register quadrant to/from memory. Executing
load/store multiple instructions creates a burst of back-to-back bus transfers between
the registers and memory.
Figure 3 illustrates the byte lane arrangement of data in registers and the memory
locations in which these bytes reside for each data type.


The M·CORE Local Bus interface (MLB) consists of a full 32-bit address bus, a 32-bit data
bus, and transfer control signals for easily connecting memories and peripherals of
different speeds, as illustrated in Figure 4. Bus arbitration signals support
multi-master configurations, and address and data tri-state indicators, together with
transfer request, busy, abort, and termination signals, are available for bus monitoring
and control.
Emulation support signals complement an IEEE 1149.1 JTAG port for in-circuit control of
the target system with visibility into internal registers and memory. Global status and
control registers connect directly to two 32-bit parallel ports for debugging or
kernel process awareness. A key feature of the M210 architecture is its interrupt
servicing: interrupt handlers are easily reached through vectored addresses or
autovectoring.

For power sensitive applications there are power management pins which may be used
to turn off circuits when the processor is in the stop, wait or doze modes. When exiting
these states the processor will assert the wakeup pin so that external circuits may be
enabled.
Bus loading is a concern for all processor interfaces, including the MLB. Care should be
taken to evaluate capacitive loading when designing interfaces for zero wait states.
The maximum capacitive load for the MLB is 50 pF, so memory and peripheral blocks
must be designed to stay within that budget.
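Checking the 50 pF limit is simple addition, since the input capacitances of the devices on a bus line add in parallel with the trace capacitance. A minimal sketch with hypothetical function names; real device and trace values come from data sheets and board or layout extraction, and the example numbers in the usage note are hypothetical.

```c
#include <stddef.h>

#define MLB_MAX_LOAD_PF 50.0   /* maximum MLB capacitive load stated in the text */

/* Total capacitive load on one bus line: trace capacitance plus the input
 * capacitance of every device attached to it (parallel capacitances add). */
double bus_load_pf(const double *device_pf, size_t n, double trace_pf)
{
    double total = trace_pf;
    for (size_t i = 0; i < n; i++)
        total += device_pf[i];
    return total;
}

/* Nonzero if the total load fits within the MLB budget. */
int within_mlb_budget(double total_pf)
{
    return total_pf <= MLB_MAX_LOAD_PF;
}
```

Three loads of 10, 8, and 12 pF plus 5 pF of trace total 35 pF and fit; three loads of 20, 20, and 15 pF total 55 pF and exceed the budget even before trace capacitance is counted.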
Designing with the M210S Architecture
The M210S is the first of a family of synthesizable M·CORE architectures. The objective is
to provide a well defined set of deliverables to enable designers to rapidly develop
embedded applications. These deliverables include a full set of documentation, logic
design databases, design-for-test patterns and scripts, models, and functional
verification tools.
These tools are complemented by a well-defined architectural design methodology for
creating the proper mix of memory and peripherals for a custom
microcontroller. Two versions of the M210 core development kit will be made available:
a unified design system hard macrocell version and a fully synthesizable version. Figure
5 illustrates a typical hookup to the M210S local bus. High-speed, zero-wait-state
accesses to resident memory are accomplished by direct connection to the MLB using the
appropriate address decoding and transfer control signals. Bus arbitration,
power management, and emulation control are also handled at the MLB interface.
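Address decoding for resident memory on the MLB amounts to comparing the upper address bits against each region's base. The memory map in this sketch (a Flash region, a zero-wait-state SRAM, and an IP Bus bridge window) is entirely hypothetical, as are the names; it assumes power-of-two region sizes with aligned bases so that a mask comparison suffices.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical chip selects for devices hooked to the MLB. */
typedef enum { CS_NONE = -1, CS_FLASH = 0, CS_SRAM = 1, CS_IPBUS = 2 } chip_select;

typedef struct {
    uint32_t base;
    uint32_t size;    /* bytes; power of two, base aligned to size */
    chip_select cs;
} region;

/* Hypothetical memory map. */
static const region memory_map[] = {
    { 0x00000000u, 0x00040000u, CS_FLASH },  /* 256 KB Flash */
    { 0x00800000u, 0x00008000u, CS_SRAM  },  /* 32 KB zero-wait-state SRAM */
    { 0x10000000u, 0x00010000u, CS_IPBUS },  /* IP Bus bridge window */
};

/* Decode an MLB address to a chip select; CS_NONE means no device responds. */
chip_select decode_address(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        const region *r = &memory_map[i];
        if ((addr & ~(r->size - 1u)) == r->base)  /* compare upper address bits */
            return r->cs;
    }
    return CS_NONE;
}
```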

The M·CORE architecture uses a standard bus bridge to connect its MLB to a set of
peripherals. This bus bridge is called the Intellectual Property (IP) Bus and is illustrated in
Figure 6. Based on unprecedented customer demand, Motorola Semiconductor
Products Sector is making its vast libraries of Intellectual Property available to select
market partners through Motorola's Technology Transfer Program (MTTP). This will allow
Motorola's partners to create silicon solutions for specific business opportunities while
allowing Motorola to maintain its market-focused strategy.
The goal of the IP Bus is to provide a unified peripheral interface to different Motorola
embedded cores such as the 68000, PowerPC, Star12, and M·CORE. This allows
different levels of reuse of a portfolio of thousands of peripherals. The IP Bus allows easy
connection to Motorola and ARM AMBA peripherals.
The architecture uses a bridge that connects to these peripherals through a
gasket; an individual gasket is needed for each core. In summary, the design
methodology allows a strategic synthesizable core to be connected to a
unified peripheral bus. This capability enables a wide offering of proven, off-the-shelf
technology for rapid system integration.
The introduction of the IP Bus standard also makes available peripheral device drivers
which have been debugged. This permits designers to rapidly model systems with
existing software so that hardware and software tradeoffs may be studied.

Once a system is defined at an architectural level, hardware/software co-design
begins. Software design teams need a simulation environment for writing drivers and
interfaces to kernels. Hardware designers need software to facilitate subsystem
verification as well as product-level functional verification. The ideal situation is to provide
a full software simulation environment that permits source-level debugging with
familiar tools and complete HDL simulation prior to prototyping or silicon spins. This
requires tools that permit multiple simulator engines to run simultaneously and pass
synchronization and data messages to each other.
Summit Design Inc. makes this capability available through its Virtual-CPU™
(V-CPU) product. It allows programs running on a Software
Development Systems (SDS) SingleStep™ instruction simulator to
communicate with a bus functional model in the Verilog RTL simulation environment.
Developers can examine registers and memory easily during simulation runs and,
at the same time, bring up waveform windows to examine bus
interaction with peripheral circuits. Validating circuits becomes easier because software
programs can precondition specific test cases more easily than unique test
benches can be written. Figure 7 illustrates a dual-simulator environment in which two debuggers
with different graphical user interfaces provide visibility into both the software
functions in C and/or assembly language and the hardware components written
in Verilog.
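The message passing between an instruction simulator and a bus functional model can be sketched in a single C program. This is not the V-CPU protocol or API: the message layout, the toy memory model, and all names are hypothetical. It only illustrates the idea that each bus access becomes a request/reply message and that both sides advance their clocks by the cycles the hardware model reports.

```c
#include <stdint.h>

/* Hypothetical message exchanged between the two simulators. */
typedef enum { MSG_READ, MSG_WRITE } msg_kind;

typedef struct {
    msg_kind kind;
    uint32_t addr;
    uint32_t data;     /* write data, or read data in the reply */
    uint32_t cycles;   /* bus clocks consumed, used for synchronization */
} bus_msg;

/* Stand-in for the HDL side: a tiny memory with fixed wait states. */
typedef struct {
    uint32_t mem[64];
    uint32_t wait_states;
    uint64_t hdl_time;     /* bus clocks elapsed on the HDL side */
} bfm;

/* HDL side: service one request and return the reply. */
bus_msg bfm_service(bfm *b, bus_msg req)
{
    uint32_t idx = (req.addr >> 2) & 63u;   /* word index into the toy memory */
    if (req.kind == MSG_WRITE)
        b->mem[idx] = req.data;
    else
        req.data = b->mem[idx];
    req.cycles = 1u + b->wait_states;
    b->hdl_time += req.cycles;
    return req;
}

/* Software side: each memory access becomes a message, and the instruction
 * simulator advances its own clock by the cycles the BFM reports. */
typedef struct { uint64_t iss_time; } iss;

uint32_t iss_load(iss *s, bfm *b, uint32_t addr)
{
    bus_msg req = { MSG_READ, addr, 0u, 0u };
    bus_msg rsp = bfm_service(b, req);   /* a real setup would send, then block */
    s->iss_time += rsp.cycles;           /* keep both clocks in step */
    return rsp.data;
}

void iss_store(iss *s, bfm *b, uint32_t addr, uint32_t data)
{
    bus_msg req = { MSG_WRITE, addr, data, 0u };
    bus_msg rsp = bfm_service(b, req);
    s->iss_time += rsp.cycles;
}
```

In a real co-simulation the two sides run as separate engines and the direct call to bfm_service would be replaced by an inter-process channel; the synchronization rule stays the same.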

This tool provides an excellent functional verification or behavioral analysis platform.
The software simulator executes 1-2 million instructions per second when running out
of designated internal memory. Once hardware/software tradeoffs have been
evaluated, the next logical step is to produce a hardware prototype.
A key goal in the design phase is to reuse as much of the verification suite
created during RTL definition as possible, as illustrated in Figure 8. This requires a verification plan
that reuses the Verilog test bench at the gate level and in post-place-and-route simulations if
a prototype is being synthesized into an FPGA.
A series of pin monitors for MLB activity as well as peripheral pin activity is being
offered as part of the M210S development environment. These monitors reside in a
Verilog test bench and generate output files of pin activity as well as software history.
They are ideal for evaluating program behavior and may be used to generate
regression reference points at the RTL level. These reference files may then be
compared to simulation output at the gate level as well as post-place-and-route
simulation output.
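Comparing an RTL reference trace with gate-level or post-place-and-route output can be as simple as a line-by-line comparison that reports the first divergence. A minimal sketch, assuming text trace files with one bus event per line, each shorter than 256 characters; the trace format used in testing and the function name are hypothetical, not the format of the M210S monitors.

```c
#include <stdio.h>
#include <string.h>

/* Compare a reference trace with a trace from a later simulation stage.
 * Returns 0 if the traces are identical, otherwise the 1-based number of
 * the first differing line (a missing line counts as a difference).
 * Assumes every line fits in the 256-byte buffers. */
long first_mismatch(FILE *ref, FILE *dut)
{
    char a[256], b[256];
    long line = 0;
    for (;;) {
        char *ra = fgets(a, sizeof a, ref);
        char *rb = fgets(b, sizeof b, dut);
        line++;
        if (ra == NULL && rb == NULL)
            return 0;               /* both ended together: traces match */
        if (ra == NULL || rb == NULL)
            return line;            /* one trace is longer than the other */
        if (strcmp(a, b) != 0)
            return line;            /* first divergent event */
    }
}
```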
The use of FPGAs for the implementation of intellectual property has exploded. With
synthesis-based design replacing schematic-based design, there are many benefits to
using FPGAs to turn soft designs into hard reality. Schematic-based designs target a
limited number of hardware architectures, whereas RTL-based designs can
target the majority of digital hardware architectures. Intellectual property providers
such as Motorola have turned to FPGA technology to realize many of these benefits.
Synthesizable cores can be easily retargeted from custom standard cell designs to
FPGAs. The M·CORE M210S has been ported to a 300K-gate Xilinx Virtex™ device.
During the port, it became clear that mapping the register file into the RAM-based cells of
the Virtex device made more efficient use of the available gates.

Once the M·CORE M210S has been ported into a popular FPGA device, advanced
and high-speed prototyping can begin. These prototypes can be thoroughly
evaluated, increasing the quality of the design while reducing time to market. An
additional benefit of the hard implementation is that these prototyping systems can be
used to develop real-time embedded software applications. This is a big advantage
over simulation-based software design because software development is greatly
accelerated. Software design and debug can be performed simultaneously with the
design of the final target hardware application. Two of several such
prototyping systems are described here.
Motorola's M210S Evaluation System
The M·CORE M210S evaluation system is a rapid prototyping system designed around
FPGAs and intended for use in intellectual property creation and debug. This
evaluation system consists of three boards: a microcontroller/memory board, a
peripheral prototyping board, and a special circuit interface board. They communicate
via a well defined board interface bus, thus allowing a complete system to be
developed.
Microcontroller and Memory Platform
The main board which emulates the M210S core is called the microController plus
Memory Board (CMB). The M210S core, MMC2102DV, is synthesized into a Xilinx Virtex
device on the CMB. The Xilinx Virtex device is then interfaced to an Altera 10K CPLD.
Since these devices are RAM-based, the CMB includes configuration Flash devices for
each Xilinx and Altera device. These Flash devices store the configuration files that
load the FPGAs at power-up. The board also includes a DUART for serial
communication between the core and an external host or target. The external Flash
and RAM on the board are used for application program storage and variables
respectively.
The Altera CPLD is used to implement other modules such as synthesized peripherals. Its
primary use is the implementation of an external integration module. The M·CORE
architecture integration module (MIM) is synthesized into the Altera CPLD, or the CPLD
can act as a simple pass-through, enabling the M·CORE architecture MLB to be seen by
the second or third components of the system, such as the MMCFPGA01. The MIM acts as
the interface to external logic. It provides the address, data, control, chip selects, and
DRAM control needed to interface to external devices. This integration module also
includes the peripheral bus interface. The peripheral bus interface implemented in the
MIM is the IP Bus.
Peripheral Prototyping Platform
The second component of the M210S evaluation system is an FPGA board or
MMCFPGA01. This board is used for additional peripheral emulation. Two FPGA boards
may be power stacked to increase gate count available for multiple peripheral designs.
Peripheral synthesis technology is slightly ahead of core synthesis because, until recently,
the gate capacity of FPGAs was limited relative to the gate count of micro-RISC based
cores such as the M·CORE architecture. The user may choose from Motorola
peripherals or third-party intellectual property providers, or design and integrate their own.
Some examples of peripherals include Ethernet, FireWire, USB, IrDA, and synchronous
DRAM control, as well as simple timers and serial modules. The peripheral examples
listed are supported by the IP Bus standard.
Circuit Prototyping Platform
The third board known as the Platform Board or PFB is used for prototyping special
circuits, logic analyzer interfaces, additional memory, and LCD displays. The M210S
evaluation system may operate with or without the platform board. The platform board
is popular with RTOS vendors who need large RAM blocks to port and develop
applications on new architectures. It also includes a large amount of prototype area
for through hole and surface mount packages to customize each system. The LCD
display allows demos, status, and diagnostic output for verification teams.
Synthesized peripherals have been available for some time. The M·CORE
processor FPGA market development kit was produced about a year ago for
integrating synthesized peripherals with the M·CORE architecture. The FPGA market
development kit consists of the MMCFPGA01 board interfacing to an M·CORE processor
board, the MMC2001CMB. It ships with the complete
Altera tool chain, including synthesis and place and route, as well as system software for
programming external and FPGA Flash.
Motorola's low-cost Talkabout radios were first prototyped on this FPGA kit in that
timeframe. The FPGA market development kit was based on the same Altera 10K
device that implements peripherals on the M·CORE M210S evaluation system. The
MMCFPGA01 board was reused as a building block of the M·CORE M210S evaluation
system, so the same peripherals implemented in the FPGA market development kit can be
reused in the new system with the synthesized core.
System Level Interface Connector
These systems are connected together through a Motorola development tools
interconnect standard known as Modular All Purpose Interface or MAPI as illustrated in
Figure 9. The MAPI standard can provide interconnect of the M·CORE architecture
local bus or MLB or any number of integration modules currently available. These
synthesizable integration modules consist of the external bus interface unit, interfaces to
burst RAM and Flash, and controllers for external Synchronous Dynamic Random Access
Memory.
In addition, the MAPI standard allows FPGA market development boards to be stacked,
increasing the number of peripherals that can be implemented. The M210 is debugged through an
OnCE™ port. An Enhanced Background Debug Interface (EBDI) is used to connect the
target system to the host. This debug tool provides a JTAG controller implemented in an
FPGA memory mapped to a CPU32 based microcontroller. The microcontroller receives
debugger commands from the host and translates them into messages the JTAG
controller passes to the OnCE debug port.
Increased emphasis on better debugging tools is being addressed by providing new
features to on-chip debug blocks. A new real-time debug block is being added to the
next generation M·CORE architecture. This debug block, code named Nexus, is based
on a proposed industry standard developed by a consortium of over 20 companies.
The implementation of Nexus and the increased functionality of JTAG-based debug ports
will assist design engineers in verifying and debugging their embedded applications.

FPGA prototyping methodology is being used to check out the debug port
enhancements in hard implementations before actual standard cell silicon is
manufactured. This saves the time and money that silicon re-spins would require. The same
methodology is applied to debugging core interfaces to new peripherals.

The second system making use of FPGAs for the realization of intellectual property is the
Proteus system from AppNet, Inc. [4]. The Proteus system is a rapid prototyping system
built around FPGAs that readily supports the M·CORE architecture. The Proteus system
allows complex systems to be implemented by providing as many as twenty sites for
FPGAs, CPUs, analog, custom, or other hard implementations.
The Proteus system has prototyped a pre-silicon version of a new M·CORE
microcontroller, code-named SIKA, which uses the M210S, on-chip memory, an IP Bus
gasket, and a set of IP Bus compliant peripherals. In this implementation, the M210S core
is instantiated in one FPGA site containing a Xilinx Virtex device. Other sites contain
the peripherals, or sections of peripherals, that make up the SIKA device. These sections of
peripherals include discrete analog sections such as phase-locked loops or analog-to-digital
converters. With systems such as these, developers can write and debug code
executing instructions in the 10 to 20 MHz range. This is again an order of magnitude
faster than software development on simulators, and it allows actual connection into
the final target application. There is also a lower-cost, four-site replica Proteus system
used by customers who only need to add one or two simple peripherals. The Proteus
replica system emulates a SIKA device with a one-site expansion for simple peripherals.
The Proteus system is modular and composed of several boards. The software included
allows for quick compilation, partitioning, and debug of RTL based FPGA designs. The
Logic Board (LB) holds the raw gates used in the design of intellectual property. The
Mezzanine Board (MB) connects the LB gates to implement the design. The Clock
Mezzanine board (CM) implements the clocking scheme of the design. The CM
supports a wide range of clocking strategies, including internally and externally
generated clocks and gated clocks. The CM also provides programmable delay buffers to
support clock skew adjustment.

The primary I/O Buffer Board (BB) interfaces the Proteus system to a target system. The
Incremental Mezzanine board (MM) supports quick ECOs to the system. Simple and
complex logic changes are supported through the MM. Since the system is FPGA
based it can be re-used for future designs. An example of the four site system emulating
SIKA is shown in Figure 10.
The software runs on a laptop or a host connected to a network. The network
connection allows for separate software developer and lab workstations. The
intellectual property modules are shown plugged into sites. The memory modules
include Flash, SRAM, or DRAM. The embedded processor ICE shown in the diagram is

Rapid Prototyping Limitations
Some of the disadvantages of these rapid prototyping systems are listed here.
Performance is the major limitation. With micro-RISC cores running at 100+ MHz, FPGAs
cannot keep up. The M·CORE M210S is limited to around 25 MHz
when synthesized into an FPGA. Metal options of the FPGA cores can easily double the
performance, but they reduce the flexibility of the system. Increased power
consumption is the second disadvantage: the intelligent clock gating schemes used to
decrease run current in standard cell designs cannot be fully implemented in the
FPGA. The cost of the FPGA is also a big disadvantage, with the Xilinx
V1000 costing over $3000. Finally, operating voltages in the one-volt range needed for wireless
applications cannot be emulated with this system configuration.
Conclusion
The exponential rate of new product introductions has made it necessary to reuse
designs whenever possible. This has led to the introduction of reusable licensed
microcontroller cores. The licensing of cores has made it a necessity to produce
synthesizable cores. This new methodology permits delaying the choice of target
fabrication technology until the end of the design cycle. Synthesizable cores allow this
type of methodology by letting the design process target different fabrication
houses simply by switching libraries. The proliferation of synthesizable cores generates the
need for tools to verify the integration of the core and peripherals. The increase in gate
count in FPGAs, along with tools to support partitioning of the modules, makes the
systems described above ideal for enabling quick-to-market integration of silicon systems.
Motorola recognizes the tremendous growth rate of consumer, automotive and wireless
markets and is making its advanced M·CORE micro-RISC architecture available through
a core licensing program which addresses the complete product life cycle including
software, peripheral libraries, design methodology and fabrication. Full details may be
obtained at the Motorola M·CORE website www.mot.com/sps/mcore.
References
[1] Architectural Brief, M·CORE microRISC Engine, M·CORE 1/D, Motorola Inc., 1999
[2] Summit Design product information website, http://www.summit-design.com/products/verification/vcpu_whitepaper_mcore.html
[3] Motorola M·CORE website, http://www.mot.com/sps/mcore
[4] AppNet, Inc. website, http://www.appnetinc.com
M·CORE and OnCE are trademarks of Motorola. All other trademarks are proprietary to
their respective owners.
