
ABSTRACT

With the increasing feasibility of integrating multiple cores on a single chip
(MCSoC), the issue of an efficient interconnect that is scalable, occupies a small area and has low
power consumption must be considered carefully. The Network-on-Chip (NoC) has evolved
as a promising solution for efficiently interconnecting multiple cores on a single chip (MCSoC).
NoC brings conventional networking theories and methods to on-chip communication and brings
notable improvements over conventional bus systems.
The aim of my research will be to study power consumption in NoC architectures and
propose an effective dynamic re-mapping algorithm to reduce power consumption on a NoC. This is
done by monitoring the NoC at run time and dynamically re-mapping the cores to reduce power
consumption. Low power consumption is desirable in MCSoC because high power consumption increases
electromagnetic interference (EMI) and heat dissipation, thereby reducing
performance. In addition, most devices built using MCSoC are hand-held (battery-powered) devices
and therefore might not have access to a continuous power supply.
To test my algorithm, I will use the OASIS NoC, which was developed at the Adaptive Systems Laboratory,
Graduate School of Computer Science and Engineering, The University of Aizu, Aizu, Japan.
OASIS NoC is a complexity-effective on-chip interconnection network.

TABLE OF CONTENTS

Chapter I
1.0 Introduction
1.1 System on Chip (SoC)
1.2 Multiprocessor System on Chip (MPSoC)
1.3 Communication on MPSoC
1.4 Drawbacks of the Conventional Bus System
1.5 Network-on-Chip (NoC)
1.6 Properties of NoC
1.6.1 Network Topology
1.6.2 Switching Scheme
1.6.3 Flow Control
1.6.4 Packet Format
1.6.5 Queuing Scheme
1.6.6 Routing Scheme
1.7 NoC Design Flow
1.8 OASIS Network on Chip (NoC)
Chapter II
2.0 Objective of the Research
2.2 Energy Model of NoC
2.3 Causes and Types of Power Consumption on NoCs
2.3.1 The Router or Switch Fabric
2.3.2 Buffers on the Router
2.3.3 Link Length
2.3.4 Size of Flits
2.3.5 Routing Protocol
2.4 Desirability of Low Power
2.5 Advantages of the Adaptive NoC over the Application-Specific NoC
2.6 Power Management Policies and Methods
2.6.1 Reconfigurable Buffers/Routers
2.6.2 Multi-Mode Switch
2.6.3 Dynamic Voltage Scaling (Closed-Loop Control Concept)
2.6.4 Voltage Island Shut-Down
2.6.5 Reconfigurable Architecture
2.6.6 Reconfigurable Topology
2.7 Why Dynamic Re-Mapping?
2.8 NoC Monitoring
2.8.1 Probe Architecture
Chapter III
3.0 Application Mapping
3.1 Static Mapping
3.2 Dynamic Mapping
3.3 Dynamic Re-Mapping Algorithm
3.3.1 The Algorithm
3.3.2 Algorithm Discussion
Chapter IV
4.0 Problem Statement
4.1 Simulation Setup and Algorithm Implementation (Case Study)
4.2 Validation of Results, Performance Evaluation and Conclusion
4.2.1 The Lookup Table
4.2.2 Selecting the Root
4.2.3 Selecting Neighbours of the Root
4.2.4 Selecting Neighbours of the Neighbours
4.3 Conclusion and Future Work
Appendix
References

CHAPTER ONE

1.0 Introduction
For the next decade, Moore's Law will continue to deliver higher transistor densities, allowing
billions of transistors to be integrated on a single chip. However, it has become obvious that exploiting
significant amounts of instruction-level parallelism with deeper pipelines and more aggressive
wide-issue superscalar techniques, and using most of the transistor budget for large on-chip caches,
has reached a dead end. Scaling performance with higher clock frequencies, in particular, is getting
more and more difficult because of heat-dissipation problems and excessive energy
consumption. The latter is not only a technical problem for mobile systems, but is also becoming a severe
problem for computing centres, where high energy consumption is a significant cost factor in
the budget. Improving performance can only be achieved by exploiting parallelism on all system
levels [1].
Multicore architectures offer a better performance-per-watt ratio than single-core architectures
of similar performance. Combining multi-core and co-processor technology promises extreme
computing power for CPU-intensive applications [1].
1.1 System on chip (SoC):
System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a
computer or other electronic system into a single integrated circuit (chip). It may contain digital,
analogue, mixed-signal, and often radio-frequency functions – all on a single chip substrate. A
typical application is in the area of embedded systems.
A typical SoC consists of:
· One or more microcontroller, microprocessor or DSP cores. Some SoCs – called multiprocessor
Systems-on-Chip (MPSoC) – include more than one processor core.
· Memory blocks including a selection of ROM, RAM, EEPROM and flash.
· Timing sources including oscillators and phase-locked loops.
· Peripherals including counter-timers, real-time timers and power-on reset generators.
· External interfaces including industry standards such as USB, FireWire, Ethernet, USART,
SPI.
· Analogue interfaces including ADCs and DACs.
· Voltage regulators and power management circuits.
These blocks are connected by either a proprietary or an industry-standard bus such as the
AMBA bus from ARM. DMA controllers route data directly between external interfaces and
memory, bypassing the processor core and thereby increasing the data throughput of the SoC [2].
1.2 Multiprocessor System on chip (MPSoC):
In computing, a processor is the unit that reads and executes program instructions, which are
fixed-length (typically 32 or 64 bits) or variable-length chunks of data. The data in the instruction
tells the processor what to do. The instructions are very basic things like reading data from memory
or sending data to the user display, but they are processed so rapidly that we experience the results
as the smooth operation of a program.
Processors were originally developed with only one core. The core is the part of the
processor that actually performs the reading and executing of instructions. Single-core processors
can only process one instruction at a time. (To improve efficiency, processors commonly use internal
pipelines, which allow several instructions to be processed together; however, instructions
are still fed into the pipeline one at a time.)
A multi-core processor is composed of two or more independent cores. One can describe it
as an integrated circuit that contains two or more individual processors (called cores in this sense).
Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip
multiprocessor or CMP), or onto multiple dies in a single chip package.
Multi-core processors are widely used across many application domains including general-purpose
computing, embedded systems, networks, digital signal processing (DSP), and graphics
[2].
1.3 Communication on MPSoC:
Traditionally, a bus based system is used for communication on MPSoC. The bus serves as
an on-chip interconnect for all communicating cores on a single chip. The bus-based architectures
have evolved over the last couple of years from single shared buses to multiple bridged buses
(hierarchical buses), and crossbars.
Figure 1: Evolution of on-chip communication.
Common network topologies to interconnect cores include bus, ring, 2-dimensional mesh,
and crossbar topologies. Cores may or may not share caches, and they may implement message
passing or shared memory inter-core communication methods.
Hierarchical buses allow multiple buses to operate in parallel; in a crossbar, however, all the
communicating partners need to be connected to a single switch fabric. These bus-based architectures
suffer from scalability problems as the number of connected processing elements grows [4]. The figure
below shows the conventional bus-based communication approach.
Figure 2. Conventional Bus System.
Buses have been deployed for communication since the beginning of circuit design and
made their way into SoCs due to their well-understood concepts, their compatibility with most of the
available node processors, the small area they occupy on chip and the zero latency after the arbiter has
granted control [13].
1.4 Drawbacks of the conventional bus system.
1. Poor scalability: On the bus architecture, resources are shared between communicating
cores, which limits the number of cores that can be integrated on a single bus.
2. Communication delay: On the bus architecture, only one core can send messages at a time.
This constraint leads to communication delay, because a core has to wait before
sending messages out, and results in a lack of parallelism and communication concurrency.
3. Ad-hoc global wire engineering: There exist many different ways in which cores could be
connected in a bus architecture, leading to ad-hoc and non-standardized global wire
engineering on the chip.
4. Bandwidth is limited and shared, and therefore the link speed goes down as the number of
connected cores increases.
5. The bus system usually has a single bus arbiter that regulates access and activity on the
bus. This central arbitration creates a single point of failure in the bus architecture.
6. Computation and communication are coupled on the bus architecture.
7. On the bus architecture, no abstraction is done in terms of layers of communication;
therefore any change affects the whole architecture.
8. Clock skew is an inherent synchronization problem of long links, as using the same clock
for all components throughout the whole system will not be achievable with the growing
future complexity [4].
As technology scales, MPSoC applications exhibit huge communication demands, so scalable
communication architectures are needed for the efficient implementation of future complex multicore
SoCs. Due to their lack of scalability, both in terms of power and performance, traditional bus-based
communication architectures fail to satisfy the tight requirements of future applications [1].
A new communication paradigm is therefore needed to solve the aforementioned problems
of the bus architecture. This communication architecture should aim to provide high throughput,
scalability, low power and reduced packet loss, utilize links efficiently, reduce contention and occupy
less silicon area [13].
1.5 Network-on-chip (NoC):
Network-on-Chip (NoC) has evolved as a promising solution for efficiently interconnecting
multiple cores on a single chip (MCSoC). NoC brings conventional networking theories and
methods to on-chip communication and brings notable improvements over conventional bus
systems. The figure below shows a NoC architecture.
Figure 3: NoC Architecture.
The basic idea behind NoC is that processors are connected via a packet-switched
communication network on a single chip, similar to the way computers are connected to the Internet. A
chip based on a NoC interconnect consists of several network clients (e.g. processors, memories,
and custom logic) connected to a network that routes packets between them [2].
The NoC architecture solves the problems associated with the bus architecture. It is scalable,
and parallelism and concurrency are easily supported, because multiple cores can send and receive
packets at the same time (concurrent communication). NoC has distributed arbitration, and the
aggregate bandwidth grows with the network; the link speed is not affected by the number of connected
cores. NoC has separate layers of abstraction: communication and computation are decoupled, the
former handled by the NoC while the cores are responsible for the latter. On the NoC
architecture, messages are transmitted as packets.
Packet switching supports asynchronous transfer of information. It provides extremely high
bandwidth by distributing the propagation delay across multiple switches, thus pipelining the signal
transmission. In addition, the NoC offers several promising features. First, it transmits packets
instead of words; dedicated address lines, as in bus systems, are not necessary since the destination
address of a packet is part of the packet.
Secondly, transmissions can be conducted in parallel if the network provides more than one
transmission channel between a sender and a receiver. Thus, unlike a bus-based system on chip, NoC
offers theoretically infinite scalability, easy IP core reuse, and higher parallelism [2].
In [15] the author summarises the various parts of a NoC architecture, describing the
components of a heterogeneous NoC. In each tile of a heterogeneous on-chip network, the PE
(processing element) can flexibly be any processing unit such as a CPU/DSP/ARM/FPGA/ASIC core,
embedded memory, application-specific component or peripheral, according to the target
applications. A router (or switch) in the tile is typically designed with a crossbar switch, an arbiter
unit, and five input/output channels (each made up of a controller and an input buffer)
which connect to the local PE and to the adjacent routers in the four directions.
A processing element communicates with others by exchanging data with its local router
(switch) and passing data packets through the network links towards the target node. A
crossbar switch in the router routes data packets from its input buffers to the appropriate
output links, and a crossbar arbiter determines the transmission priorities when traffic
conflicts happen inside the router. Thus packets traverse multiple links and go through routers in the
NoC fabric from the source to the destination [16].
Figure 4: Mesh network with a router (switch).
1.6 Properties of NoC:
The diagram below depicts the various properties of a NoC architecture:
Figure 5: Properties of a NoC Architecture.
1.6.1 Network Topology:
The topology of a NoC system defines how the nodes are interconnected by
links. Topologies can vary depending on the sizes and placement of the system's modules (functional
requirements).
There are several well-known standard topologies onto which an application can be mapped.
We broadly classify them as direct and indirect topologies. In a direct topology, each core is
connected to a single switch; some examples of direct topologies are the mesh, torus, ring, butterfly and
octagon. In an indirect topology, a set of cores is connected to a single switch; the fat tree and the folded
torus are examples of proposed indirect topologies [3].
Figure 6: Different Network topologies.
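To make the direct-topology case concrete, the links of a two-dimensional mesh can be enumerated with a few lines of code. This is a minimal sketch of my own, not part of the original text; it only assumes the usual convention that a switch at coordinate (x, y) connects to its immediate horizontal and vertical neighbours.

```python
def mesh_links(rows, cols):
    """Enumerate the bidirectional switch-to-switch links of a rows x cols 2D mesh.

    In a direct topology each core attaches to its own switch at (x, y);
    switches connect to their east and north neighbours when those exist,
    which enumerates every link exactly once.
    """
    links = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:          # east neighbour exists
                links.append(((x, y), (x + 1, y)))
            if y + 1 < rows:          # north neighbour exists
                links.append(((x, y), (x, y + 1)))
    return links

# A 4x4 mesh (the size used by OASIS later in this chapter) has 24 links:
print(len(mesh_links(4, 4)))  # -> 24
```

For the 4×4 mesh, each of the four rows contributes three horizontal links and each of the four columns three vertical links, hence 24 in total.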
1.6.2 Switching Scheme:
Simply put, switching is how a message (packet) traverses the route. There is a large
protocol space to select from for NoCs. Circuit switching, packet switching, and wormhole
switching are possible choices for NoC protocols. These schemes can be mainly distinguished by
their flow control methodologies. When these switching techniques are implemented in on-chip
networks, they will have different performance along with different requirements on hardware
resources [3].
In circuit switching, a physical path from the source to the destination is reserved prior to the
transmission of data. Once a transmission starts, it is not corrupted by other
transmissions, since packets are not stored in buffers as in packet switching (discussed later). The
advantage of the circuit switching approach is that the network bandwidth is statically reserved for the
whole duration of the data transfer.
Moreover, because circuit switching does not need packet buffers, area and power
consumption can be reduced. The overhead of this approach is that setting up an end-to-end
path causes additional delay. In summary, circuit switching can provide high performance but
little flexibility [3].
An alternative to circuit switching is the packet switching scheme. Packet-based
communication has been brought to NoCs from the Internet world, but loses the original advantage
of reliability in the absence of dynamic routing. Currently, most of the proposals for routing in
NoCs are based on static routing mechanisms such as the XY-coordinate discipline [3].
In packet-based communication, data is divided into fixed-length packets, and whenever the
source has a packet to send, it transmits the data. Every packet is composed of a control part, the
header, and a data part (also called the payload). Network switches inspect the headers of incoming
packets to switch each packet to the appropriate output port. In this scheme, the need to store
entire packets in a switch makes the buffer requirement very high [3].
The last technique is the so-called wormhole packet switching, where each packet is further
divided into flits (flow control units) and the input and output buffers are expected to store only a
few flits. Therefore, the buffer space requirement in the switches can be small and compact
compared to the packet switching scheme. The header flit reserves the routing channel of each
switch, the body flits then follow the reserved channel, and the tail flit later releases the
channel reservation [3].
The advantage of wormhole routing is that it does not require the complete packet to be
stored in the switch's buffer while waiting for the header flit to route to the next stage; thus, it
requires much less buffer space. One packet may occupy several intermediate switches at the same
time. Because of these advantages, wormhole routing is an ideal candidate switching
technique for on-chip multiprocessor interconnection networks [3].
The disadvantage is that, by allowing a message to occupy buffers and channels,
wormhole routing increases the possibility of deadlock. In addition, channel utilization is somewhat
decreased if a flit from a given packet is blocked in a buffer [3].
Figure 7: Switching techniques: circuit switching and packet switching (store-and-forward, cut-through and wormhole switching).
1.6.3 Flow control:
Flow control schedules the traversal of messages (packets) over time; it is used only for
dynamic routing. Flow control determines how resources such as buffers and channel bandwidth
are allocated, and how packet collisions are resolved. Whether a packet is buffered, blocked in
place, dropped, or misrouted depends on the flow control strategy. A good flow control strategy
should avoid channel congestion while reducing latency. Examples of flow control mechanisms
include T-Error, stall-and-go and ACK/NACK [3].
Figure 8: Flow control mechanisms: buffered flow control (credit-based, handshaking, stall-and-go) and buffer-less flow control (ACK/NACK, T-Error).
1.6.4 Packet Format:
Packet size selection: the packet size depends highly on the characteristics of the
application (i.e. data- or control-dominated). If a message has to be split into too many
small packets, which must be re-assembled at the destination to recover the original message, the
resulting overhead will be too high. In the opposite case, if the packet is too large, it might
block a link for too many cycles and potentially block other traffic, with side effects on the
performance of the whole system. Choosing the correct packet size is therefore crucial to make
optimal use of the network resources [3].
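The trade-off described above can be quantified with simple arithmetic. The sketch below is an illustration of my own; the header and payload sizes are made-up example values, not figures from the text.

```python
import math

def header_overhead(message_bytes, payload_bytes, header_bytes):
    """Fraction of transmitted bytes spent on headers when a message is
    split into fixed-size packets (worst case: the last packet is padded
    to the full payload size). Illustrative model only."""
    packets = math.ceil(message_bytes / payload_bytes)
    total = packets * (header_bytes + payload_bytes)
    return packets * header_bytes / total

# Small packets: many headers, high overhead.
print(round(header_overhead(1024, 16, 4), 2))   # -> 0.2
# Large packets: low overhead, but each packet holds a link longer.
print(round(header_overhead(1024, 256, 4), 2))  # -> 0.02
```

With 16-byte payloads, a fifth of every transmitted byte is header; raising the payload to 256 bytes cuts the overhead an order of magnitude, at the cost of occupying links for longer per packet.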
1.6.5 Queuing scheme (Virtual Channels):
In [18], the authors describe the design of virtual channels in NoCs. A virtual channel splits a
single physical channel into two, virtually providing two paths for packets to be routed; there
can be two to eight virtual channels. The use of VCs reduces the network latency at the expense of
area, power consumption, and production cost of the NoC implementation. However, VCs offer
various other advantages [18].
Network deadlock/livelock: since VCs provide more than one output path per channel, there
is a lower probability that the network will suffer from deadlock, and the probability of network livelock
is eliminated (these deadlock and livelock are different from architectural deadlock and livelock,
which are due to violations in inter-process communication) [18].
Performance improvement: a packet/flit waiting to be transmitted from an input/output port
of a router/switch has to wait if that port is busy. VCs, however, can
provide another virtual path for packets to be transmitted through that route, thereby improving
the performance of the network [18].
Supporting guaranteed traffic: a VC may be reserved for higher-priority traffic, thereby
guaranteeing low latency for high-priority data flits [18].
Reduced wire cost: in today's technology, wire costs are almost the same as those of
gates, and the cost of wires is likely to dominate in the future. It is therefore important to use
wires effectively to reduce the cost of a system. A virtual channel provides an alternative path for
data traffic and thus uses the wires more effectively for data transmission; as a result, the
wire width (the number of parallel wires for data transmission) can be reduced [18].
1.6.6 Routing scheme:
Routing is the process of selecting paths in a computer network along which to send data
or physical traffic. Routing algorithms are responsible for correctly and efficiently routing packets
or circuits from source to destination [16].
In [15], the author describes two types of routing techniques: static (deterministic)
routing and dynamic (adaptive) routing. In deterministic routing, paths are completely
determined offline, while in adaptive routing paths are determined online depending on
dynamic network conditions. Deterministic routing offers design simplicity and low latency under
light network traffic, but suffers throughput degradation when network congestion occurs.
Adaptive routing uses alternative paths when the network is congested, which provides higher
throughput, but it may experience higher latency when network congestion is low.
In NoCs, the routing scheme usually selects candidates among the routing paths that have
the minimum distance between the source and destination nodes. Many routing algorithms are
available, e.g. XY routing and odd-even routing [17]; both are theoretically guaranteed to be
free of deadlock and livelock.
The XY routing strategy can be applied to regular two-dimensional mesh topologies without
obstacles. The position of the mesh nodes and their attached network components is described by
coordinates: the x-coordinate for the horizontal and the y-coordinate for the vertical position. A
packet is routed to the correct horizontal position first and then in the vertical direction. XY routing
produces minimal paths without redundancy, assuming that the network description of a mesh node
does not define redundancy.
The odd-even turn model is a shortest path routing algorithm that restricts the locations
where some types of turns can take place such that the algorithm remains deadlock-free. More
precisely, the odd-even routing prohibits the east → north and east → south turns at any tiles
located in an even column. It also prohibits the north → west and south → west turns at any tiles
located in an odd column.
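The turn restrictions above can be captured in a small predicate. This is my own illustration of the rule as stated in the text; the direction letters denote the current direction of travel and the direction after the turn.

```python
def turn_allowed(x, incoming, outgoing):
    """Odd-even turn model: is a turn from travel direction `incoming`
    to `outgoing` permitted at a tile in column x?

    Prohibited turns (as stated in the text):
      even column: east -> north, east -> south
      odd  column: north -> west, south -> west
    """
    even_column = (x % 2 == 0)
    if even_column and incoming == "E" and outgoing in ("N", "S"):
        return False
    if not even_column and incoming in ("N", "S") and outgoing == "W":
        return False
    return True

print(turn_allowed(2, "E", "N"))  # even column, east->north: False
print(turn_allowed(3, "E", "N"))  # odd column, east->north:  True
print(turn_allowed(3, "N", "W"))  # odd column, north->west:  False
```

A routing function built on this predicate stays deadlock-free because no cycle of turns can be completed in either column parity.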
1.7 NoC Design Flow:
A typical heterogeneous NoC design flow is described in Figure 9. First, the types of
needed PEs are selected according to the nature of the target applications (PE selection). Tasks are
then mapped to the PEs (task allocation), and topology synthesis is applied. The mapping
procedure decides the position of the selected PEs in the topology (tile mapping). Then, path
selection for communication between the application tasks is performed (routing path allocation),
and the scheduling policy of each PE is decided (task scheduling). Some optimizations, such as
channel width selection and buffer sizing, are performed throughout the entire flow.
Each step of the design flow is guided by protocols and algorithms. A survey of various NoCs
was made in [18], where the techniques and algorithms that have been employed in NoC synthesis
are discussed.
Figure 9: NoC Design Flow.
1.8 OASIS NoC:
The OASIS NoC was developed at the Adaptive Systems Laboratory, Graduate School of
Computer Science and Engineering, The University of Aizu, Aizu, Japan.
OASIS NoC is a complexity-effective on-chip interconnection network. It uses a 4×4 mesh
network and adopts wormhole switching. One flit is 76 bits wide and carries the destination
address, the next-port direction and the payload. Only the input ports have buffers in OASIS. The
maximum number of ports in the OASIS NoC is five (NORTH, SOUTH, EAST, WEST and SELF),
and each input port has a 76-bit-wide FIFO.
Fig. 10: OASIS flit structure. DATA (64 bits) is the payload; TAIL (1 bit) marks the last flit of a
packet; NEXTPORT (5 bits) is the output direction; XDEST (3 bits) is the destination x-address;
YDEST (3 bits) is the destination y-address.

The next-port direction for the next address is decided at the input port by comparing the
destination address with the next address. The algorithm is as follows:
◦ IF Y destination address > Y next address
THEN next port = NORTH
◦ ELSE IF Y destination address < Y next address
THEN next port = SOUTH
◦ ELSE IF X destination address > X next address
THEN next port = EAST
◦ ELSE IF X destination address < X next address
THEN next port = WEST
◦ ELSE (destination address == next address)
next port = SELF
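As a concrete illustration, the flit fields from Fig. 10 and the routing decision can be sketched in Python. The field widths come from the figure, but the bit ordering in make_flit and the strict if/elif cascade (equal coordinates fall through to the next test) are my own assumptions for illustration, not OASIS implementation details.

```python
# Field widths (bits) as given in Fig. 10; together they sum to 76.
DATA, TAIL, NEXTPORT, XDEST, YDEST = 64, 1, 5, 3, 3

def make_flit(data, tail, nextport, xdest, ydest):
    """Pack the OASIS flit fields into a single 76-bit integer.
    Placing YDEST in the most significant bits is an assumption made
    for illustration; the thesis only specifies the field widths."""
    flit = ydest
    flit = (flit << XDEST) | xdest
    flit = (flit << NEXTPORT) | nextport
    flit = (flit << TAIL) | tail
    flit = (flit << DATA) | data
    return flit

def next_port(xdest, ydest, xnext, ynext):
    """Routing decision: compare the y-coordinate first, then x,
    falling through to SELF when the addresses match."""
    if ydest > ynext:
        return "NORTH"
    if ydest < ynext:
        return "SOUTH"
    if xdest > xnext:
        return "EAST"
    if xdest < xnext:
        return "WEST"
    return "SELF"

print(next_port(2, 2, 0, 0))  # -> NORTH (correct the y-coordinate first)
print(next_port(2, 0, 0, 0))  # -> EAST
print(next_port(1, 1, 1, 1))  # -> SELF
```

With 3-bit destination coordinates the scheme addresses up to an 8×8 mesh, comfortably covering the 4×4 mesh OASIS uses.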
Fig. 11: OASIS switch.
The OASIS switch has an allocator and an arbiter. The switch allocator decides the direction of
a flit at the input port and when the flit should be sent to the output. The arbiter is used to resolve
contention: it determines the highest-priority flit among the input ports, scheduling in a round-robin
scheme. The flow control mechanism in OASIS is called STALL-GO.
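STALL-GO is a buffered flow-control scheme in which the receiver tells the sender to stall before its buffer can overflow. The sketch below is a simplified illustration of the general idea only; the buffer depth and the two-slot safety margin are my own assumptions, not OASIS specifics.

```python
from collections import deque

class StallGoBuffer:
    """Simplified STALL-GO receiver: asserts 'go' while there is room
    for at least two more flits (so a flit already in flight is never
    dropped when the stall signal arrives late); otherwise 'stall'.
    Depth and threshold are illustrative assumptions."""
    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()

    def signal(self):
        return "go" if len(self.fifo) <= self.depth - 2 else "stall"

    def push(self, flit):
        # A well-behaved sender stops on 'stall', so this never fires.
        assert len(self.fifo) < self.depth, "overflow: sender ignored stall"
        self.fifo.append(flit)

    def pop(self):
        return self.fifo.popleft()

buf = StallGoBuffer(depth=4)
for f in range(3):
    buf.push(f)
print(buf.signal())  # -> stall (fewer than two free slots remain)
buf.pop()
print(buf.signal())  # -> go (two slots free again)
```

The two-slot margin models the round-trip delay of the stall signal: one flit may still be on the wire when the sender sees the stall.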
