# PICOBLAZE ASSEMBLY CODE DEVELOPMENT

### 16.1 INTRODUCTION

Because of its simplicity, PicoBlaze cannot effectively support high-level programming languages, and its code is generally developed in assembly language. In this chapter, we provide an overview of code development, presented in a bottom-up fashion. We first introduce code segments for frequently used data and control operations, then examine the use of subroutines, and finally outline the derivation of the overall program structure.

## 16.2 USEFUL CODE SEGMENTS

The PicoBlaze microcontroller contains instructions for byte-oriented data manipulation and simple conditional branches. In this section, we illustrate how to construct code to perform bit and multiple-byte operations and to realize frequently used high-level language control constructs.

## 16.2.1 KCPSM3 conventions

The KCPSM3 assembler uses the following conventions in an assembly program:

- Use a ":" sign after a symbolic address in code, as in "done:".
- Use a ";" sign before a comment.
- Use HH for a constant, in which H is a hexadecimal digit.

An example of a code segment follows:

#### 16.2.2 Bit manipulation

PicoBlaze's instruction set is primarily for byte-oriented operations. Bit-oriented operations are frequently needed to control low-level I/O activities, such as testing, setting, and clearing a 1-bit flag signal. To manipulate a single bit, we first define a *mask* to isolate and preserve (i.e., mask) the unrelated bits and then apply the designated operation on the desired bits (i.e., unmasked bits).

We can set, clear, and toggle (i.e., invert) some bits of a data byte by performing **or**, **and**, and **xor** instructions with a proper mask. For example, to set, clear, or toggle the second LSB of the s0 register, we can **or** s0 with the mask 02 (0000_0010), **and** it with the mask FD (1111_1101), or **xor** it with the mask 02 (0000_0010), respectively. The toggle operation is based on the observation that for any Boolean variable $x$, $x \oplus 0 = x$ and $x \oplus 1 = x'$. The same principle can be applied to multiple bits.
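
The set, clear, and toggle operations described above can be sketched in KCPSM3 syntax as follows. This is a minimal illustration, not the original listing: the constant names SET_MASK, CLR_MASK, and TOG_MASK are assumed here for readability, and the second LSB is taken to be bit 1.

```
constant SET_MASK, 02    ; mask = 0000_0010 (assumed name)
constant CLR_MASK, FD    ; mask = 1111_1101 (assumed name)
constant TOG_MASK, 02    ; mask = 0000_0010 (assumed name)

or  s0, SET_MASK         ; set 2nd LSB to 1, other bits unchanged
and s0, CLR_MASK         ; clear 2nd LSB to 0, other bits unchanged
xor s0, TOG_MASK         ; invert 2nd LSB, other bits unchanged
```

The three instructions are independent of each other; a program would normally use only the one it needs. The segment also follows the conventions of Section 16.2.1 for two-digit hexadecimal constants and ";" comments.
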
For example, we can clear the upper nibble (i.e., four MSBs) by using

```
and s0, 0F          ; mask = 0000_1111
```

We can also apply the concept of the and mask to the **test** instruction to check a single bit. For example, executing **test** on s0 with the mask 80 (1000_0000) sets the zero flag if the MSB of s0 is 0 and clears it otherwise, so a subsequent **jump z** or **jump nz** instruction can branch to the proper routine accordingly.

A single bit can be extracted by applying the same idea. For example, the following code segment extracts the MSB of the s0 register and stores it in the s1 register:

```
   load s1, 00      ; assume the MSB is 0
   test s0, 80      ; mask = 1000_0000
   jump z, done     ; MSB of s0 is 0, keep s1 = 00
   load s1, 01      ; MSB of s0 is 1, set s1 = 01
done:
```

# 16.2.3 Multiple-byte manipulation

A microcontroller sometimes needs to handle wide, multiple-byte data, such as a large counter. Since the data width of PicoBlaze is 8 bits, processing this type of data requires a mechanism to propagate information between two successive instructions. PicoBlaze uses the carry flag for this purpose. For the arithmetic instructions, there are two versions of addition and subtraction, one with carry and one without carry, as in the **add** and **addcy** instructions. For the shift and rotate instructions, the carry can be shifted into the MSB or LSB of a register, and vice versa.

Assume that x and y are 24-bit data and that each occupies three registers. The following code segment illustrates the use of carry in multiple-byte addition:

```
namereg s0, x0      ; least significant byte of x
namereg s1, x1      ; middle byte of x
namereg s2, x2      ; most significant byte of x
namereg s3, y0      ; least significant byte of y
namereg s4, y1      ; middle byte of y
namereg s5, y2      ; most significant byte of y

; add: {x2,x1,x0} + {y2,y1,y0}
add x0, y0          ; add least significant bytes
addcy x1, y1        ; add middle bytes with carry
addcy x2, y2        ; add most significant bytes with carry
```

The first instruction performs normal addition of the least significant bytes and stores the carry-out bit into the carry flag. The second instruction then includes the carry flag when adding the middle bytes.
Similarly, the third instruction uses the carry flag from the previous addition to obtain the result for the most significant bytes.

The incrementing and subtraction of multiple bytes can be achieved in a similar fashion:

```
; increment: {x2,x1,x0} + 1
add x0, 01          ; inc least significant byte
addcy x1, 00        ; add carry to middle byte
addcy x2, 00        ; add carry to most significant byte

; subtract: {x2,x1,x0} - {y2,y1,y0}
sub x0, y0          ; sub least significant byte
subcy x1, y1        ; sub middle byte with borrow
subcy x2, y2        ; sub most significant byte with borrow
```

Multiple-byte data can be shifted by including the carry flag in the individual shift instructions. For example, the **sla** instruction shifts data left one position and shifts the carry flag into the LSB. A 3-byte data item can therefore be shifted left by applying **sl0** to the least significant byte, which shifts 0 into its LSB and its MSB into the carry flag, and then applying **sla** to the middle byte and to the most significant byte in turn.

## 16.2.4 Control structure

A high-level programming language usually contains various control constructs to alter the execution sequence. These include the if-then-else, case, and for-loop statements. PicoBlaze, on the other hand, provides only simple conditional and unconditional **jump** instructions. Despite their simplicity, we can use them with a **test** or **compare** instruction to implement the high-level control constructs. The following examples illustrate the construction of the if-then-else, case, and for-loop statements.

Let us first consider the if-then-else statement:

```
if (s0==s1) {
   /* then-branch statements */
}
else {
   /* else-branch statements */
}
```

The corresponding assembly code segment is

```
   compare s0, s1
   jump nz, else_branch
   ; code for then branch
   ...
   jump if_done
else_branch:
   ; code for else branch
   ...
if_done:
   ; code following if statement
   ...
```

The code uses the **compare** instruction to check the s0==s1 condition and to set the zero flag. The following **jump** instruction examines the flag and jumps to the else branch if the flag is not set.
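
The same pattern extends to magnitude tests. As a hedged sketch that is not part of the original text, the condition s0 < s1 for unsigned values can be checked through the carry flag, since **compare** sets the carry flag when its first operand is smaller than its second. The labels ge_branch and lt_done are hypothetical.

```
   compare s0, s1        ; carry = 1 if s0 < s1 (unsigned)
   jump nc, ge_branch    ; carry clear: s0 >= s1
   ; code for the s0 < s1 branch
   ...
   jump lt_done
ge_branch:
   ; code for the s0 >= s1 branch
   ...
lt_done:
   ; code following if statement
   ...
```
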
The case statement can be considered as a multiway jump, in which execution is transferred according to the value of the selection expression. The following statement uses the s0 variable as the selection expression and jumps to the corresponding branch:

```
switch (s0) {
   case value1:
      /* case value1 statements */
      break;
   case value2:
      /* case value2 statements */
      break;
   case value3:
      /* case value3 statements */
      break;
   default:
      /* default statements */
}
```

In some processors, the multiway jump can be implemented by a hardware feature known as "indexed addressing mode." Since PicoBlaze does not support this feature, the case statement has to be constructed as a sequence of if-then-else statements. In other words, the previous case statement is treated as

```
if (s0==value1) {
   /* case value1 statements */
}
else if (s0==value2) {
   /* case value2 statements */
}
else if (s0==value3) {
   /* case value3 statements */
}
else {
   /* default statements */
}
```

The corresponding assembly code segment becomes

```
   constant value1, ...
   constant value2, ...
   constant value3, ...

   compare s0, value1     ; test value1
   jump nz, case_2        ; not equal to value1, jump
   ; code for case 1
   jump case_done
case_2:
   compare s0, value2     ; test value2
   jump nz, case_3        ; not equal to value2, jump
   ; code for case 2
   jump case_done
case_3:
   compare s0, value3     ; test value3
   jump nz, default       ; not equal to value3, jump
   ; code for case 3
   jump case_done
default:
   ; code for default case
   ...
case_done:
   ; code following case statement
```

The for-loop statement executes a segment of the code repetitively. The loop statement can be implemented by using a counter to keep track of the iteration number. For example, consider the following loop, which counts down from MAX to 1:

```
for (i = MAX; i != 0; i = i - 1) {
   /* loop body statements */
}
```

The assembly code segment is

```
   namereg s0, i          ; loop index
   constant MAX, ...      ; loop boundary

   load i, MAX            ; initialize loop index
loop_body:
   ; code for loop body
   ...
   sub i, 01              ; dec loop index
   jump nz, loop_body     ; repeat until i=0
   ; code following for loop
```

#### 16.3 SUBROUTINE DEVELOPMENT

A subroutine, such as a function in C, implements a section of a larger program.
It is coded to perform a specific task and can be used repetitively. Using subroutines allows us to divide a program into small, manageable parts and thus greatly improves the reliability and readability of a program. It is the basis of modern programming practice and is supported by all high-level programming languages.

PicoBlaze uses the **call** and **return** instructions to implement the subroutine. The **call** instruction saves the current content of the program counter and transfers program execution to the starting address of a subroutine. A subroutine ends with a **return** instruction, which restores the saved program counter and resumes the previous execution. A representative flow is shown in Figure 15.7.

Note that PicoBlaze only saves and restores the content of the program counter during a subroutine call and return. We have to manage the register and data RAM use manually to ensure that the original system state is not altered after a subroutine call.

The following multiplication example illustrates the development of subroutines. We assume that the inputs are two 8-bit numbers in unsigned integer format and the output is a 16-bit product. The algorithm is based on a simple shift-and-add method. This method iterates through the 8 bits of the multiplier. In each iteration, the multiplicand is shifted left one position. If the corresponding multiplier bit is 1, the shifted multiplicand is added to the partial product.

The assembly code is shown in Listing 16.1. The multiplicand and multiplier are stored in the s3 and s4 registers. The individual bits of the multiplier are obtained by repetitively shifting s4 to the right, which moves the LSB to the carry flag. Note that instead of actually shifting the multiplicand to the left, we shift the partial product, which consists of 2 bytes and is stored in s5 and s6, to the right.
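
Because only the program counter is saved, a caller that needs a register overwritten by a subroutine must preserve it explicitly. The following sketch shows one way to do this around a call to the multiplication routine. It assumes that the routine is entered through a label named mult_soft and clears s5 on entry (as in Listing 16.2), and that it uses the loop index register i as a temporary; the data RAM address SAVE_I is hypothetical.

```
namereg s2, i            ; loop index alias, as in Listing 16.2
constant SAVE_I, 3F      ; hypothetical data RAM location for saving i

   store i, SAVE_I       ; save caller's loop index
   load s3, 0C           ; multiplicand = 12
   load s4, 05           ; multiplier = 5
   call mult_soft        ; product returned in {s5,s6}
   fetch i, SAVE_I       ; restore caller's loop index
   ; here s5 = 00 and s6 = 3C (12 * 5 = 60)
```

Note that the routine shifts s4 in place, so the multiplier is consumed by the call and must be reloaded if it is needed again.
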
Listing 16.1 Software integer multiplication

```
; routine: mult_soft
; function: 8-bit unsigned multiplier using
;           shift-and-add algorithm
; input register:
;    s3: multiplicand
;    s4: multiplier
; output register:
;    s5: upper byte of product
;    s6: lower byte of product
; temp register: i
mult_soft:
   load s5, 00            ; clear s5
   load i, 08             ; initialize loop index
mult_loop:
   sr0 s4                 ; shift LSB to carry
   jump nc, shift_prod    ; LSB is 0
   add s5, s3             ; LSB is 1
shift_prod:
   sra s5                 ; shift upper byte right,
                          ; carry to MSB, LSB to carry
   sra s6                 ; shift lower byte right,
                          ; LSB of s5 to MSB of s6
   sub i, 01              ; dec loop index
   jump nz, mult_loop     ; repeat until i=0
   return
```

Because of the primitive nature of the assembly language, thorough documentation is essential. A subroutine should include a descriptive header and detailed comments. A representative header is shown in Listing 16.1. It consists of a short function description and the use of registers. The latter shows how the registers are allocated and is crucial to preventing conflicts in a large program.

#### 16.4 PROGRAM DEVELOPMENT

Developing a complete assembly program consists of the following steps:

1. Derive the pseudo code of the main program.
2. Identify tasks in the main program and define them as subroutines. If needed, continue refining the complex subroutines and divide them into smaller routines.
3. Determine the register and data RAM use.
4. Derive assembly code for the subroutines.

Steps 1, 2, and 4 basically follow a *divide-and-conquer* approach and are applicable to any software development. A microcontroller-based application is normally a simple embedded system, in which the processor monitors the I/O activities continuously and responds accordingly. Its main program usually has the following structure:

```
   call initialization_routine
forever:
   call task1_routine
   call task2_routine
   ...
   call taskn_routine
   jump forever
```

Step 3 is unique to assembly code development. Unlike a high-level language program, in which the compiler allocates storage to variables automatically, we must manage the data storage manually in assembly code. PicoBlaze has 16 registers and 64 bytes of data RAM to store data.
The registers can be considered as fast storage, in which the data can be manipulated directly. The data RAM, on the other hand, is "auxiliary" storage. Its data needs to be transferred to a register for processing. For example, if we want to increment a data item located in the RAM, it must first be loaded into a register, incremented there, and then stored back to the RAM.

Because of the limited space for data storage, its use has to be planned carefully in advance, particularly when the code is complex and involves nested subroutines. To assist coding, we can first identify the needed *global storage* or *local storage*. The former keeps data that is needed in the entire program. The latter provides space to store intermediate results, and the data will be discarded after the required computation is completed.

| Address | Content |
|---------|---------------------------|
| 00 | lower byte of a |
| 01 | unused |
| 02 | lower byte of b |
| 03 | unused |
| 04 | lower byte of $a^2$ |
| 05 | upper byte of $a^2$ |
| 06 | lower byte of $b^2$ |
| 07 | upper byte of $b^2$ |
| 08 | lower byte of $a^2 + b^2$ |
| 09 | upper byte of $a^2 + b^2$ |
| 0A | carry of $a^2 + b^2$ |

Figure 16.1 Data RAM memory allocation.

## 16.4.1 Demonstration example

The development process can best be explained by an example. Let us consider a program that uses the previous multiplication subroutine. It reads two inputs, a and b, from the switch, calculates $a^2 + b^2$, and displays the result on eight discrete LEDs. Since the I/O interface is to be discussed in Chapter 17, we limit the I/O to a single input port, the 8-bit switch, and a single output port, the 8-bit LEDs. We assume that a and b are obtained from the upper nibble (i.e., the four MSBs) and the lower nibble (i.e., the four LSBs) of the switch.
The main program is

```
   call clr_data_mem
forever:
   call read_switch
   call square
   call write_led
   jump forever
```

The subroutines are defined as follows:

- clr\_data\_mem: clears data memory at system initialization
- read\_switch: obtains the two nibbles from the switch and stores their values to the data RAM
- square: uses the multiplication subroutine to calculate $a^2 + b^2$
- write\_led: writes the eight LSBs of the calculated result to the LED port

For demonstration purposes, we create two smaller routines, get\_upper\_nibble and get\_lower\_nibble, within the read\_switch routine to obtain the upper nibble and lower nibble from a register.

The next step in development is to plan the register and data RAM use. For global storage, we introduce a global register, sw\_in, to store the input value of the switch and allocate 11 bytes of data RAM to store the inputs and result of the square routine. Allocation of the data RAM is shown in Figure 16.1. Note that the addresses 01 and 03 are not actually used. They are reserved to simplify the seven-segment LED display code, which is discussed in Chapter 17. All remaining registers are used as local storage. For program clarity, we define three symbolic names, data, addr, and i, as temporary registers for data, port and memory address, and loop index, respectively.

The last step is to derive the assembly code for the subroutines. The complete code is shown in Listing 16.2. The clr\_data\_mem routine uses a loop to clear data memory. The i register is the loop index and is initialized with 64 (i.e., $40_{16}$). In each iteration, 0 is stored to the data RAM address given by the index, and the index is then decremented. The write\_led routine fetches the eight LSBs of the calculated result from the data RAM and outputs them to the LED port. The read\_switch routine includes two smaller routines. The get\_upper\_nibble routine shifts the data register right four times to move the upper nibble to the four LSBs.
The get\_lower\_nibble routine clears the four MSBs of the data register to 0's and thus removes the upper nibble. The "glue instructions" of read\_switch input the switch values, set up the input for the two nibble routines, and store the results in the data RAM. The square routine fetches data from the data RAM, utilizes the mult\_soft routine to calculate $a^2$ and $b^2$, performs addition, and stores the result back to the data RAM.

Listing 16.2 Square program with simple nibble input

```
; square circuit with simple I/O interface
; program operation:
;   - read switch to a (4 MSBs) and b (4 LSBs)
;   - calculate a*a + b*b
;   - display data on 8 leds

; data constants
constant UP_NIBBLE_MASK, 0F   ; 00001111

; data ram address alias
constant a_lsb, 00
constant b_lsb, 02
constant aa_lsb, 04
constant aa_msb, 05
constant bb_lsb, 06
constant bb_msb, 07
constant aabb_lsb, 08
constant aabb_msb, 09
constant aabb_cout, 0A

; register alias
; commonly used local variables
namereg s0, data              ; reg for temporary data
namereg s1, addr              ; reg for temporary mem & i/o port addr
namereg s2, i                 ; general-purpose loop index
; global variables
namereg sf, sw_in

; port alias
; input port definitions
constant sw_port, 01          ; 8-bit switches
; output port definitions
constant led_port, 05

; main program
; calling hierarchy:
;
; main
;   - clr_data_mem
;   - read_switch
;       - get_upper_nibble
;       - get_lower_nibble
;   - square
;       - mult_soft
;   - write_led

   call clr_data_mem
forever:
   call read_switch
   call square
   call write_led
   jump forever

; routine: clr_data_mem
; function: clear data ram
; temp register: data, i
clr_data_mem:
   load i, 40                 ; initialize loop index to 64
   load data, 00
clr_mem_loop:
   store data, (i)
   sub i, 01                  ; dec loop index
   jump nz, clr_mem_loop      ; repeat until i=0
   return

; routine: read_switch
; function: obtain two nibbles from input
; input register: sw_in
; temp register: data
read_switch:
   input sw_in, sw_port       ; read switch input
   load data, sw_in
   call get_upper_nibble
   store data, a_lsb          ; store a (upper nibble) to data ram
   load data, sw_in
   call get_lower_nibble
   store data, b_lsb          ; store b (lower nibble) to data ram
   return

; routine: get_lower_nibble
; function: get lower 4 bits of data
; input register: data
; output register: data
get_lower_nibble:
   and data, UP_NIBBLE_MASK   ; clear upper nibble
   return

; routine: get_upper_nibble
; function: get upper 4 bits of data
; input register: data
; output register: data
get_upper_nibble:
   sr0 data                   ; right shift 4 times
   sr0 data
   sr0 data
   sr0 data
   return

; routine: write_led
; function: output 8 LSBs of result to 8 leds
; temp register: data
write_led:
   fetch data, aabb_lsb
   output data, led_port
   return

; routine: square
; function: calculate a*a + b*b
;           data/result stored in data ram (see address alias)
; temp register: s3, s4, s5, s6, data
square:
   ; calculate a*a
   fetch s3, a_lsb            ; load a
   fetch s4, a_lsb            ; load a
   call mult_soft             ; calculate a*a
   store s6, aa_lsb           ; store lower byte of a*a
   store s5, aa_msb           ; store upper byte of a*a
   ; calculate b*b
   fetch s3, b_lsb            ; load b
   fetch s4, b_lsb            ; load b
   call mult_soft             ; calculate b*b
   store s6, bb_lsb           ; store lower byte of b*b
   store s5, bb_msb           ; store upper byte of b*b
   ; calculate a*a+b*b
   fetch data, aa_lsb         ; get lower byte of a*a
   add data, s6               ; add lower byte of a*a+b*b
   store data, aabb_lsb       ; store lower byte of a*a+b*b
   fetch data, aa_msb         ; get upper byte of a*a
   addcy data, s5             ; add upper byte of a*a+b*b
   store data, aabb_msb       ; store upper byte of a*a+b*b
   load data, 00              ; clear data, but keep carry
   addcy data, 00             ; get carry-out from previous +
   store data, aabb_cout      ; store carry-out of a*a+b*b
   return

; routine: mult_soft
; function: 8-bit unsigned multiplier using
;           shift-and-add algorithm
; input register:
;    s3: multiplicand
;    s4: multiplier
; output register:
;    s5: upper byte of product
;    s6: lower byte of product
; temp register: i
mult_soft:
   load s5, 00                ; clear s5
   load i, 08                 ; initialize loop index
mult_loop:
   sr0 s4                     ; shift lsb to carry
   jump nc, shift_prod        ; lsb is 0
   add s5, s3                 ; lsb is 1
shift_prod:
   sra s5                     ; shift upper byte right,
                              ; carry to MSB, LSB to carry
   sra s6                     ; shift lower byte right,
                              ; lsb of s5 to MSB of s6
   sub i, 01                  ; dec loop index
   jump nz, mult_loop         ; repeat until i = 0
   return
```

# 16.4.2 Program documentation

Developing an assembly program is a tedious process. The use of symbolic names and good documentation can make the code clear and eliminate many unnecessary errors. It also helps future revision and maintenance. For the KCPSM3 assembler, we can use the **constant** directive to assign a symbolic name (alias) to a data constant, a memory address, or a port id, and use the **namereg** directive to assign a symbolic name to a register. A representative main program header is shown in Listing 16.2. It contains the following segments:

- General program description: provides a general description of the purpose, operation, and I/O of the program
- Data constants: declares symbolic names for constants
- Data RAM address alias: declares symbolic names for data RAM addresses
- Register alias: declares symbolic names for registers
- Port alias: declares symbolic names for I/O ports
- Program calling hierarchy: illustrates the calling structure and subroutines

The aliases and directives have no effect on the final machine code. When the assembly code is processed, they are replaced with the actual constant values. However, using aliases can greatly enhance the readability of the assembly code and reduce unnecessary errors.

The following code segment further illustrates the impact of the alias and documentation. The purpose of this segment is to obtain values for variables a, b, and c, and store them in the proper data RAM locations. The location is specified by the UART input, which is the ASCII code of character a, b, or c.
The segment with aliases and proper comments is

```
; constant alias
constant ASCII_a, 61          ; ASCII code for a
constant ASCII_b, 62          ; ASCII code for b
constant ASCII_c, 63          ; ASCII code for c
; data ram address alias
constant a_addr, 02
constant b_addr, 04
constant c_addr, 06
; register alias
namereg s0, data              ; reg for temporary data
namereg s1, addr              ; reg for temporary addr
namereg sF, sw_in             ; switch input
; port alias
constant sw_port, 01          ; switch input
constant uart_rx_port, 02     ; UART input

; assembly code with alias
   ; get input
   input sw_in, sw_port       ; get switch
   input data, uart_rx_port   ; get char
   ; check received char
   compare data, ASCII_a      ; check ASCII a
   jump nz, chk_ascii_b       ; no, check next
   store sw_in, a_addr        ; yes, store a to data ram
   jump done
chk_ascii_b:
   compare data, ASCII_b      ; check ASCII b
   jump nz, chk_ascii_c       ; no, check next
   store sw_in, b_addr        ; yes, store b to data ram
   jump done
chk_ascii_c:
   compare data, ASCII_c      ; check ASCII c
   jump nz, ascii_err         ; no, error
   store sw_in, c_addr        ; yes, store c to data ram
   jump done
ascii_err:
done:
```

If we use hard literals and strip the comments, the code becomes

```
; assembly code with no alias or comments
   input sf, 01
   input s0, 02
   compare s0, 61
   jump nz, addr1
   store sf, 02
   jump addr4
addr1:
   compare s0, 62
   jump nz, addr2
   store sf, 04
   jump addr4
addr2:
   compare s0, 63
   jump nz, addr3
   store sf, 06
   jump addr4
addr3:
addr4:
```

While the functionality of this code segment is the same, it is very difficult to comprehend, debug, or modify.

#### 16.5 PROCESSING OF THE ASSEMBLY CODE

The PicoBlaze-based development flow is reviewed in Section 15.4. After the assembly code is developed, it is compiled (translated) to machine instructions in step 3. Instruction-set-level simulation can also be performed to verify the correctness of the code, as in step 4. These two steps and the direct downloading process (step 9) are discussed in detail in this section.

Xilinx provides an assembler known as KCPSM3 for compiling in step 3 and utility programs for downloading in step 9.
The programs, HDL codes for the PicoBlaze processor, and relevant template files can be downloaded from the Xilinx Web site. A program known as PBlazeIDE from Mediatronix can perform the instruction-set-level simulation in step 4. It can also be used as an assembler. PBlazeIDE can be downloaded from Mediatronix's Web site.

# 16.5.1 Compiling with KCPSM3

An assembler is the software that translates the instruction mnemonics to machine instructions, which are represented as 0's and 1's, and substitutes the aliases and symbolic branch addresses with actual values. The machine instructions are then downloaded to the instruction memory of a microcontroller. Since PicoBlaze is embedded inside an FPGA, the instruction ROM becomes an HDL ROM module that contains the compiled assembly code. The ROM will be instantiated later in the top-level HDL code and synthesized along with PicoBlaze and the I/O interface circuit.

Xilinx provides the *KCPSM3* assembler for this task. It is a command-line, DOS-based program. KCPSM3 basically takes an assembly program, along with the necessary template files, and generates the HDL code for the instruction ROM. The procedure for compiling an assembly program is as follows:

1. Create a directory for the project and copy kcpsm3.exe, ROM\_form.vhd, ROM\_form.v, and ROM\_form.coe to the directory. The latter three are code templates used by KCPSM3.
2. Create the assembly program and save it as a plain text file with an extension of .psm. Any PC-based editor, such as Notepad, can be used for this purpose.
3. Invoke a DOS window by selecting Start ≻ Programs ≻ Accessories ≻ Command Prompt. In the DOS window, navigate to the project directory.
4. Type kcpsm3 myfile.psm to run the program.
5. Correct syntax errors if necessary and recompile.
6. After successful compiling, the file containing the instruction ROM, myfile.v, is generated.
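
To try the procedure above, a very small program is sufficient. The following sketch could be saved as myfile.psm in step 2. It is an assumed example rather than part of the original text, and the port id 05 and the alias names are chosen only for illustration.

```
; myfile.psm: free-running counter sent to an output port
constant led_port, 05    ; assumed output port id
namereg s0, count        ; assumed register alias

   load count, 00        ; start from 0
forever:
   output count, led_port
   add count, 01         ; wraps from FF to 00
   jump forever
```

The program uses only a constant, a register alias, one label, and four instructions, so any error reported in step 5 is likely to point to the setup of the project directory rather than to the code itself.
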
In addition to the HDL file, KCPSM3 also generates files that are suitable for block RAM initialization and other utilities. The file with the .hex extension can be used for JTAG downloading, which is discussed in Section 16.5.3, and the file with the .fmt extension is a reformatted .psm file for "pretty printing."

## 16.5.2 Simulation by PBlazeIDE

As the name indicates, instruction-set-level simulation simulates the operation of a PicoBlaze system instruction by instruction. The *PBlazeIDE* program can be used for this purpose. PBlazeIDE is a Windows-based program with an integrated development environment, which includes a text editor, an assembler, and an instruction-set-level simulator.

PBlazeIDE uses slightly different instruction mnemonics and directives, as discussed in Section 15.5. Thus, the code written for KCPSM3 cannot be used directly by PBlazeIDE, and vice versa. The mnemonic differences are summarized in Table 16.1, and the directive examples are shown in Table 16.2. Note that the PBlazeIDE assembler uses both decimal and hexadecimal formats for constants. A hexadecimal number starts with a \$ sign, as in \$1A.

The procedure for using PBlazeIDE with KCPSM3 code is as follows:

1. Compile the assembly code with KCPSM3.
2. Launch PBlazeIDE.
3. Select Settings ≻ PicoBlaze 3. This specifies version 3 of PicoBlaze, which is used in the Spartan-3 device.
4. Select File ≻ Import and a dialog window appears. Select the corresponding .fmt file. The "import" function converts the KCPSM3 code to PBlazeIDE code. The .fmt file is used because the formatted program is easier to convert. The converted file may sometimes need minor manual editing.
5. Manually specify the **dsin**, **dsout**, and **dsio** directives for I/O ports. When one of these directives is used, a port indicator will be added to the simulation screen to show the activities of the port.
6. Enter the simulation mode by selecting Simulate ≻ Simulate. Perform simulation.
7. If the assembly code needs to be revised, it must be done outside PBlazeIDE. Simply close the current file, invoke an external editor to edit the original .psm file, save the file, and restart from step 1. If the file is edited within PBlazeIDE, it cannot be converted back to KCPSM3 code.

Table 16.1 Mnemonic differences between KCPSM3 and PBlazeIDE

| KCPSM3 | PBlazeIDE |
|-------------------|--------------|
| addcy | addc |
| subcy | subc |
| compare | comp |
| store sX, (sY) | store sX, sY |
| fetch sX, (sY) | fetch sX, sY |
| input sX, (sY) | in sX, sY |
| input sX, KK | in sX, \$KK |
| output sX, (sY) | out sX, sY |
| output sX, KK | out sX, \$KK |
| return | ret |
| returni | reti |
| enable interrupt | eint |
| disable interrupt | dint |

Table 16.2 Directive examples of KCPSM3 and PBlazeIDE

| Function | KCPSM3 | PBlazeIDE |
|----------------|-----------------------|----------------------|
| code location | address 3FF | org \$3FF |
| constant | constant MAX, 3F | MAX equ \$3F |
| register alias | namereg s2, addr | addr equ s2 |
| port alias | constant in_port, 00 | in_port dsin \$00 |
| | constant out_port, 10 | out_port dsout \$10 |
| | constant bi_port, 0F | bi_port dsio \$0F |

A representative simulation screenshot is shown in Figure 16.2. The simulator displays the assembly code in the central window and highlights the next instruction to be executed. The instruction address, instruction code, and breakpoints are shown next to the code. The current state of PicoBlaze is shown at the left, including the status of the flags, the content of the registers, and the content of the data RAM. The values of the program counter and stack pointer as well as some execution statistics are shown in the bottom row.
The emulated I/O ports created by the **dsin**, **dsout**, and **dsio** directives are shown at the right. This particular screen has an input port, switch, and an output port, led. Since PBlazeIDE has no information about I/O behavior, the input port data must be entered and modified manually during simulation.

Figure 16.2 Screenshot of PBlazeIDE in simulation mode.

During simulation, the assembly program can be executed continuously, by one step, by one instruction, or until it pauses at a specific breakpoint. The simulation action is controlled by the commands of the Simulate menu or the icons at the top:

- Reset: clears the program counter and stack pointer
- Run: runs the program continuously until a breakpoint
- Single step: executes one instruction
- Step over: executes the entire subroutine for a **call** instruction and executes one instruction for other instructions
- Run to cursor: runs the program to the current cursor position
- Pause: pauses the simulation
- Toggle breakpoint: sets or clears a breakpoint at the current cursor position
- Remove all breakpoints: clears all breakpoints

# 16.5.3 Reloading code via the JTAG port

After the instruction ROM HDL is generated, we can continue steps 6 and 8 in Figure 15.4 to synthesize the entire code and download the configuration file to the FPGA chip. Note that the synthesis flow must be repeated each time the assembly code is modified. Since synthesis is a complex process, it requires a significant amount of computation time.

When the I/O configuration is fixed, resynthesizing the entire circuit after each assembly program modification is not really needed. It is possible to reload the machine code to the ROM, which is implemented by a block RAM, by using the FPGA's JTAG interface. This corresponds to the dotted line of step 9 in Figure 15.4. The basic procedure is as follows:

1. Replace the original ROM template with one that contains the JTAG interface circuit.
2. Use KCPSM3 to compile the assembly code as usual.
3. Synthesize the top-level HDL code and program the FPGA chip.
4. In subsequent assembly program modifications, compile the program as usual. Recall that a file in hex format (ending with the .hex extension) is generated.
5. Use the Xilinx utility to embed the .hex file in a JTAG programming file and download the file to the FPGA's block RAM via the JTAG interface.

The detailed procedure and the relevant programs and templates can be found in the JTAG\_loader directory of the downloaded KCPSM file.

# 16.5.4 Compiling by PBlazeIDE

As discussed earlier, PBlazeIDE is an integrated program that contains an assembler and editor. PBlazeIDE can generate an instruction ROM HDL file as well. However, the file is only in VHDL format. Since Xilinx ISE supports mixed-language synthesis, this file can still be incorporated into the top-level Verilog module. The detailed procedure can be found in the ISE manual.

To obtain the instruction ROM file, we simply include the **vhdl** directive in the assembly code. Its syntax is

```
vhdl "ROM_form.vhd", "rom_target.vhd", "rom_entity_name"
```

The three parameters specify a VHDL template file, which is the same file as that discussed in Section 16.5.1, the name of the generated ROM VHDL file, and the desired entity name in the VHDL file. Note that since PBlazeIDE does not generate a .hex file, the reloading scheme discussed in Section 16.5.3 cannot be applied directly.

Figure 16.3 PicoBlaze with a simple I/O interface.

## 16.6 SYNTHESIS WITH PICOBLAZE

After generating the HDL file for the instruction ROM, we can combine it with PicoBlaze to synthesize the entire system in an FPGA chip. Unlike a normal microcontroller, PicoBlaze has no built-in I/O peripherals. The I/O interface is created and customized as needed. The circuit is described in HDL code.
Since the focus in this chapter is on assembly program development, we use a simple I/O configuration, which contains only one switch input port and one led output port, for synthesis. The development of a more sophisticated I/O interface is discussed in detail in Chapters 17 and 18.

The top-level block diagram of this design is shown in Figure 16.3. It contains the PicoBlaze processor, which is labeled kcpsm3, the instruction ROM, and a register. The register functions as a buffer for the eight LEDs. When PicoBlaze executes the **output** instruction, it places the data on out\_port and asserts the write\_strobe signal, which enables the register and stores the data in the register. The sw signal is connected to in\_port. When PicoBlaze executes the **input** instruction, it retrieves the value of the sw signal and stores it in an internal register.

The corresponding HDL code is shown in Listing 16.3. It consists of instantiations of the PicoBlaze processor and instruction ROM, and a segment for the output buffer. The kcpsm3 module is the name of the PicoBlaze processor, and its code is stored in an HDL file of the same name. The sio\_rom module is from the previously generated instruction ROM file.
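
From the software side, this interface needs only an **input** and an **output** instruction. The sketch below is an assumed minimal program for the circuit, not part of the original text, and it reuses the port aliases of Listing 16.2. Because the port\_id output of the processor is left unconnected in this configuration, the particular port ids do not affect the behavior of the hardware.

```
constant sw_port, 01      ; port ids as in Listing 16.2
constant led_port, 05
namereg s0, data          ; reg for temporary data

forever:
   input data, sw_port    ; value of in_port (sw) is copied into data
   output data, led_port  ; data placed on out_port, write_strobe
                          ; asserted, and the LED register loaded
   jump forever
```

With this program in the instruction ROM, the LEDs simply follow the switches, which makes it a convenient first check of the synthesized system.
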
Listing 16.3 PicoBlaze with a simple I/O configuration

```
module pico_sio
   (
    input wire clk, reset,
    input wire [7:0] sw,
    output wire [7:0] led
   );

   // signal declaration
   // KCPSM3/ROM signals
   wire [9:0] address;
   wire [17:0] instruction;
   wire [7:0] port_id, in_port, out_port;
   wire write_strobe;
   // register signals
   reg [7:0] led_reg;

   // body
   // KCPSM and ROM instantiation
   kcpsm3 proc_unit
      (.clk(clk), .reset(reset), .address(address),
       .instruction(instruction), .port_id(),
       .write_strobe(write_strobe), .out_port(out_port),
       .read_strobe(), .in_port(in_port), .interrupt(1'b0),
       .interrupt_ack());
   sio_rom rom_unit
      (.clk(clk), .address(address),
       .instruction(instruction));

   // output interface
   always @(posedge clk)
      if (write_strobe)
         led_reg <= out_port;
   assign led = led_reg;

   // input interface
   assign in_port = sw;

endmodule
```

## 16.7 BIBLIOGRAPHIC NOTES

The bibliographic information for this chapter is similar to that for Chapter 15. The procedure for reloading compiled code via the JTAG port is explained in the article "PicoBlaze JTAG Loader Quick User Guide" by Kris Chaplin and Ken Chapman, which appears in the JTAG\_loader directory of the downloaded KCPSM file.

# 16.8 SUGGESTED EXPERIMENTS

# 16.8.1 Signed multiplication

The subroutine in Listing 16.1 assumes that the inputs are in unsigned integer format. Modify the subroutine to perform signed multiplication, in which the two inputs and the output are interpreted as signed integers, and use simulation to verify its operation.

#### 16.8.2 Multi-byte multiplication

The subroutine in Listing 16.1 assumes that the inputs are 8 bits wide. Some applications may need more precision, and we want to extend the subroutine to take 16-bit unsigned inputs. An operand now requires two registers and the result needs four registers. Develop the subroutine and use simulation to verify its operation.