The SPARC port
##############

The SPARC port targets various machines from Sun's product lineup. The initial
effort is on the Ultra 60 and Ultra 5, with plans to later add the Sun T5120
and its newer CPU. This may change depending on hardware donations and
developer interest.

Support for 32-bit versions of SPARC is currently not planned.

SPARC ABI
=========

The SPARC architecture has 32 integer registers, divided as follows:

- global registers (g0-g7)
- input (i0-i7)
- local (l0-l7)
- output (o0-o7)

Parameter passing and return is done using the output registers, which are
generally considered scratch registers and can be corrupted by the callee. The
caller must take care of preserving them.

The input and local registers are callee-saved, but we have hardware assistance
in the form of a register window. There is an instruction to shift the registers
so that:

- o registers become i registers
- local and output registers are replaced with fresh sets, for use by the
  current function
- global registers are not affected

Note that as a side effect, o6 is moved to i6. This is convenient because these
are usually the stack and frame pointers, respectively. So basically this sets
the frame pointer for free.

Simple enough functions may end up using just the o registers. In that case
nothing special is necessary, of course.

When shifting the register window, the extra registers come from the register
stack in the CPU. This is not infinite, however: most implementations of SPARC
will only have 8 windows available.
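
The window shift can be modelled in a few lines of C. This is only an
illustrative sketch: the structure and function names are invented for this
example, the window count of 8 is the typical figure quoted above, and
overflow/underflow handling is left out entirely. It also makes no claim about
which direction real hardware moves its window pointer.

.. code-block:: c

    #include <stdint.h>

    #define NWINDOWS 8          /* typical count, implementation specific */
    #define REGS_PER_WINDOW 16  /* 8 locals + 8 outs added by each window */
    #define NWINDOWED (NWINDOWS * REGS_PER_WINDOW)

    /* Hypothetical model of the integer register file. */
    struct regfile {
        uint64_t global[8];            /* g0-g7, never windowed */
        uint64_t windowed[NWINDOWED];  /* circular buffer of i/l/o registers */
        unsigned cwp;                  /* index of the current window */
    };

    /* i0-i7 of the current window. */
    static inline uint64_t *reg_in(struct regfile *rf, unsigned n)
    {
        return &rf->windowed[(rf->cwp * REGS_PER_WINDOW + n) % NWINDOWED];
    }

    /* l0-l7 of the current window. */
    static inline uint64_t *reg_local(struct regfile *rf, unsigned n)
    {
        return &rf->windowed[(rf->cwp * REGS_PER_WINDOW + 8 + n) % NWINDOWED];
    }

    /* o0-o7 of the current window: the same storage as the i registers
     * of the next window. */
    static inline uint64_t *reg_out(struct regfile *rf, unsigned n)
    {
        return &rf->windowed[((rf->cwp + 1) * REGS_PER_WINDOW + n) % NWINDOWED];
    }

    /* Enter a function: the caller's o registers become our i registers. */
    static inline void window_save(struct regfile *rf)
    {
        rf->cwp = (rf->cwp + 1) % NWINDOWS;
    }

    /* Leave a function: go back to the caller's window. */
    static inline void window_restore(struct regfile *rf)
    {
        rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS;
    }

In this model, a value written through ``reg_out(rf, 0)`` before
``window_save()`` is read back through ``reg_in(rf, 0)`` afterwards, which is
how arguments reach the callee without being copied.
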
When the internal stack is full, an overflow
trap is raised, and the handler must free up old windows by storing them on the
stack. Likewise, when the internal stack is empty, an underflow trap must fill
it back from the stack-saved data.

Misaligned memory access
========================

The SPARC CPU is not designed to gracefully handle misaligned accesses.
You can access a single byte at any address, but 16-bit accesses are allowed
only at even addresses, 32-bit accesses only at addresses that are multiples
of 4, etc.

On x86, for example, such accesses are not a problem: they are allowed and
handled directly by the instructions doing the access. So there is no
performance cost.

On SPARC, however, such accesses will cause a SIGBUS. This means a trap handler
has to catch the misaligned access and do it in software, byte by byte, then
give back control to the application. This is, of course, very slow, so we
should avoid it when possible.

Fortunately, gcc knows about this, and will normally do the right thing:

- For usual variables and structures, it will make sure to lay them out so that
  they are aligned. It relies on stack alignment, as well as malloc returning
  sufficiently aligned memory (as required by the C standard).
- On packed structures, gcc knows the data is misaligned, and will automatically
  use the appropriate way to access it (most likely, byte by byte).

This leaves us with two undesirable cases:

- Pointer arithmetic and casting. When computing addresses manually, it's
  possible to generate a misaligned address and cast it to a type with a wider
  alignment requirement. In this case, gcc may access the pointer using a
  multi-byte instruction and cause a SIGBUS. Solution: make sure the struct
  is aligned, or declare it as packed so unaligned accesses are used instead.
- Access to hardware: it is a common pattern to declare a struct as packed,
  and map it to hardware registers. If the alignment isn't known, gcc will use
  byte-by-byte access. It seems volatile would cause gcc to use the proper way
  to access the struct, assuming that a volatile value is necessarily
  aligned as it should be.

In the end, we just need to be careful about pointer math resulting in unaligned
access. -Wcast-align helps with that, but it also raises a lot of false positives
(where the alignment is preserved even when casting to other types). So we
enable it only as a warning for now. We will need to check the SIGBUS handler to
identify places where we do a lot of misaligned accesses that trigger it, and
rework the code as needed. But in general, except for these cases, we're fine.

The Ultrasparc MMUs
===================

First, a word of warning: the MMU was different in SPARCv8 (32-bit)
implementations, and it was changed again on newer CPUs.

The Ultrasparc-II we are supporting for now is documented in the Ultrasparc
user manual. There were some minor changes in the Ultrasparc-III to accommodate
larger physical addresses. This was then standardized as JPS1, and Fujitsu
also implemented it.

Later on, the design was changed again. For example, Ultrasparc T2 (UA2005
architecture) uses a different data structure format to enlarge, again, the
physical and virtual address tags.

For now the implementation is focused on Ultrasparc-II because that's what I
have at hand. Later on we will need support for the more recent systems.

Ultrasparc-II MMU
-----------------

There are actually two separate units for the instruction and data address
spaces, known as I-MMU and D-MMU. They each implement a TLB (translation
lookaside buffer) for the recently accessed pages.

This is pretty much all there is to the MMU hardware. No hardware page table
walk is provided. However, there is some support for implementing a TSB
(Translation Storage Buffer) in the form of providing a way to compute an
address into that buffer where the data for a missing page could be.

It is up to software to manage the TSB (globally or per-process) and in general
keep track of the mappings.
This means we are relatively free to manage things
however we want, as long as eventually we can feed the iTLB and dTLB with the
relevant data from the MMU trap handler.

To make sure we can handle the fault without recursing, we need to pin a few
items in place:

In the TLB:

- TLB miss handler code
- TSB and any linked data that the TLB miss handler may need
- asynchronous trap handlers and data

In the TSB:

- TSB-miss handling code
- Interrupt handlers code and data

So, from a given virtual address (assuming we are using only 8K pages and a
512 entry TSB to keep things simple):

- VA63-44 are unused and must be a sign extension of bit 43
- VA43-22 are the 'tag' used to match a TSB entry with a virtual address
- VA21-13 are the offset in the TSB at which to find a candidate entry
- VA12-0 are the offset in the 8K page, and used to form PA12-0 for the access

Inside the TLBs, VA63-13 is stored, so there can be multiple entries matching
the same tag active at the same time, even when there is only one in the TSB.
The entries are rotated using a simple LRU scheme, unless they are locked of
course. Be careful to not fill a TLB with only locked entries! Also, one must
take care not to insert a new mapping for a given VA without first removing
any possible previous one (no need to worry about this when handling a TLB
miss however, as in that case we obviously know that there was no previous
entry).

Entries also have a "context". This could for example be mapped to the process
ID, making it easy to clear all entries related to a specific context.

TSB entries format
------------------

Each entry is composed of two 64-bit values: "Tag" and "Data".
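
As a minimal sketch of the virtual address split described earlier and of the
two-word entry, assuming 8K pages and a 512 entry TSB as in the text. The
structure, macro and function names are hypothetical and are not taken from the
Haiku sources.

.. code-block:: c

    #include <stdint.h>

    #define PAGE_SHIFT   13    /* 8K pages: VA12-0 is the page offset */
    #define TSB_ENTRIES  512   /* 9 index bits: VA21-13 */
    #define TSB_TAG_SHIFT 22   /* the tag starts at VA22 */

    /* One TSB entry: two 64-bit words. */
    struct tsb_entry {
        uint64_t tag;
        uint64_t data;
    };

    /* Offset inside the 8K page (VA12-0). */
    static inline uint64_t va_page_offset(uint64_t va)
    {
        return va & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    }

    /* Index of the candidate entry in the TSB (VA21-13). */
    static inline unsigned va_tsb_index(uint64_t va)
    {
        return (unsigned)((va >> PAGE_SHIFT) & (TSB_ENTRIES - 1));
    }

    /* Bits compared against the entry's tag (VA63-22). */
    static inline uint64_t va_tsb_tag(uint64_t va)
    {
        return va >> TSB_TAG_SHIFT;
    }

    /* VA63-44 must all be copies of bit 43. */
    static inline int va_is_valid(uint64_t va)
    {
        uint64_t top = va >> 43;  /* bits 63-43, 21 bits */
        return top == 0 || top == 0x1fffff;
    }

With these definitions, ``sizeof(struct tsb_entry) * TSB_ENTRIES`` is 8192
bytes, so the whole table occupies a single 8K page.
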
The data uses the
same format as the TLB entries, however the tag is different.

They are as follows:

Tag
***

- Bit 63: 'G' indicating a global entry, the context should be ignored.
- Bits 60-48: context ID (13 bits)
- Bits 41-0: VA63-22 as the 'tag' to identify this entry

Data
****

- Bit 63: 'V' indicating a valid entry, if it's 0 the entry is unused.
- Bits 62-61: size: 8K, 64K, 512K, 4MB
- Bit 60: NFO, indicating No Fault Only
- Bit 59: Invert Endianness of accesses to this page
- Bits 58-50: reserved for use by software
- Bits 49-41: reserved for diagnostics
- Bits 40-13: Physical Address<40-13>
- Bits 12-7: reserved for use by software
- Bit 6: Lock in TLB
- Bit 5: Cacheable physical
- Bit 4: Cacheable virtual
- Bit 3: Access has side effects (HW is mapped here, or DMA shared RAM)
- Bit 2: Privileged
- Bit 1: Writable
- Bit 0: Global

TLB internal tag
****************

- Bits 63-13: VA<63-13>
- Bits 12-0: context ID

Conveniently, a 512 entry TSB fits exactly in an 8K page, so it can be locked
in the TLB with a single entry there. However, it may be a wise idea to instead
map 64K (or more) of RAM locked as a single entry for all the things that need
to be accessed by the TLB miss trap handler, so we minimize the use of TLB
entries.

Likewise, it may be useful to use 64K pages instead of 8K whenever possible.
The hardware provides some support for mixing the two sizes but it makes things
a bit more complex. Let's start out with simpler things.

Software floating-point support
===============================

The SPARC instruction set specifies instructions for handling long double
values, however, no hardware implementation actually provides them. They
generate a trap, which is expected to be handled by the softfloat library.

Since traps are slow, and gcc knows better, it will never generate those
instructions. Instead it directly calls into the C library, to functions
specified in the ABI and used to do long double math using softfloats.

The support code for this is, in our case, compiled into both the kernel and
libroot.
It lives in src/system/libroot/os/arch/sparc/softfloat.c (and other
support files). This code was extracted from FreeBSD, rather than glibc,
because that made it much easier to get it building in the kernel.

Openboot bootloader
===================

Openboot is Sun's implementation of Open Firmware. So we should be able to share
a lot of code with the PowerPC port. There are some differences however.

Executable format
-----------------

PowerPC uses COFF. Sparc uses a.out, which is a lot simpler. According to the
spec, some fields should be zeroed out, but they say implementations may choose
to allow other values, so a standard a.out file works as well.

It used to be possible to generate one with objcopy, but support was removed,
so we now use elf2aout (imported from FreeBSD).

The file is first loaded at 4000, then relocated to its load address (we use
202000) and executed there.

Openfirmware prompt
-------------------

To get the prompt on display, use STOP+A at boot until you get the "ok" prompt.
On some machines, if no keyboard is detected, the ROM will assume it is set up
in headless mode, and will expect a BREAK+A on the serial port.

STOP+N resets all variables to default values (in case you messed up input or
output, for example).

Useful commands
---------------

Disable autoboot to get to the openboot prompt and stop there

.. code-block:: text

    setenv auto-boot? false

Configuring for keyboard/framebuffer io

.. code-block:: text

    setenv screen-#columns 160
    setenv screen-#rows 49
    setenv output-device screen:r1920x1080x60
    setenv input-device keyboard

Configuring openboot for serial port

.. code-block:: text

    setenv ttya-mode 38400,8,n,1,-
    setenv output-device ttya
    setenv input-device ttya
    reset

Boot from network
-----------------

The openboot bootloader supports network booting. See the
`Network booting guide <https://www.haiku-os.org/guides/network_booting/>`_
for general information about the network booting process.
This page
documents the parts specific to the openboot bootloader configuration.

In openboot, booting from the network is done simply by using the "net:" device
alias in the boot command line. This lets openboot load our bootloader, which
then uses the openboot ability to send and receive data over the network to load
the filesystem (and kernel contained in it) over the network. The two parts are
independent: it's also possible to load the bootloader from the network but boot
a local filesystem, or use the local bootloader and load the filesystem from the
network.

The bootloader needs to be placed on a TFTP server. I use atftpd in Debian,
which serves files from /srv/tftp/ (so "somefile" in the example below will look
for /srv/tftp/somefile).

static ip
*********

This currently works best, because rarp does not let the called binary know the
IP address. We need the IP address if we want to mount the root filesystem using
remote_disk server.

.. code-block:: text

    boot net:192.168.1.2,somefile,192.168.1.89

The first IP is the server from which to download (using TFTP), the second is
the client IP to use. Once the bootloader starts, it will detect that it is
booted from network and look for the remote_disk_server on the same machine.

rarp
****

This needs a reverse ARP server (easy to set up on any Linux system). You need
to list the MAC address of the SPARC machine in /etc/ethers on the server. The
machine will get its IP, and will use TFTP to the server which replied, to get
the boot file from there.

.. code-block:: text

    boot net:,somefile

(net is an alias to the network card and also sets the load address:
/pci@1f,4000/network@1,1)

This currently does not work completely: the server address is not forwarded to
the bootloader, and as a result, remote filesystems will not be available.
The
bootloader needs to be updated to know where to find the address in this case
(it is done for PowerPC, I think).

dhcp
****

This needs a DHCP/BOOTP server configured to send the info about where to find
the file to load and boot.

.. code-block:: text

    boot net:dhcp

Debugging
---------

The openboot environment provides several useful commands to assist in debugging:

.. code-block:: text

    202000 dis (disassemble starting at 202000 until next return instruction)
    4000 1000 dump (dump 1000 bytes from address 4000)
    .registers (show global registers)
    .locals (show local/windowed registers)
    %pc dis (disassemble code being executed)
    ctrace (backtrace)

The backtrace provides addresses and register values (allowing to know the
function arguments), but no symbols or function names are printed.
objdump (on the build machine) can be used to disassemble the kernel or
bootloader and find the corresponding code:

.. code-block:: text

    ./cross-tools-sparc/bin/sparc64-unknown-haiku-objdump -d objects/haiku/sparc/release/system/kernel/kernel_sparc |c++filt|less
    ./cross-tools-sparc/bin/sparc64-unknown-haiku-objdump -d objects/haiku/sparc/release/system/boot/openfirmware/boot_loader_openfirmware |c++filt|less
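
When a SIGBUS turns up during debugging, it is often the pointer-casting case
from the misaligned memory access section. The following is a minimal sketch of
one common way to avoid it, not code taken from the Haiku tree: the helper
names are made up for this example. Copying through memcpy lets the compiler
choose an access sequence that is valid for any alignment, instead of assuming
the alignment of the pointed-to type.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: read a 32-bit value stored at any address.
     * Casting p to (const uint32_t *) and dereferencing it could trap
     * on SPARC when p is not a multiple of 4. */
    static inline uint32_t read_u32_unaligned(const void *p)
    {
        uint32_t value;
        memcpy(&value, p, sizeof(value));
        return value;
    }

    /* Hypothetical helper: check an address before casting it to a type
     * that requires the given (power of two) alignment. */
    static inline int is_aligned(const void *p, size_t alignment)
    {
        return ((uintptr_t)p & (alignment - 1)) == 0;
    }

A parser walking a byte buffer would call ``read_u32_unaligned(buffer + 1)``
rather than ``*(uint32_t *)(buffer + 1)``; the value is returned in host byte
order in both cases.
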