COMMON ERRORS IN OSP MODULES IMPLEMENTATION

As a general observation, the first problem students face is that they do not know exactly what the OSP simulator is. The OSP simulator is a collection of modules that together implement a modern-day operating system. Several times during the semester, students are given an assignment in which they have to implement one or more of the OSP modules. The simulator consists of ten primary modules whose interaction is depicted in figure 1.1 (page 4 of the OSP booklet). During the spring 1993 semester the students implemented the following modules :

a) CPU Scheduling.
b) SOCKETS.
c) RESOURCE management.
d) MEMORY management.

In each of these implementations I found several typical errors, which are recorded here for future reference in order to help students.

CPU SCHEDULING

The intuition behind this module is the following: given a set of processes that are ready and waiting to execute, CPU scheduling is essentially the task of deciding which of these processes should be allocated the CPU next. The goal of this module is to improve CPU utilization, system throughput, and response time.

PROBLEMS FOUND DURING IMPLEMENTATION :

The first problems students faced during the implementation of the CPU scheduling module were segmentation faults and bus errors. These are considered the most difficult programming errors to detect in this kind of module implementation. Usually a segmentation fault or bus error is the result of a bad parameter passed to one of the external routines of the module that the students are implementing. It can also be caused by a modification to the data structures provided in the header of the template file module.c. Such modifications should not be made under any circumstances. Another possible cause of these errors is incorrect manipulation of pointers into the data structures provided by the file module.c.
As a consequence, the debugger points to one of the modules that the students did not write.

Another problem was a poor understanding of the PTBR (page table base register). The PTBR is a CPU register used to locate the page table of the current process, and it provides a convenient way to access the process control block (pcb) of the process that is on the CPU at any given moment. Here, the students' problem was how to assign a new ready process to the CPU so that it can be executed.

POSSIBLE SOLUTIONS :

A possible solution for the segmentation faults and bus errors is to create a macro that lets you trace your program, so that this kind of error becomes easier to locate and correct. A segmentation fault may be signaled in one function of the module being implemented by the student, yet the error may also persist in other implemented functions of the same module, so a careful search for the error is necessary. The idea is the following :

    #include <stdio.h>

    /* Print the string X on the terminal (/dev/tty). */
    #define Print(X) do { FILE *ifp = fopen("/dev/tty", "w"); \
        if (ifp != NULL) { fprintf(ifp, "%s\n", (X)); fclose(ifp); } \
    } while (0)

Once this macro is declared, the next step is to call it in each external routine (function) that belongs to the module the students are implementing. For example, consider the external procedure [dispatch], which belongs to the CPU scheduling module. At the beginning of this routine we can call Print("begin dispatch") and at the end of it we can call Print("end dispatch"). The following step is to compile and link your module with the rest of the modules. Once your module compiles and links without errors, the next step is to run the complete OSP simulator and see what happens. Two things can happen :

1) On the computer screen you will see a sequence of messages.
For example, in the CPU scheduling module you may see the message "begin dispatch" but never see the message "end dispatch", which means that there is an error in this subroutine and it must be corrected.

2) Alternatively, you may never see the message "end dispatch", and yet the error is not necessarily in this subroutine. Careful thought is needed here, since [dispatch] depends on the [insert_ready] subroutine. In that case you will see on the screen a sequence of messages such as "begin dispatch", "begin insert_ready", but you will never see the end of either function. The error is then most likely in the last function that was entered.

One final piece of advice : using [lint], the C type checker, may help spot problems at an early stage.

The solution for the confusion about the PTBR register is the following. We all know that in order to execute a program, it must be loaded completely into memory, and then the CPU is signaled to begin execution. (Consider a FIFO implementation.)

    if (head == NULL)
        PTBR = NULL;                    /* no ready processes */
    else {
        prepage(head);                  /* prepage the ready process */
        PTBR = head->page_tbl;          /* point to the new process's page table */
        PTBR->pcb->status = running;    /* the process is running right now */
    }

SOCKETS

A socket is a bidirectional port that a process creates in order to send and receive messages. The intuition behind this assignment is interprocess communication. A process can specify a connection between a local socket that it has created (the host socket) and a socket belonging to another process (the peer socket); it only needs to know the name of the peer socket. After establishing the connection, the processes are free to exchange messages.

PROBLEMS FOUND DURING IMPLEMENTATION :

Here, too, we found segmentation faults and bus errors.
However, these errors can be solved with the strategy given in the CPU scheduling part. Other errors that students made in their designs are listed below.

NOTE : the following references were obtained from the OSP booklet.

EXIT_CODE open_sock(so_type,new_socket,so_name)   page 57

"Fill in the socket template provided by SIMCORE as a parameter; i.e., initialize the fields so_type, is_connected, is_accepting, so_inbuf, pcb, prrb, and so_error."

Problem : It is not clear what the fields pcb and prrb should be initialized to. They should be initialized as follows :

    pcb  = PTBR->pcb;
    prrb = NULL;

void purge_sockets(pcb)   pages 62 and 63

"Purges the open sockets of process represented by pcb..."

Problem : One can conclude that it is necessary to remove such a socket from our open_socket_index table. However, the booklet does not say whether it is also necessary to update the fields of the socket as follows :

    open_so_table[i]->num_msg        = 0;
    open_so_table[i]->is_connected   = false;
    open_so_table[i]->is_accepting   = false;
    open_so_table[i]->pcb            = NULL;
    open_so_table[i]->prrb           = NULL;
    open_so_table[i]->so_inbuf       = 0;
    open_so_table[i]->pr_sw          = NULL;
    open_so_table[i]->dgram_msg_list = NULL;

Finally, another important source of confusion for the students was the difference between a stream socket connection and a datagram socket connection: how these two kinds of sockets are connected, disconnected, closed, and opened, and how to call the appropriate protocol at each moment. The following example shows how to establish a connection between processes and how to call the appropriate protocol.

    EXIT_CODE connect_sock(socket, peer_so_name)
    SOCKET *socket;
    char   *peer_so_name;
    {
        /* if socket is not a stream socket return fail */
        if (socket->so_type != stream)
            return fail;

        /* if socket already connected return fail */
        if (socket->is_connected == true)
            return fail;

        /* request the protocol layer to make the connection */
        if (!socket->pr_sw->pr_routines.pr_stream.pr_connect(socket, peer_so_name))
            return fail;

        /* mark the socket as connected */
        socket->is_connected = true;
        return ok;
    }

RESOURCES AND MEMORY MANAGEMENT

I decided to discuss these two projects together, since the students had several similar problems with them. The intuitive idea behind the resource management project is that in a multiprogrammed system, processes compete for access to the resources of the system, e.g., CPU cycles, memory space, files, I/O devices, etc. This can result in a deadlock (a circular wait among a set of processes, each waiting for a resource held by another process in the set). The purpose of module resources is to manage the allocation of resources to user processes. On the other hand, the intuitive idea of the memory management module is to manage memory with a form of virtual memory called paging.

PROBLEMS FOUND DURING IMPLEMENTATION :

The errors found in these two implementations were the following. First, there were the segmentation faults and bus errors discussed previously. Another error was inconsistency between tables. This error was very common in these two projects, since the tables kept by SIMCORE must be consistent with the student's tables. In the resource project, the inconsistency arose when adding and subtracting available resources in the resource table: students sometimes added or subtracted the same amount of resources twice without being aware of it. The same thing happened in module memory, where students incremented or decremented the count of a frame twice without being aware of it.

Another typical problem was a poor understanding of interrupts. In both projects the students had to use interrupts. In module resources, the interrupts used were the following : sigsvc, waitsvc, and killsvc. The monitor calls sigsvc and waitsvc provide an event-based facility for synchronizing processes.
Any number of processes can wait on a single event E, and signaling that event has the effect of awakening all processes waiting on E. The killsvc interrupt is generated when a process terminates abnormally. This interrupt is used to delete deadlocked processes. On the other hand, the interrupt used in module memory was pagefault. Its handler is invoked in case of a page fault and calls get_page( ) to bring the desired page of the current process into memory.

The problem here was that when an interrupt occurs, the process on the CPU must be suspended, giving its place to another process. It is therefore necessary to save the process that will be used by the interrupt handler in a temporary variable; otherwise this process could be lost.

POSSIBLE SOLUTIONS :

NOTE : The following segments of code are not the complete solutions for each specific external subroutine.

Resource management :

    EXIT_CODE acquire(rrb)
    RRB *rrb;
    {
        PCB *saved_ptr;
        int  temp;

        temp = rrb->rsrc->rsrc_id;
        saved_ptr = PTBR->pcb;      /* save the current process before any interrupt */

        if (Resource_Tbl[temp].avail_qty < rrb->quantity)
            if (saved_ptr == rrb->pcb)    /* at this point you can use either */
            {                             /* saved_ptr or PTBR->pcb           */
                Int_Vector.cause = waitsvc;
                Int_Vector.event = rrb->event;
                Int_Vector.pcb   = rrb->pcb;
                gen_int_handler();
            }
        .............
        .............
        .............
    }

Memory management :

    void refer(logic_addr, action)
    int    logic_addr;
    ACTION action;
    {
        PCB *saved_ptr;
        int  pagenum, index;

        saved_ptr = PTBR->pcb;      /* save the current process before any interrupt */
        pagenum = logic_addr / PAGE_SIZE;

        if (PTBR->page_entry[pagenum].valid == false)
        {
            Int_Vector.cause   = pagefault;
            Int_Vector.page_id = pagenum;
            Int_Vector.pcb     = PTBR->pcb;   /* before the interrupt you can use PTBR */
            gen_int_handler();
        }

        /* after the interrupt you have to use saved_ptr */
        index = saved_ptr->page_tbl->page_entry[pagenum].frame_id;
        ................
        ................
        ................
    }

GENERAL HINTS

We have to be aware of the following crucial points in any module implementation of the OSP simulator :

1) If you use queues, be sure to initialize them to NULL before you continue with the implementation of the remaining functions in any module.
2) If you create new data structures, be sure to allocate memory for variables of the types that belong to the OSP structures as well as for the types that you create.
3) Problems with the C language are very typical.
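
Hints 1) and 2) can be illustrated with a small sketch. It is not taken from the OSP booklet: the names DEMO_PCB, Q_NODE, QUEUE, queue_init, enqueue, and dequeue are hypothetical, DEMO_PCB is only a stand-in for the real PCB declared in the template file, and the code uses ANSI-style prototypes rather than the older style of the listings above. It shows a FIFO queue whose ends start as NULL and whose nodes are allocated (and checked) before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OSP structure; the real PCB is
   declared in the template file and has more fields. */
typedef struct demo_pcb { int pcb_id; } DEMO_PCB;

/* Hypothetical node and queue types created by the student. */
typedef struct q_node {
    DEMO_PCB      *pcb;
    struct q_node *next;
} Q_NODE;

typedef struct {
    Q_NODE *head;
    Q_NODE *tail;
} QUEUE;

/* Hint 1: start with an empty queue, both ends set to NULL. */
void queue_init(QUEUE *q)
{
    assert(q != NULL);
    q->head = NULL;
    q->tail = NULL;
}

/* Hint 2: allocate memory for every new node before using it.
   Returns 1 on success, 0 if malloc fails. */
int enqueue(QUEUE *q, DEMO_PCB *pcb)
{
    Q_NODE *node = malloc(sizeof *node);

    if (node == NULL)
        return 0;
    node->pcb  = pcb;
    node->next = NULL;
    if (q->tail == NULL)
        q->head = node;          /* queue was empty */
    else
        q->tail->next = node;
    q->tail = node;
    return 1;
}

/* Remove and return the PCB at the head, or NULL if the queue is empty. */
DEMO_PCB *dequeue(QUEUE *q)
{
    Q_NODE   *node = q->head;
    DEMO_PCB *pcb;

    if (node == NULL)
        return NULL;
    pcb = node->pcb;
    q->head = node->next;
    if (q->head == NULL)
        q->tail = NULL;          /* queue became empty */
    free(node);
    return pcb;
}
```

Without the call to queue_init, head and tail would hold arbitrary values, and the first enqueue or dequeue could follow a garbage pointer, which is one way to obtain the segmentation faults and bus errors described earlier.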