Notes from operating systems class. Full 14-week course with Dr. Perry Kivolowitz.
The address space on x64 runs from 0 to 2^64 (HIMEM).
(Can't go to 0: no null dereference. 0 is guarded to catch common 'pointer problems'.)
W,R,B,Z
W: Waiting
(aka Runnable.)
(Ready to go from W to R; needs to be scheduled by the OS.)
R: Running
(On a CPU, executing.)
(May be descheduled back to W, e.g. when its time slice is used up.)
(May go from R to Z, 'zombie'.)
B: Blocked
(aka Sleeping in xv6. Running goes to Blocked when e.g. waiting for IO or in Sleep; when finished, goes back to W.)
(Cannot go from B to R directly.)
Z: Zombie
(The entire address space has been deleted; some data is still tracked by the OS.)
(Housed in the process control table.)
(Exists until the parent of the process receives the information it requires.)
The most running (on-CPU) processes you can have is == the number of cores.
PID 1 is the 'first process': init.
(Built by the OS during boot.)
(init collects the exit status of dead processes with no parent. Releases Zs.)
The process control table keeps, per process:
(Where B & W processes left off: a copy of the registers.)
(PC & all registers, e.g. SP.)
(Process state: W R B Z E U.)
(E is an embryo process: still being built/constructed, address space being allocated; then it goes to W.)
(U is unused, a 'null' slot available to become a process.)
(Open files: the openfiletable array.)
(inodes contain data block information.)
(Process name.)
Address space layout (simplistic & limited; the stack can't move/grow):
Address 0:
  Code
  Globals
  Stack (fixed)
  Heap (grows upward)
Address: HIMEM
XV6 projects: add a guard page; add/move the stack.
Create a shell: user programs in a 'toy' shell with limited function, connected via pipes.
Open File Table (openfiletable array):
  0: stdin  (cin)
  1: stdout (cout)
  2: stderr (cerr)
  3:
  4:
  5:
Sys calls: open, close, read, write, seek, dup/dup2, pipe, fork, exec, execvp
Fork returns 0 in the child.
Fork returns the child's PID in the parent.
pid = fork();
if (pid > 0) {
    // you are the parent
}
else {
    // you are the child
}
C++ vs. xv6 C methods:
// C++:
cout << "pid " << getpid() << endl;
// xv6 C (printf takes a file descriptor first):
printf(1, "pid %d\n", getpid());
Variable number of arguments:
void foo(const char *fmt, ...);  // at least one named parameter must come before the '...'
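A minimal sketch of how the '...' is consumed with <cstdarg> (sum_ints is a made-up example, not course code):

#include <cstdarg>
#include <iostream>

// Sum 'count' ints passed after the named parameter.
int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);           // start reading after 'count'
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  // pull the next int off the list
    va_end(ap);
    return total;
}

int main() {
    std::cout << sum_ints(3, 10, 20, 30) << "\n";  // prints 60
}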
OFT after pipe():
  0: stdin
  1: stdout
  2: stderr
  3: rs (read side; landed in the first available slot; will get closed by the parent)
  4: ws (write side)
Parent code:
int pipefds[2];
pipe(pipefds);
pid = fork();               // child gets a copy of the OFT
if (pid > 0) {
    // parent
    close(pipefds[0]);      // parent won't read
    write(pipefds[1], ______);
}
else if (pid == 0) {
    // child
    close(pipefds[1]);      // child won't write
    close(0);               // free fd 0 (stdin)
    dup(pipefds[0]);        // dup returns the lowest free fd: 0
    close(pipefds[0]);
    read(0, ______);        // or exec("wc"), which reads fd 0
}
OFT child (after the dup):
  0: rs
  1: stdout
  2: stderr
  3:
  4:
pipefds:
  [0] = read | [1] = write
// 'Bookkeeping' is important: rs & ws need to be closed wherever they are not used (an open write side keeps the reader from ever seeing EOF).
Parent sets up the pipeline: ls stdout -> stdin of wc
pipe(pipefds);
pid1 = fork();
// if c1 ...
else {
    pid2 = fork();
    // if c2 ...
}
// parent:
close(rs);
close(ws);
...
wait();  // returns the pid of an exited child (the pid fork returned)
wait();  // once per child
OFT parent:
  0: stdin
  1: stdout
  2: stderr
  3: rs
  4: ws
// if child c1:
close(rs);
close(1);      // free fd 1 (stdout)
dup(ws);       // lowest free fd: 1
close(ws);
exec("ls");
OFT c1:
  0: stdin
  1: ws
  2: stderr
  3:
  4:
// if c2:
close(ws);
close(0);      // free fd 0 (stdin)
dup(rs);       // lowest free fd: 0
close(rs);
exec("wc");
OFT c2:
  0: rs
  1: stdout
  2: stderr
  3:
  4:
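Putting the whole ls | wc pipeline together, a minimal runnable sketch for a POSIX system (execlp and the missing error handling are my shorthand; xv6's exec takes (path, argv)):

#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pipefds[2];
    pipe(pipefds);                 // [0] = read side, [1] = write side

    if (fork() == 0) {             // c1: ls
        close(pipefds[0]);         // won't read
        close(1);                  // free stdout
        dup(pipefds[1]);           // lowest free fd -> 1
        close(pipefds[1]);         // bookkeeping
        execlp("ls", "ls", (char *)0);
    }
    else if (fork() == 0) {        // c2: wc
        close(pipefds[1]);         // won't write
        close(0);                  // free stdin
        dup(pipefds[0]);           // lowest free fd -> 0
        close(pipefds[0]);
        execlp("wc", "wc", (char *)0);
    }
    close(pipefds[0]);             // parent: close both sides,
    close(pipefds[1]);             // or wc never sees EOF
    wait(0);
    wait(0);
    return 0;
}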
-i & -o will be implemented in the children.
-i: the child will open the file and move the fd where it needs to go.
-o: the child will open the file and move the fd where it needs to go before exec (there is no return from exec).
-a and -o can't both be honored; use the last option given.
Can start P1; use getopt.
Programs execute directly on the CPU.
'Limited' direct execution: certain instructions can only execute in a privileged state (requiring elevation from user code to the kernel).
AKA, unless you need something special, your program is executing directly on the CPU.
Fork system call flow/pipeline in XV6:
fork();
  puts the syscall # in a register
  causes a T_SYSCALL trap       // your process is locked; privileges are elevated
  alltraps                      // does bookkeeping & calls trap(); executes in the kernel
  return-from-trap              // de-escalates privileges
Included via header in the user program:
#define T_SYSCALL 64            // "someone wants to make a sys call"
During boot, the kernel initializes a trap table.
XV6 is simplified: the entire trap table is one entry after another vectoring to alltraps. (If you get here, call alltraps: a general catch-all.)
// Optimizes the common case.
// syscall.h contains the system call numbers:
#define SYS_fork 1
#define SYS_exit 2
#define SYS_wait 3
#define SYS_pipe 4
#define SYS_read 5
#define SYS_kill 6
#define SYS_exec 7
#define SYS_fstat 8
#define SYS_chdir 9
#define SYS_dup 10
#define SYS_getpid 11
#define SYS_sbrk 12
#define SYS_sleep 13
#define SYS_uptime 14
#define SYS_open 15
#define SYS_write 16
#define SYS_mknod 17
#define SYS_unlink 18
#define SYS_link 19
#define SYS_mkdir 20
#define SYS_close 21
void
syscall(void)
{
  int num;
  struct proc *curproc = myproc();

  num = curproc->tf->eax;                 // syscall # was placed in eax
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    curproc->tf->eax = syscalls[num]();   // dispatch; result returned in eax
  } else {
    cprintf("%d %s: unknown sys call %d\n",
            curproc->pid, curproc->name, num);
    curproc->tf->eax = -1;
  }
}
Operating system internals are like the internals of a watch ~ Dr. Perry Kivolowitz
fork
pipe
exec
dup/dup2
Mentioned:
seek(fd, pos, flag)
// Essentially a 'tape deck': positions in files can be read or written to (if random access). The flag denotes whether pos is relative to the beginning, the current position, or the end of the file. The current position of an open file is kept with the open-file entry.
Redirect vs pipe:
Opening a pipe gives two file descriptors (read & write).
Manipulate stdin/stdout using close() and dup().
Once something is connected to fd 0 (stdin), processes have no context about what it is.
Redirect affects std input/output.
The shell handles '>' as overwrite and '>>' as append if the destination exists.
Shell '<' gives an input to a command.
Stderr can be redirected using '2>'.
Read:
(OSTEP ch 7) https://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf
(OSTEP ch 8) https://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched-mlfq.pdf
Initial assumptions:
Great Idea (of CS):
Create an algorithm that cannot be realized in the actual world. Why? (Perfection is the benchmark.)
Turnaround Time: T_turnaround = T_completion - T_arrival
Response Time: T_response = T_firstrun - T_arrival
First Come First Serve (FCFS)
Shortest Job First (SJF)
Shortest Time-to-Completion First (STCF)
Round-Robin (RR)
Scheduling quantum or time slice for Round Robin (RR) scheduling algorithm.
Ideal quantum?...
When the quantum is used up, the process goes from running back to waiting.
Multi-Level Feedback Queue (MLFQ)
Basic rules for MLFQ:
Ideal epoch?...
Mentioned:
Every queue level can have different parameters, e.g. the quantum can be different.
Sophisticated OS may have 50 or more queues.
P2 will be the first project inside XV6.
Goal: Add guard page for NULL dereference
(Heap is the only thing allowed to expand)
Modify address space so that there is,
Mentioned:
Kernel printf (for limited debugging)
proc.c
copyuvm, located in vm.c around line 210
The linker is going to place "main()" at a fixed location.
( ͡° ͜ʖ ͡°) $(LD) $(LDFLAGS) -N -e main -Ttext 0x1000 -o $@ $^
FCFS suffers the convoy effect; non-preemptive SJF can too (short jobs arriving behind a long one already running); STCF preempts to avoid it.
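Worked numbers (OSTEP-style, not from lecture): jobs A = 100s, B = 10s, C = 10s all arrive at t = 0. FCFS that runs A first gives average turnaround (100 + 110 + 120) / 3 = 110s, with B and C convoyed behind A; SJF runs B, C, A for (10 + 20 + 120) / 3 = 50s.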
DBMSs and OSs have a lot in common.
Mentioned:
Third project will be a 'lottery scheduler'.
Lottery Scheduler
e.g. tickets issued:
A = 100
B = 50
C = 200
----
Cumulative ranges: A: 1-100 | B: 101-150 | C: 151-350
Use a random number in [1, 350] to win the lottery.
Cons: inaccurate over the short term; sorting/searching might not scale well with a large # of processes.
Pros: settles to accurate over the long term; easy to implement.
Can implement temporary boosts of priority.
Can assign process groups (sub-tickets: sub-A, sub-C, etc.).
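A minimal sketch of one lottery draw over the ticket table above (the struct and names are mine):

#include <cstdlib>
#include <vector>

struct Proc { const char *name; int tickets; };

// Walk the cumulative ticket ranges until we pass the winning number.
const char *draw(const std::vector<Proc> &procs, int total_tickets) {
    int winner = rand() % total_tickets;   // winning ticket in [0, total)
    int counter = 0;
    for (const Proc &p : procs) {
        counter += p.tickets;
        if (counter > winner)
            return p.name;                 // this process runs next
    }
    return 0;                              // unreachable if totals agree
}
// e.g. draw({{"A", 100}, {"B", 50}, {"C", 200}}, 350);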
Stride Scheduler
Bypasses the disadvantages of randomness and produces the same benefits as lottery.
Each program is assigned a stride: divide a large constant (10,000) by its tickets.
Priorities | Stride (10,000 / tickets)
A: 100     | 100
B:  50     | 200
C: 200     |  50
-------------------
Pass   A  |   B  |   C  | (Run)
       0  |    0 |    0 |  A
     100  |    0 |    0 |  B
     100  |  200 |    0 |  C
     100  |  200 |   50 |  C
     100  |  200 |  100 |  A
     200  |  200 |  100 |  C
     200  |  200 |  150 |  C
     200  |  200 |  200 |  B
     200  |  400 |  200 |  A
     300  |  400 |  200 |  C
     300  |  400 |  250 |  C
     300  |  400 |  300 |  A
     400  |  400 |  300 |  C
...
---------------
Run the process with the smallest pass value; update its pass (pass += stride) when its quantum is exhausted.
P3 will be to implement Stride Scheduling.
The difficulty of P3 is parsing the input.
Special case for freshly awoken sleeping processes: they get reassigned the smallest existing pass value (process pass = minimum pass).
When a new process is added, its pass value is likewise set to the smallest existing pass value.
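A minimal sketch of the selection step (names are mine; a real P3 also needs the new-process/wake-up rules above):

#include <vector>
#include <algorithm>

struct Proc { int pass; int stride; };   // stride = 10000 / tickets

// Pick the process with the smallest pass, then charge it one quantum.
Proc *schedule(std::vector<Proc> &table) {
    Proc *next = &*std::min_element(table.begin(), table.end(),
        [](const Proc &a, const Proc &b) { return a.pass < b.pass; });
    next->pass += next->stride;          // update when the quantum is exhausted
    return next;
}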
Mentioned:
Stride: while a process is scheduled and running, it is temporarily removed from the stride table's view.
P3: commands arrive as CSV in a file.
P3 can be implemented however necessary, as it is simulated (linear search is acceptable).
P3 is 'deterministic': the grade will be known ahead of time.
For P3, take advantage of CPP's abilities.
Completely Fair Scheduler (CFS) quantum is adjusted based on number of runnable tasks.
Review P2
Hints were given. See notes above from 9.20.21.
Multiprocessor scheduling.
CPU       CPU      (registers)
cache     cache    (L1)(L2)...
  |_________|
       |
  Memory (GB)
A thread shares an address space, vs. forked processes (roommates vs. neighbors).
new & delete are based on malloc() and free(). new is intelligent based on context; malloc() just gives bytes.
void * malloc(size_t);
Note: in cpp, sizeof is an operator (like + or -); remember not to take the sizeof of a pointer when you mean the pointed-to object.
void free(void *);
p = malloc(0x1000);
free(p);
p is a pointer to:
|sentinel|size|sentinel|4096 bytes|
Sentinels detect memory over/underruns.
free() knows how to recover the proper size from pointer p.
Don't forget to free() for every malloc().
Freeing too soon is a common bug, especially with multithreading.
Don't free properly allocated memory more than once.
Common bug:
int * p;   // used while uninitialized
Instead do:
int * p = malloc(sizeof(int));
or:
int * p = NULL;
Good practice:
// Create & init
struct foo * f = malloc(sizeof(struct foo));
memset(f, 0, sizeof(struct foo));
// Create & init
char * s = (char *)malloc(100);
*s = 0;   // start as an empty string
Mentioned:
Don't forget NULL terminators.
E.g. assigning space for a C string s:
malloc(strlen(s) + 1);
There is always a minimum of one page between the heap and the stack.
P3
Stride scheduler refer to ch 9 of OSTEP
Some commands take one or two arguments.
Commands:
assume only perfect input
Notes:
Old OG address spaces:
Static relocation:
Dynamic relocations:
Base/Bounds, Offset/Length:
Code, heap, stack + Base/Bounds
Explicit vs. Implicit Segmentation:
Explicit:
Implicit:
Stack, heap, and code are all still contiguous.
Paging:
The heap is managed in user space by the runtime library.
The OS allocates usable RAM to your address space, but then the heap is managed by the runtime library.
malloc(1k) a
malloc(2k) b
malloc(1k) c
malloc(2k) d
free b
free d
Pointers are adjusted by the runtime when free spaces touch, coalescing them into one region (2k + 1k = 3k free).
Managed (except for the head pointer) by a list kept in the free memory itself.
Algorithms to decide how to fill 'holes' (best fit, worst fit, first fit, ...):
Slab:
e.g. 1K chunks, 1000 of them:
  malloc one big region (1k * 1000),
  split it into 1k chunks,
  maintain a free list.
  Need one? Consult the list and hand it out.
  The list could be a stack.
Fast, efficient, no fragmentation.
The OS is only involved once per big block.
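A sketch of the slab idea for the 1K x 1000 example (layout and names are mine, not a prescribed design):

#include <cstdlib>

struct FreeNode { FreeNode *next; };

static FreeNode *free_list = 0;

// Grab one big block from the OS, split it into fixed-size chunks,
// and push every chunk onto the free list (used like a stack).
void slab_init(size_t chunk_size, size_t count) {
    char *pool = (char *)malloc(chunk_size * count);  // OS involved once
    for (size_t i = 0; i < count; i++) {
        FreeNode *n = (FreeNode *)(pool + i * chunk_size);
        n->next = free_list;
        free_list = n;
    }
}

void *slab_alloc() {              // consult the list and hand one out
    if (!free_list) return 0;
    FreeNode *n = free_list;
    free_list = n->next;
    return n;
}

void slab_free(void *p) {         // push it back; no fragmentation
    FreeNode *n = (FreeNode *)p;
    n->next = free_list;
    free_list = n;
}
// e.g. slab_init(1024, 1000); void *chunk = slab_alloc(); slab_free(chunk);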
Mentioned:
Implementation of one or more of the algorithms listed, for a project (P5?).
P4 will be back in xv6; P5 will be back in userland.
P4 will likely be moving the stack. Traditional layout w/ the stack at the bottom of the address space.
Can start looking at P4 *now*.
(For scalability)
(Page tables referenced by page tables)
Example 64-bit split:
   24 bits   |   24 bits   | 16 bits
  1st order  |  2nd order  | offset
V bit valid in a 1st-order entry -> points at many more pages (1000x or more); V bit invalid -> rules out 1000x or more invalid pages without storing their entries.
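With the 24|24|16 split above, the arithmetic: a 16-bit offset means 64KB pages, and each level indexes 2^24 entries, so one invalid first-order entry stands in for 2^24 invalid pages whose second-order entries never need to exist.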
TLB: a caching system in hardware for address translations.
Virtual Address
Virtual Memory
Has a backing device, e.g. disk.
Has larger capacity than main memory.
Layout:
| -TLB- | P | V | D | R |
[Swap]
TLB: Translation Lookaside Buffer
P: Page table
V: Valid
D: Dirty
R: Referenced
If there is a page fault, handling passes (via a trap) from hardware (a low-level event) to software (high-level event handling).
Notice: you may need some page brought in off the swap device.
E.g.
  Pages in proc: 1, 2, 3, 4
  Physical RAM: 3 frames, holding 3 of the 4 pages
One page must live on the swap device; a replacement policy picks the victim:
FIFO
Random
LRU - least recently used
Locality
Clock Algorithm (v1)
Clock Algorithm (v2)
Methods of concurrency:
Rollups (combining per-thread partial results at the end) allow for correct multithreaded operation at full speed.
A C++ program using advanced cpp-isms. Needs to be a gorgeous program.
The project will create a vector of unsigned int of arbitrary size (cmd line arg), fill it with numbers 0 to the size of the vector (consecutive integers), and shuffle the vector (random order; <algorithm> include). Based on the number of threads (default 4), subdivide the vector into equal portions (the last gets the strays). Then use the threads to 'observe' their segment and place each integer onto the 'pile' (a shared resource) by printing it. A thread exits when it has gone through its segment.
Notes:
Mutual exclusion is a way to ensure that only one thread at a time can access a shared resource. Mutexes acquire a lock on a shared resource. Don't acquire a lock twice.
mutex m;
m.lock();
m.unlock();
m.try_lock(); // returns true if lock acquired.
Lock guard: Give it a mutex and it will lock it when it is created and unlock it when it is destroyed. This is a safer way to use a mutex than locking it manually. It is also a way to make sure that the mutex is unlocked if an exception is thrown.
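A tiny example of the lock_guard pattern just described:

#include <mutex>
#include <iostream>

std::mutex m;

void safe_print(int x) {
    std::lock_guard<std::mutex> guard(m);  // locks m here
    std::cout << x << "\n";                // protected region
}                                          // unlocks when guard is destroyed,
                                           // even if an exception is thrown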
Ways to implement mutual exclusion:
// Naive spinlock (broken: the test and the set are not atomic,
// so two threads can both see 0 and both take the lock):
lock(int * m) {
    while (*m == 1)
        ;   // spin
    *m = 1;
}
unlock(int * m) {
    *m = 0;
}
Locks typically require hardware support for efficient implementation, usually in the form of one or more atomic instructions (test-and-set, compare-and-swap, load-linked/store-conditional, fetch-and-add; defined below).
These instructions allow a single process to test if the lock is free and, if so, acquire it in a single atomic operation.
// Acquire loops built on them:
while (testAndSet(m, 1) == 1)
    ;   // spin until the old value was 0
while (compareAndSwap(m, 1, 0) == 1)
    ;   // spin until we swap 0 -> 1
while (true) {
    while (loadLinked(m) == 1)
        ;   // spin until the lock looks free
    if (storeConditional(m, 1) == 1)
        return;   // the store succeeded: lock acquired
}
// These are atomic instructions (pseudocode for what the hardware does):
int testAndSet(int *old, int new) {
    int temp = *old;
    *old = new;
    return temp;
}
int compareAndSwap(int *m, int new, int expected) {
    int temp = *m;
    if (temp == expected) {
        *m = new;
    }
    return temp;
}
int loadLinked(int *m) {
    return *m;
}
int storeConditional(int *m, int v) {
    if (no one has written to *m since the LL) {
        *m = v;
        return 1;
    }
    return 0;
}
int fetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}
lock_init(tl_t *t) {
    t->ticket = 0;
    t->turn = 0;
}
unlock(tl_t *t) {
    t->turn++;
}
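The acquire side, filled in from the fetchAndAdd above (the standard ticket-lock acquire; the tl_t layout is assumed from init):

// typedef struct { int ticket; int turn; } tl_t;
lock(tl_t *t) {
    int myturn = fetchAndAdd(&t->ticket);  // take a ticket atomically
    while (t->turn != myturn)
        ;   // spin until it's my turn
}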
The default algorithm results in deadlock.
Solve it one of two ways:
How the flugelhorn is put down is not specified.
Use C++11-style mutexes and not pthreads.
After you play, yield the CPU: sched_yield().
Protect cout with a mutex so no output will overlay any other output.
Immediately try to play again after yielding.
Condition variables have a built in queue.
When someone signals, there is a choice to be made: do you wake up one of the sleeping threads (notify_one) or all of them (notify_all)?
C++ Example:
#include <mutex>
#include <condition_variable>
using namespace std;

mutex m;
condition_variable cv;
int done = 0;

void child0() {
    m.lock();
    cout << "child0\n";
    done = 1;             // record the state change
    cv.notify_one();
    m.unlock();
}
void child1() {           // signals without recording state: the wakeup can be lost
    m.lock();
    cout << "child1\n";
    cv.notify_one();
    m.unlock();
}
void thr_join0() {        // correct: re-check the state in a loop
    unique_lock<mutex> lk(m);
    while (done == 0) {
        cv.wait(lk);
    }
}
void thr_join1() {        // risky: waits on the cv with no predicate
    unique_lock<mutex> lk(m);
    cv.wait(lk);
}
Mutex is acting like a mutual exclusion controller for the queue inside the condition variable.
You can build a semaphore out of a condition variable, counter, and mutex. Semaphore is a synchronization primitive. Allows a certain number of concurrent threads but not more.
When you establish a semaphore you have to set the counter. Initialize the counter (semaphore) with the number of resources you are willing to give away IMMEDIATELY after initialization.
Semaphores are extremely flexible.
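A minimal sketch of that construction (class and member names are mine):

#include <mutex>
#include <condition_variable>

class Semaphore {
    std::mutex m;
    std::condition_variable cv;
    int count;                       // resources available right now
public:
    explicit Semaphore(int initial) : count(initial) {}   // set at init
    void wait() {                    // take a resource; sleep if none left
        std::unique_lock<std::mutex> lk(m);
        while (count == 0)
            cv.wait(lk);
        count--;
    }
    void post() {                    // return a resource; wake one waiter
        std::unique_lock<std::mutex> lk(m);
        count++;
        cv.notify_one();
    }
};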
This pattern is very common.
A circular buffer / ring buffer is a classic producer/consumer.
The producer has to synchronize with the consumer and vice versa so there is no lapping.
The semaphore is perfect for implementing producer/consumer patterns.
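A compact sketch of the pattern using the Semaphore above (buffer size N and all names are assumptions):

#include <mutex>

const int N = 8;
int buffer[N];
int fill = 0, use = 0;
Semaphore empty_slots(N);   // counter = resources given away immediately
Semaphore full_slots(0);
std::mutex buf_m;

void produce(int item) {
    empty_slots.wait();                        // block if about to lap the consumer
    {
        std::lock_guard<std::mutex> g(buf_m);
        buffer[fill] = item;
        fill = (fill + 1) % N;                 // ring: wrap around
    }
    full_slots.post();
}

int consume() {
    full_slots.wait();                         // block if nothing to take
    int item;
    {
        std::lock_guard<std::mutex> g(buf_m);
        item = buffer[use];
        use = (use + 1) % N;
    }
    empty_slots.post();
    return item;
}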
Hint for P7: make an array of pointers to functions, compute an index, and select from there.
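A toy illustration of that hint (the functions and index are made up):

#include <iostream>

void op_add() { std::cout << "add\n"; }
void op_sub() { std::cout << "sub\n"; }

// The dispatch table: an array of pointers to functions.
void (*ops[])() = { op_add, op_sub };

int main() {
    int index = 1;    // computed from input in the real project
    ops[index]();     // selects and calls op_sub
}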
Access to hardware is like accessing memory.
graph LR
Read --> Fread --> Cin
Read --> User/OS
Read --> Buffering_system --> Device_Driver
Device_Driver --> Hardware --> Memory_Mapped_Device_Registers
Device_Driver --> Memory_Mapped_Device_Registers
Pointers to hardware must be marked volatile.
The way to set bits in a type can be done using bit-fields or bit-bashing.
Bit-fields:
struct CNTRL_REG_T {
    unsigned short data : 4;
    unsigned short unused0 : 5;
    unsigned short on_off : 1;
    unsigned short unused1 : 2;
    unsigned short enable : 1;
};
CNTRL_REG_T c;
c.data   // access the 4-bit field directly
Bit-bashing:
unsigned short *c = Address;
#define GETDATA(c)   (*(c) & 0xF)     // read the low 4 bits
#define CLRDATA(c)   (*(c) & ~0xF)    // mask them off (the clear/set macro names are reconstructed; the notes reused GETDATA)
#define SETBITS(c)   (*(c) | 9)       // OR bits in
The platter spins. Data blocks (512B) are laid out in concentric tracks.
To read a block, the head must position over the block's track and then wait for the rotation to bring the block underneath.
The disk spins at a fixed RPM: rotational latency.
Blocks are written all at once or not at all.
The head must move across the surface of the disk to reach the lane/track of the required block: seek latency.
Moving the head from any track to any track takes around ~5ms.
Head movement is ballistic, so there is a settling latency of <1ms.
Data transfer must occur, so there is a transfer latency.
Performance improvements can be made in software by reordering block requests.
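Worked rotational numbers (standard figures, not from lecture): at 7,200 RPM one rotation takes 60 / 7200 = 8.33ms, so the average rotational latency (half a rotation) is about 4.2ms, comparable to the ~5ms seek.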
Redundant Array of Inexpensive Disks (RAID)
RAID0 (Striping)
RAID1 (Mirroring)
RAID1+0 (Striping and mirroring)
RAID4 (Striping with parity)
RAID5 (Striping with rotating parity)
RAID6 (Striping with rotating double parity)
Mentioned:
P8 is building a pipe
Two parts:
- producer consumer
- shmops (Shared memory operations)
Mac OSX is a dog's breakfast.
P9 is FSCK.
Purpose:
Blocks are either used or free.
'lingua franca' (a drive must support the common tongue for access to block 0, the boot block)
Bootstrapping process (booting):
graph LR
BIOS --> boot_block --> boot_loader
boot_loader --> super_block
Boot loader
Super Block
e.g. XV6 superblock structure (fs.h):
// mkfs computes the super block and builds an initial file system. The
// super block describes the disk layout:
struct superblock {
uint size; // Size of file system image (blocks)
uint nblocks; // Number of data blocks
uint ninodes; // Number of inodes.
uint nlog; // Number of log blocks
uint logstart; // Block number of first log block
uint inodestart; // Block number of first inode block
uint bmapstart; // Block number of first free map block
};
inodes ('information nodes'); the superblock's ninodes gives the count.
XV6 inode structure:
// On-disk inode structure
struct dinode {
short type; // File type
short major; // Major device number (T_DEV only)
short minor; // Minor device number (T_DEV only)
short nlink; // Number of links to inode in file system
uint size; // Size of file (bytes)
uint addrs[NDIRECT+1]; // Data block addresses
};
Special files represent hardware:
inode addrs: an array of 13 entries (NDIRECT = 12 direct addresses + 1 indirect block).
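Worked maximum file size from those 13 entries: 12 direct blocks + one indirect block holding 512 / 4 = 128 more addresses = 140 blocks x 512B = 71,680 bytes.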
XV6 Directory Structure:
// Directory is a file containing a sequence of dirent structures.
#define DIRSIZ 14
struct dirent {
ushort inum;
char name[DIRSIZ];
};
Every directory entry is 16 bytes wide. (inode num (2B)+ name(14B))
Note the longest file name supported in xv6 is 14 chars.
A file is independent of the name it is referenced by.
Tips:
Include xv6 headers using "".
open(path, flags)   // gives an fd
lseek(fd, offset, whence)
close(fd)
Build a model of the data in RAM (several data structures that duplicate the data on disk).
Read the superblock into a superblock struct:
struct superblock sb;
lseek(fd, BSIZE * 1, SEEK_SET);   // the superblock is block 1; process return codes and do something!
read(fd, &sb, sizeof(sb));        // process return values.
malloc space for holding all the inodes.
Record/create metadata about block n in a data structure for block usage. (Helps catch missing blocks.)
In the root directory, both the parent ('..') and current ('.') entries point to the root itself.
At 33 items a directory moves into another block (512B / 16B = 32 entries per block).
The link count and the number of directory entries referencing an inode should be the same.
Be very specific about error messages.
Metablocks = everything up to the start of the data blocks.
The error should be:
Block 65 is found more than once in the inodes: 2, 3
Each image has a specific problem. Report on the first error and stop processing.
./pckfs -f
Bit-bash helper function that returns the value (bool) of a bit, so one can say "get me the bitmap entry for block n": blocknum / 8 gives which byte, blocknum % 8 gives which bit.
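A sketch of that helper (the signature is mine):

#include <cstdint>

// Is block n marked used in the bitmap? byte = n / 8, bit = n % 8.
bool bitmap_get(const uint8_t *bitmap, int blocknum) {
    int byte = blocknum / 8;           // which byte of the bitmap
    int bit  = blocknum % 8;           // which bit within that byte
    return (bitmap[byte] >> bit) & 1;  // 1 = in use, 0 = free
}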
There may be a lot of type casting. void* is your friend, but there is no address arithmetic with void* (cast to char* first).
Remember: if a file name is exactly 14 bytes, there will be no null terminator.