Mar 8

Linux 2.6.33 64bit xen domU and CONFIG_DEBUG_RODATA

By cormander

So I set up new jobs on my Hudson server to track the “latest” kernel. Each job pulls in the latest changes, builds the kernel image and, for the -xenU builds, does a test boot. The latest as of today is Linux 2.6.33.

It works on 32bit:

http://build.cormander.com/job/linux-2.6.latest-i686-vanilla-xenU/2/console

It does not work on 64bit:

http://build.cormander.com/job/linux-2.6.latest-x86_64-vanilla-xenU/2/console

But works on 64bit with grsecurity:

http://build.cormander.com/job/linux-2.6.latest-x86_64-grsec-xenU/1/console

Here is the output from the failed boot:

[    0.370394] EXT3-fs (xvda2): warning: maximal mount count reached, running e2fsck is recommended
[    0.372607] EXT3-fs (xvda2): using internal journal
[    0.372632] EXT3-fs (xvda2): mounted filesystem with writeback data mode
[    0.372670] VFS: Mounted root (ext3 filesystem) on device 202:2.
[    0.372771] Freeing unused kernel memory: 668k freed
[    0.373202] Write protecting the kernel read-only data: 10240k
[    0.379890] Freeing unused kernel memory: 648k freed
[    0.379910] BUG: unable to handle kernel paging request at ffff88000155e000
[    0.379922] IP: [] free_init_pages+0xb1/0xda
[    0.379939] PGD 1a2a067 PUD 1a2e067 PMD 1d38067 PTE 1000000155e025
[    0.379955] Oops: 0003 [#1] SMP
[    0.379965] last sysfs file:
[    0.379973] CPU 0
[    0.379984] Pid: 1, comm: swapper Not tainted 2.6.33 #1 /
[    0.379992] RIP: e030:[]  [] free_init_pages+0xb1/0xda
[    0.380005] RSP: e02b:ffff880007c5fea0  EFLAGS: 00010286
[    0.380005] RAX: 00000000cccccccc RBX: ffff88000155e000 RCX: 0000000000000400
[    0.380005] RDX: ffffea000004ac91 RSI: 0000000000000000 RDI: ffff88000155e000
[    0.380005] RBP: ffff880007c5fed0 R08: 0000000000000000 R09: ffff880007c08000
[    0.380005] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000400
[    0.380005] R13: ffff880001600000 R14: ffffea0000000000 R15: 00000000cccccccc
[    0.380005] FS:  0000000000000000(0000) GS:ffff880001d45000(0000) knlGS:0000000000000000
[    0.380005] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.380005] CR2: ffff88000155e000 CR3: 0000000001a29000 CR4: 0000000000000660
[    0.380005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.380005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[    0.380005] Process swapper (pid: 1, threadinfo ffff880007c5e000, task ffff880007c60000)
[    0.380005] Stack:
[    0.380005]  0000000000000000 6db6db6db6db6db7 0000000000000400 ffff880000000000
[    0.380005] <0> ffffffff81600000 0000000000000000 ffff880007c5ff00 ffffffff8102c9c4
[    0.380005] <0> ffffffff81b9d960 0000000000000040 ffffffff81afbc60 ffffffff81afbc68
[    0.380005] Call Trace:
[    0.380005]  [] mark_rodata_ro+0xe0/0x146
[    0.380005]  [] init_post+0x2b/0x19d
[    0.380005]  [] kernel_init+0x19f/0x1aa
[    0.380005]  [] kernel_thread_helper+0x4/0x10
[    0.380005]  [] ? int_ret_from_sys_call+0x7/0x1b
[    0.380005]  [] ? retint_restore_args+0x5/0x6
[    0.380005]  [] ? kernel_thread_helper+0x0/0x10
[    0.380005] Code: 89 df e8 27 49 00 00 48 c1 e8 0c 48 89 df 4c 89 e1 48 6b c0 38 48 81 e7 00 f0 ff ff 31 f6 4c 01 f0 c7 40 08 01 00 00 00 44 89 f8  ab 48 89 df 48 81 c3 00 10 00 00 e8 93 a3 08 00 48 ff 05 33
[    0.380005] RIP  [] free_init_pages+0xb1/0xda
[    0.380005]  RSP
[    0.380005] CR2: ffff88000155e000
[    0.380005] ---[ end trace 39c6a8b0e7165bad ]---
[    0.394371] swapper used greatest stack depth: 5160 bytes left
[    0.394385] Kernel panic - not syncing: Attempted to kill init!
[    0.394395] Pid: 1, comm: swapper Tainted: G      D    2.6.33 #1
[    0.394403] Call Trace:
[    0.394413]  [] panic+0x75/0x137
[    0.394425]  [] ? exit_ptrace+0xb1/0x131
[    0.394436]  [] do_exit+0x77/0x777
[    0.394446]  [] ? xen_restore_fl_direct_end+0x0/0x1
[    0.394458]  [] ? kmsg_dump+0x126/0x140
[    0.394470]  [] ? __acpi_nmi_disable+0x14/0x1d
[    0.394480]  [] oops_end+0xb9/0xc1
[    0.394490]  [] no_context+0x1f3/0x202
[    0.394500]  [] ? __acpi_nmi_disable+0x14/0x1d
[    0.394511]  [] ? atomic_notifier_call_chain+0x13/0x15
[    0.394522]  [] __bad_area_nosemaphore+0x1c0/0x1e6
[    0.394533]  [] ? xen_force_evtchn_callback+0xd/0xf
[    0.394544]  [] ? check_events+0x12/0x20
[    0.394554]  [] ? xen_force_evtchn_callback+0xd/0xf
[    0.395349]  [] ? check_events+0x12/0x20
[    0.395349]  [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[    0.395349]  [] bad_area_nosemaphore+0xe/0x10
[    0.395349]  [] do_page_fault+0x1a0/0x2dd
[    0.395349]  [] page_fault+0x25/0x30
[    0.395349]  [] ? free_init_pages+0xb1/0xda
[    0.395349]  [] mark_rodata_ro+0xe0/0x146
[    0.395349]  [] init_post+0x2b/0x19d
[    0.395349]  [] kernel_init+0x19f/0x1aa
[    0.395349]  [] kernel_thread_helper+0x4/0x10
[    0.395349]  [] ? int_ret_from_sys_call+0x7/0x1b
[    0.395349]  [] ? retint_restore_args+0x5/0x6
[    0.395349]  [] ? kernel_thread_helper+0x0/0x10

After a little digging, I unset CONFIG_DEBUG_RODATA in the vanilla configuration (the grsecurity patch unsets it) and rebuilt. It works! That matches the trace above, where the oops is in free_init_pages, called from mark_rodata_ro. So we have a regression in the Xen pv_ops code when CONFIG_DEBUG_RODATA is enabled, because linux-2.6.32.9 (and the 2.6.32 kernels before it) boot just fine with DEBUG_RODATA enabled.
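For anyone who wants to repeat the test, here is a minimal sketch of turning the option off in an existing .config. The helper name disable_debug_rodata is made up for illustration, and it assumes the option is currently written as CONFIG_DEBUG_RODATA=y:

```shell
# Sketch: turn CONFIG_DEBUG_RODATA off in a kernel .config file.
# disable_debug_rodata is a hypothetical helper name, not a kernel tool.
disable_debug_rodata() {
    config_file=$1
    # Kconfig records a disabled bool option as "# CONFIG_FOO is not set".
    sed 's/^CONFIG_DEBUG_RODATA=y$/# CONFIG_DEBUG_RODATA is not set/' \
        "$config_file" > "$config_file.tmp" &&
        mv "$config_file.tmp" "$config_file"
}

# Possible usage from the top of a kernel source tree:
#   disable_debug_rodata .config && make oldconfig && make
```

Running make oldconfig afterwards lets Kconfig re-check the edited file before the rebuild.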

I don’t even know who to report this to. Then again, I don’t use vanilla Xen domU kernels. I just use them as a reference point when testing vanilla vs. grsecurity. The only ones I actually use are the grsecurity Xen domU kernels, and those work. So I guess I’ll just sit back and wait for someone else to notice and fix it.

Oct 16

Proposed Architecture for RavenCore + Xen

By cormander

Here is my proposed architecture for Xen being incorporated into RavenCore:

* Everything is based on a resource, e.g. CPU, memory, disk, IP addresses, etc.
* Resources are “detected” when you add a dom0 (Xen host) to the cluster
* The total of all resources in the cluster is put into the “Administrative Resource Pool”
* The admin user creates clients and can assign them resources. The total of all client resources cannot exceed the amount in the admin pool (for obvious reasons)
* The clients can then use those resources to allocate as many virtual machines as they like – one big virtual machine using it all, or many small ones using a small portion each.
* A virtual machine, if desired, can be defined as a “shared host”, and that client then has administrative rights to the provisioning of domains, email, etc. on that virtual host. Basically, they get RavenCore as it is today on that VM
* If you add a server to the cluster that is not a dom0, it can still be defined as a “shared host” and either be assigned to a client (who will become the admin of it) or have clients created on it directly.
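The pool rule in the list above boils down to a simple sum check. A rough sketch of that check for a single resource type follows; fits_in_pool and its argument order are invented for illustration and are not part of RavenCore:

```shell
# Sketch: would a new request still fit inside the admin pool?
# fits_in_pool POOL_TOTAL REQUEST [ALREADY_ALLOCATED...]
# Hypothetical helper; all amounts are in the same unit (e.g. MB of memory).
fits_in_pool() {
    pool_total=$1
    request=$2
    shift 2
    used=0
    # Add up what has already been handed out to clients.
    for amount in "$@"; do
        used=$((used + amount))
    done
    # Succeed only if the new total stays within the pool.
    [ $((used + request)) -le "$pool_total" ]
}

# Possible usage: an 8192 MB pool with 1024 MB and 4096 MB already assigned.
#   fits_in_pool 8192 2048 1024 4096 && echo "ok to assign"
```

The same check would apply one level down, when a client carves virtual machines out of its own share.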

What I want to stay away from in this design, for now, is hardware dependency, and in particular dependency on storage. To start with, I’m not going to build logic into the interface for things like a SAN. If you have one, great, you will be able to use it, but disk configuration is manual. Basically, when you install the control panel, you tell it which disk(s) you’d like to put into the admin pool. You have to create them manually (as image files made with dd, as a physical disk, or as LVM logical volumes) and then point to them from the interface.
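As an illustration of the manual step, a file-backed disk can be made with dd. This is only a sketch: make_disk_image, the path and the size are placeholders, and it creates a sparse file rather than writing zeros for the whole size:

```shell
# Sketch: create a sparse file that can later be used as a guest disk.
# make_disk_image PATH SIZE_MB  (hypothetical helper name)
make_disk_image() {
    image_path=$1
    size_mb=$2
    # count=0 with seek=N copies no data but extends the file to N MiB.
    dd if=/dev/zero of="$image_path" bs=1M count=0 seek="$size_mb" 2>/dev/null
}

# Possible usage:
#   make_disk_image /var/lib/xen/images/guest1.img 10240
```

The LVM equivalent would be something along the lines of lvcreate -L 10G -n guest1 vg0, where the volume group vg0 and the volume name guest1 are again placeholders.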

Sep 23

Xen guest install guide

By cormander

Much like my “Install FC9 into a chroot from a DVD iso” article, I have created a new page on this site:

Quick CentOS5 Xen paravirt guest install

It’s a page because I’m going to keep it updated as much as possible with kernel versions and such so people can continue to copy/paste out of it.

Feedback is welcome, just send me an email.

Sep 17

weird pygrub fs cache issue

By cormander

Here’s something for the weird scrapbook: an issue I had this morning with pygrub seeing an _old_ snapshot of the /boot partition of a Xen VM. On this VM I installed a new kernel (from source), removed a few really old ones, and rebooted. When it had not come back up after several minutes, I got a console to it from the Xen host (running CentOS5) and saw that the VM had hit a kernel panic.

Looking at the output, it was trying to load modules from /lib/modules with unknown symbols, which typically happens when those modules weren’t compiled with the correct kernel symvers file. I had just manually installed the kernel, so I knew this wasn’t the case, but maybe I had copied them into the wrong directory? Puzzled, I halted it and started it from the console with -c to watch it as it booted. I noticed that the pygrub menu listed 4 kernels, when there really were only 2 kernels in the menu.lst file.

So I stopped it there and mounted the /boot partition from the virtual machine’s disk on the host. I looked at it and, as I expected, there were only two kernels in the menu.lst file. I unmounted it and started the VM again, and pygrub was still showing 4 kernels to choose from.

If I tried to boot a kernel that didn’t really exist any more, it would load the vmlinuz and initrd files but fail to load anything from /lib/modules on the root partition, because I had uninstalled them. If I tried to boot the kernel I had just reinstalled, it would panic on unknown symbols during boot. I had only one bootable kernel in the virtual machine (the second one, which still really existed).

Looking in the process tree of the host, I spotted an hdparm command that had been run on the device this virtual machine points to. I sent it a TERM signal and it didn’t die, so I started sending it KILL ( -9 ) signals. It still didn’t die. The only explanation I could come up with is that the process was hung waiting on a kernel thread.
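A process that ignores KILL is usually stuck in uninterruptible sleep, which ps reports as a D at the start of the STAT column. A small sketch for spotting such processes follows. The helper name list_uninterruptible is made up, and this is offered as a way to check the theory, not as something that was run at the time:

```shell
# Sketch: pick out processes in uninterruptible sleep (state D).
# Reads "PID STAT COMMAND" lines on stdin and prints the matching ones.
# list_uninterruptible is a hypothetical helper name.
list_uninterruptible() {
    # The second column is the process state; D means uninterruptible sleep.
    awk '$2 ~ /^D/ { print }'
}

# Possible usage on the Xen host:
#   ps -eo pid,stat,comm | list_uninterruptible
```

Anything it prints is waiting inside the kernel and will not act on signals until that wait ends.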

It then suddenly hit me: when I mount the /boot partition from the host I see what is really there, but pygrub doesn’t do a mount. It actually does an open system call on the block device and reads the data directly, which puts the contents of the /boot partition into the filesystem cache. With the hung hdparm command still holding that block device open as a filehandle, and nothing else “writing” to the block device, my guess was that the old data was being held in the host kernel’s filesystem cache, so pygrub never actually touched the disk.

Not wanting to have to reboot, I tried clearing the filesystem cache by running these commands:

sync
echo 1 > /proc/sys/vm/drop_caches

Then I ran the free command to confirm the filesystem cache was in fact cleared. Once it was, I started up pygrub again and, BAM, it saw only two kernel entries, as it should.
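The same before-and-after check can be scripted by reading /proc/meminfo instead of eyeballing free. A minimal sketch, where cache_kb is an invented helper name and "Buffers plus Cached" is taken as a rough stand-in for what free reports as cache:

```shell
# Sketch: report how much memory the kernel is using for cache, in kB.
# cache_kb [MEMINFO_FILE]  (hypothetical helper; defaults to /proc/meminfo)
cache_kb() {
    # Sum the Buffers and Cached lines; SwapCached is deliberately not matched.
    awk '$1 == "Buffers:" || $1 == "Cached:" { total += $2 }
         END { print total + 0 }' "${1:-/proc/meminfo}"
}

# Possible usage, as root:
#   before=$(cache_kb)
#   sync
#   echo 1 > /proc/sys/vm/drop_caches
#   after=$(cache_kb)
#   echo "cache went from ${before} kB to ${after} kB"
```

A large drop between the two numbers is the scripted equivalent of watching the cache figure shrink in free.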

I’ve never seen this happen before, and it sure was strange. This scenario makes me want to see if I can backport the new pv-grub feature, which is in the recently released Xen-3.3, to the CentOS5.2 version of Xen.