
3. x86 architecture specific questions

3.1 Why doesn't it work on my machine?

  1. Can I use my Cyrix/AMD/non-Intel CPU in SMP?

    Short answer: no.

    Long answer: Intel claims ownership of the APIC SMP scheme, and unless a company licenses it from Intel, it may not use it. No company has currently done so. (This can of course change in the future.) FYI: both Cyrix and AMD support the non-proprietary OpenPIC SMP standard, but there are currently no motherboards that use it.

  2. Why doesn't my old Compaq work?

    Put it into MP1.1/1.4 compliant mode.

    Check "Configure Hardware" -> "View / Edit details" -> "Advanced mode" (F7 I think) for a configuration option "APIC mode" and set this to "full Table mode". This is an official Compaq recommendation. (Daniel Roesen)

    (Adrian Portelli) To do this:

    1. Press F10 when the server boots to enter the System Configuration Utility
    2. Press Enter to dismiss the splash screen
    3. Immediately press CTRL+A
    4. A message will appear informing you that you are now in "Advanced Mode"
    5. Then select "Configure Hardware" -> "View / Edit details"
    6. You will then see the advanced settings (intermixed with the ordinary ones)
    7. Scroll down to "APIC Mode" and then select "Fully Mapped"
    8. Save changes and reboot

  3. Why doesn't my ALR work?

    From Robert Hyatt: ALR Revolution quad-6 seems quite safe, while some older Revolution quad machines without P6 processors seem "iffy"...

  4. Why does SMP go so slowly? or Why does one CPU show a very low bogomips value while the first one is normal?

    From Alan Cox: If one of your CPUs is reporting a very low bogomips value, the cache is not enabled on it. Your vendor probably provides a buggy BIOS. Get the patch to work around this or, better yet, send it back and buy a board from a competent supplier.

    A 2.0 kernel (> 2.0.36) contains the MTRR patch which should solve this problem (select option "Handle buggy SMP BIOSes with bad MTRR setup" in the "General setup" menu).

    I think buggy SMP BIOS handling is automatic in the latest 2.2 kernels.
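
    To spot the symptom described above, the per-CPU bogomips values can be compared automatically. The following is a minimal sketch, not part of any kernel tool: it assumes /proc/cpuinfo lists one stanza per CPU with "processor" and "bogomips" keys, and both the function name low_bogomips_cpus and the 0.5 threshold are arbitrary choices made for illustration. The sample text is invented.

```python
def low_bogomips_cpus(cpuinfo_text, ratio=0.5):
    """Return the CPU numbers whose bogomips is below ratio * the best value seen."""
    values = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between stanzas
        key = key.strip().lower()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "bogomips" and current is not None:
            values[current] = float(value)
    if not values:
        return []
    best = max(values.values())
    return sorted(cpu for cpu, v in values.items() if v < ratio * best)


# Invented sample: CPU 1 reports a tiny value, as when its cache is disabled.
SAMPLE = """\
processor : 0
model name : Pentium II
bogomips : 447.28

processor : 1
model name : Pentium II
bogomips : 3.58
"""

if __name__ == "__main__":
    print(low_bogomips_cpus(SAMPLE))
```

    On a real machine, pass the contents of /proc/cpuinfo instead of SAMPLE. A non-empty result only suggests that a CPU is running without cache; it does not prove it.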

  5. I've heard IBM machines have problems

    Some IBM machines have the MP1.4 BIOS block in the EBDA, which is allowed but is not supported by kernels older than 2.2.

    There is an old 486SLC based IBM SMP box. Linux/SMP requires hardware FPU support.

  6. Is there any advantage of Intel MP 1.4 over 1.1 specification?

    Nope (according to Alan :) ), 1.4 is just a stricter version of the 1.1 spec.

  7. Why does the clock drift so rapidly when I run linux SMP?

    This is a known problem with IRQ handling and long kernel locks in the 2.0 series kernels. Consider upgrading to a later 2.2 kernel.

    From Jakob Oestergaard: Or, consider running xntpd. That should keep your clock right on time. (I think that I've heard that enabling RTC in the kernel also fixes the clock drift. It works for me! but I'm not sure whether that's general or I'm just being lucky)

    There are some kernel fixes in the later 2.2.x series that may fix this.

  8. Why are my CPUs numbered 0 and 2 instead of 0 and 1 (or some other odd numbering)?

    The CPU number is assigned by the motherboard manufacturer and doesn't mean anything. Ignore it.

  9. My quad-Xeon system hangs as soon as it has decompressed the kernel

    (Doug Ledford) Try recompiling LILO with LARGE_EBDA support and then making sure to always use make bzImage when compiling the kernel. That appears to have fixed the SMP boot hangs here on Intel multi-Xeon boards. However, please note that this also appears to break LILO in that the root= option no longer works, so make sure you rdev your kernel image at the same time you run lilo to make sure that the kernel loads the correct root filesystem at boot.

    (Robert M. Hyatt) With 3 cpus, do you have a terminator in the 4th slot?

  10. During boot the machine hangs, signaling an IOAPIC problem

    Try boot options "noapic" (John Aldrich) and/or "reboot=bios" (Terry Shull).

  11. My system locks up during heavy NFS traffic

    Try the later 2.2.x kernels and the knfsd patches. This is currently under investigation. (Wade Hampton)

  12. My system locks up with no oops messages

    If you are using kernels 2.2.11 or 2.2.12, get the latest kernel. For example 2.2.13 has a number of SMP fixes. Several people have reported these kernels to be unstable for SMP. These same kernels may have NFS problems that can cause lockups. Also, use a serial console to capture your oops messages. (Wade Hampton)

    If the problem remains (and the other suggestions on this list didn't help either), then you could try the latest 2.3 kernels. They have more verbose (and more robust) SMP/APIC code, and automatic hard-lockup-prevention code which will produce meaningful oopses instead of a silent hang. (Ingo Molnar)

    (Osamu Aoki) You MUST also disable all BIOS-related power-saving features. Example of a good configuration (Dual Celeron 466 Abit BP6):


     POWER MANAGEMENT SETUP.
       ACPI:              Disabled
       POWER MANAGEMENT:  Disabled
       PM CONTROL by APM: No
    

    If power management features are activated, random freezes can occur.

  13. Debugging lockups

    (item by Wade Hampton)

    A good means of debugging lockups is to get the ikd patch from Andrea Arcangeli: ftp://ftp.suse.com/pub/people/andrea/kernel-patches

    There are several debug options, but do NOT use the soft lockup option! For newer SMP boxes, turn on kernel debugging, then turn on the NMI oopser. To verify that the NMI oopser is working, after booting the new kernel, run cat /proc/interrupts and verify that you are getting NMIs. When the box locks up, you should get an OOPS.
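
    The NMI check can also be scripted by comparing two snapshots of /proc/interrupts taken a few seconds apart. This is only a sketch under assumptions: it expects a line starting with "NMI:" followed by one count (or one count per CPU), and the function names nmi_counts and nmi_is_ticking are made up for this example. The sample snapshots are invented.

```python
def nmi_counts(interrupts_text):
    """Return the counts found on the NMI line, or [] if there is no such line."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            counts = []
            for field in rest.split():
                if not field.isdigit():
                    break  # stop at any trailing description text
                counts.append(int(field))
            return counts
    return []


def nmi_is_ticking(before_text, after_text):
    """True if every NMI count grew between the two snapshots."""
    before = nmi_counts(before_text)
    after = nmi_counts(after_text)
    if not before or len(before) != len(after):
        return False
    return all(new > old for old, new in zip(before, after))


# Invented snapshots, as if taken a few seconds apart.
BEFORE = """\
           CPU0       CPU1
  0:     104523     104519    IO-APIC-edge  timer
NMI:       1500       1498
ERR:          0
"""
AFTER = """\
           CPU0       CPU1
  0:     105011     105006    IO-APIC-edge  timer
NMI:       1620       1617
ERR:          0
"""

if __name__ == "__main__":
    print(nmi_is_ticking(BEFORE, AFTER))
```

    If the counts stay at zero or do not move, the NMI oopser is probably not active and a lockup is unlikely to produce an OOPS.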

    You may also try the %eip option. This allows the kernel to print the %eip address on the console every time a kernel function is called. When the box locks up, write down the first column ordered by the second column, then look up the addresses in the System.map file. This works only in console mode.
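
    Looking up addresses in System.map by hand is tedious, so here is a minimal sketch of how it could be automated. It assumes the usual three-column System.map layout (hex address, type letter, symbol name) and maps each address to the nearest symbol at or below it. The helper names load_system_map and resolve are hypothetical, and the addresses in the sample are invented.

```python
import bisect


def load_system_map(text):
    """Parse System.map text into a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address, or None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[index]
    return "%s+0x%x" % (name, address - base)


# Invented excerpt; real addresses differ for every kernel build.
SAMPLE_MAP = """\
c0100000 T _stext
c0108a50 T cpu_idle
c0110c10 T schedule
"""

if __name__ == "__main__":
    table = load_system_map(SAMPLE_MAP)
    print(resolve(table, 0xC0108A74))
```

    The System.map file must come from the exact kernel that was running when the addresses were written down, otherwise the names will be wrong.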

    Also note that the use of a serial console can greatly facilitate debugging kernel lockups, not just SMP kernel lockups!

  14. "APIC error interrupt on CPU#n, should never happen" messages in logs

    A message like:


    APIC error interrupt on CPU#0, should never happen.
    ... APIC ESR0: 00000002
    ... APIC ESR1: 00000000
    

    indicates a 'receive checksum error'. This cannot be caused by Linux, as the APIC message checksumming part is completely in hardware. It might be marginal hardware. As long as you don't see any instability, these messages are not a problem: APIC messages are retried until delivered. (Ingo Molnar)
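
    The ESR value in such a message is a bit mask, so other values can be decoded the same way. The sketch below is an illustration only: the text above confirms just the 0x02 'receive checksum error' bit, while the other bit names are taken from my understanding of Intel's local APIC documentation and should be checked against it. The names ESR_BITS, parse_esr_line and decode_esr are invented for this example.

```python
# Bit masks of the local APIC Error Status Register.
# Only 0x02 is confirmed by the text above; the rest are assumptions
# to be verified against Intel's documentation.
ESR_BITS = [
    (0x01, "send checksum error"),
    (0x02, "receive checksum error"),
    (0x04, "send accept error"),
    (0x08, "receive accept error"),
    (0x20, "send illegal vector"),
    (0x40, "receive illegal vector"),
    (0x80, "illegal register address"),
]


def parse_esr_line(line):
    """Extract the hex value from a log line such as '... APIC ESR0: 00000002'."""
    return int(line.rsplit(":", 1)[1].strip(), 16)


def decode_esr(value):
    """Return the names of all error bits set in value."""
    return [name for mask, name in ESR_BITS if value & mask]


if __name__ == "__main__":
    print(decode_esr(parse_esr_line("... APIC ESR0: 00000002")))
```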

3.2 Possible causes of crash

In this section you'll find some possible reasons for a crash of an SMP machine (credits are due to Jakob Østergaard for this part). As far as I (David) know, these problems are Intel specific.

3.3 Motherboard specific information

Please note: Some more specific information can be found with the list of Motherboards rumored to run Linux SMP

Motherboards with known problems

3.4 Low cost SMP Linux box (dual Celeron box)

(Stéphane Écolivet)

The lowest-cost SMP Linux boxes that can be built with currently available processors are dual Celeron systems. Such a system is not officially possible according to Intel. Prefer the second generation of Celeron, those with 128 KB of L2 cache.

Is it possible to run a dual Intel Celeron box?

Official answer from Intel: no, Celeron cannot work in SMP mode.

Practical answer: it is possible, but requires hardware alteration for Slot 1 processors. The alteration is described by Tomohiro Kawada on his Dual Celeron System page. Of course, this kind of modification voids the warranty... Some versions of the Celeron processor are also available in Socket 370 format. In that case, the alteration may just be done on the Socket 370 to Slot 1 adapter, or the adapter may even be sold pre-wired for SMP use. (Andy Poling, Hans-Erik Skyttberg, James Beard)

There is also a motherboard (ABIT BP6) that accepts two Celerons in Socket 370 format (Martijn Kruithof, Ryan McCue). The ABIT Computer BP6 has been verified and tested to work natively with Linux using dual PPGA Socket 370 processors (Andre Hedrick).

How does Linux behave on a dual Celeron system?

Fine, thank you.

Celeron processors are known to be easily overclockable. What about a dual Celeron system?

It may work. However, overclocking this kind of system is not as easy as overclocking a single-processor one. It is definitely not a good idea for a production system. For personal use, dual Celeron 300A systems running rock-solid at 450 MHz have been reported. (numerous people)

And making a quad Celeron system?

It is impossible. Celeron processors have nearly the same features as basic Pentium II chips. If you want more than 2 processors in your system, you'll have to look at Pentium Pro, Pentium II Xeon or Pentium III (?) boxes.

What about mixing Celeron and Pentium II processors?

A system using a "re-enabled" Celeron processor and a Pentium II processor with the same stepping may theoretically work.

Alexandre Charbey has made such a system:

