Discussion:
Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
Andreas Hartmann
2014-09-23 19:03:18 UTC
Hello!

For a long time now, I've been using PCIe pass through w/o any problem on a
Gigabyte GA-990XA-UD3 mainboard (AMD 990X chipset) with the IOMMU enabled
and vfio-pci.

The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
.8 and .9, but I do not think they would have been problematic).

With 3.14.19 (I didn't test any earlier 3.14 kernel), I get a hard and
silent lock up of the complete machine when starting the VM with the
PCIe card passed through.

This is the PCIe card that locks up the machine when passed to the VM
(lspci output taken while running 3.12.28):

03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
    Subsystem: Qualcomm Atheros Device 3112
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at fdbc0000 (64-bit, non-prefetchable) [size=128K]
    Expansion ROM at fda00000 [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [300 v1] Device Serial Number 00-00-00-00-00-00-00-00
    Kernel driver in use: vfio-pci
    Kernel modules: ath9k


Unbinding it works w/o any problem. The lock up occurs about 4 s
after the start of the VM.

On 3.12.x, I can see the following message on the error terminal when
starting the VM:
vfio-pci: 03:00.0: invalid ROM contents.

I compared the AMD-Vi debug output between 3.12 and 3.14, but couldn't see
any difference. I also compared /proc/interrupts between 3.12 and 3.14
and couldn't see any difference there either so far.


The qemu version I'm using is 1.7.0.


It is strange(?) that a second VM using PCI (legacy) pass through works
w/o any problem. I also tried starting the problematic VM without that
second VM running - same result: the machine locks up hard.


Do you have any idea, what could be going on there? Or how to debug it
to see what happened?



Thanks,
kind regards,
Andreas Hartmann
Alex Williamson
2014-09-23 20:07:46 UTC
On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
> [...]
> Do you have any idea, what could be going on there? Or how to debug it
> to see what happened?

Are you able to set up a serial console on this system? Enabling sysrq
and getting a dump of task states (t) via serial is often the best way
to determine the problem. There weren't many vfio changes between 3.13
and 3.14. Have you tested whether the problem still occurs on 3.16 +
newer QEMU? Maybe also remove the ROM from the equation with the
rombar=0 option for the vfio-pci device in QEMU. Thanks,

Alex
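
For reference, the rombar=0 suggestion above amounts to one extra property on the vfio-pci device in the QEMU command line. A minimal sketch, assuming the device address 03:00.0 from the lspci output earlier in the thread; the helper name vfio_device_arg is made up for illustration, and the rest of the VM's command line is not shown because the thread does not give it:

```shell
#!/bin/sh
# Build the vfio-pci device argument for QEMU with the option ROM BAR hidden
# from the guest (rombar=0). Only the fragment is produced here; it would be
# added to whatever QEMU invocation the VM already uses.
vfio_device_arg() {
    host_addr=$1    # PCI address of the passed-through device, e.g. 03:00.0
    printf '%s' "vfio-pci,host=${host_addr},rombar=0"
}

# Print the option as it would appear on the QEMU command line.
echo "-device $(vfio_device_arg 03:00.0)"
```

With rombar=0 the guest never sees the expansion ROM, which takes the "invalid ROM contents" message out of the picture as a possible cause.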
Andreas Hartmann
2014-09-24 14:54:51 UTC
Alex Williamson wrote:
> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>> [...]
>> Do you have any idea, what could be going on there? Or how to debug it
>> to see what happened?
>
> Are you able to setup a serial console on this system? Enabling sysrq
> and getting a dump of task states (t) via serial is often the best way
> to determine the problem.

I'll try it.


It should most probably be something like

console=tty0 console=ttyS0,115200n8

as a kernel option on the sending machine, and

/sbin/agetty -h -t 60 ttyS0 115200 vt102

on the client.


Probably my biggest problem: I don't have a second machine with a serial
port :-(. I hope this USB to serial adapter will be supported on client
side (as receiver):

Logilink USB 2.0 Seriell Adapter


> There weren't many vfio changes between 3.13 and 3.14.

Could it be a PCI problem, too?

> Have you tested whether the problem still occurs on 3.16 +

Same problem.

> newer QEMU?

Reluctantly - it is a production system.

> Maybe also remove the ROM from the equation with the
> rombar=0 option for the vfio-pci device in QEMU.

Same problem :-(. The machine really is completely dead: it doesn't even
respond to pings any more.



Regards,
Andreas
Andreas Hartmann
2014-09-24 17:16:52 UTC
Andreas Hartmann wrote:
> Alex Williamson wrote:
[...]
>> Are you able to setup a serial console on this system? Enabling sysrq
>> and getting a dump of task states (t) via serial is often the best way
>> to determine the problem.
>
> I'll try it.

I have now done it like this:

minicom on the client.

Magic SysRq via minicom:

Ctrl-A Shift-F (sends a break), followed by the SysRq key, such as m or t.


On the sender:
Kernel option: console=tty0 console=ttyS0,115200n8

Beforehand, SysRq has to be enabled via
sysctl -w kernel.sysrq=1

After doing all of this, you can test w/ m or t. The output will appear
on tty10 (via Alt-F10 on openSUSE).
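
To confirm that the serial console really captures SysRq output before reproducing the lock up, the same dumps can also be requested from a shell on the host. A minimal sketch, assuming root privileges; the helper name sysrq and the SYSRQ_TRIGGER override are made up for illustration, and the helper only accepts the non-destructive keys m (memory info), t (task states), w (blocked tasks) and l (backtraces of active CPUs):

```shell
#!/bin/sh
# Request a SysRq action by writing the key to the kernel's trigger file.
# SYSRQ_TRIGGER defaults to the real interface; pointing it at an ordinary
# file turns the call into a harmless dry run.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

sysrq() {
    key=$1
    case $key in
        m|t|w|l)
            # Informational dumps only; output goes to the kernel log/console.
            printf '%s' "$key" > "$SYSRQ_TRIGGER"
            ;;
        *)
            echo "refusing unsupported SysRq key: $key" >&2
            return 1
            ;;
    esac
}
```

Running "sysrq t" as root should make the task dump show up on the serial client. This only works while the machine is still alive, so it checks the logging path; it is not a way to get data out of the machine once it has locked up.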


And the result after starting the VM? The machine is definitely
completely dead. It doesn't react to anything any more.


Remarkable:
After a hard reset, the USB keyboard no longer works in Linux (though it
still works in Grub 2), because the driver gets a timeout accessing the
USB 3 hw (other USB chips are working fine). It is necessary to switch
the machine off completely and disconnect it from the mains. After ~ 30s,
it can be powered on again and everything works fine (after repairing the
damaged FS on the host that the VM resides on).


Any more hints are welcome :-)


Thanks,
kind regards,
Andreas
Andreas Hartmann
2014-10-10 09:39:36 UTC
In short: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.

Alex Williamson wrote:
> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>> [...]
>> Do you have any idea, what could be going on there? Or how to debug it
>> to see what happened?

> There weren't many vfio changes between 3.13 and 3.14.

Could it be a PCI problem, too? It is strange that there is no problem
with the PCI card, but the PCIe card hangs the machine!

> Have you tested whether the problem still occurs on 3.16 +

Same problem with 3.17.0

> newer QEMU?

Same problem with qemu 2.1.0.

> Maybe also remove the ROM from the equation with the
> rombar=0 option for the vfio-pci device in QEMU.

Same problem :-(. The machine really is completely dead: it doesn't even
respond to pings any more.



Regards,
Andreas
Bjorn Helgaas
2014-10-10 14:37:37 UTC
On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
<***@freenet.de> wrote:
> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>
> Alex Williamson wrote:
>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>> Hello!
>>>
>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>> enabled IOMMU with vfio-pci.
>>>
>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>> .8 and .9, but I do not think they would have been problematic).
>>>
>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>> hard and silent lock up of the complete machine when starting the VM
>>> with the PCIe card passed through.

Since we're not really making any progress on this yet, would it be
possible to bisect it? We already know that 3.13.7 works and 3.14.19
fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
know that's still quite a bit of work, but at least it sounds like the
problem is easy to reproduce.

Bjorn
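
The "about 13 steps" figure follows from binary search: each tested kernel roughly halves the remaining candidates, so n commits need about log2(n) rounds. A rough sketch of that arithmetic; bisect_steps is a made-up helper name, and git's own estimate can differ by a step because history is not a straight line:

```shell
#!/bin/sh
# Estimate the number of bisect rounds as ceil(log2(n)) for n candidate
# commits, using integer arithmetic only.
bisect_steps() {
    n=$1
    steps=0
    span=1      # number of commits that `steps` halvings can distinguish
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

Inside a kernel clone, the candidate count could be taken from git rev-list --count v3.13..v3.14 and passed to bisect_steps. Any count between 4097 and 8192 gives 13.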
Andreas Hartmann
2014-10-10 14:49:25 UTC
Bjorn Helgaas wrote:
> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
> <***@freenet.de> wrote:
>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>
>> Alex Williamson wrote:
>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>> Hello!
>>>>
>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>> enabled IOMMU with vfio-pci.
>>>>
>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>> .8 and .9, but I do not think they would have been problematic).
>>>>
>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>> hard and silent lock up of the complete machine when starting the VM
>>>> with the PCIe card passed through.
>
> Since we're not really making any progress on this yet, would it be
> possible to bisect it? We already know that 3.13.7 works and 3.14.19
> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
> know that's still quite a bit of work, but at least it sounds like the
> problem is easy to reproduce.

Which git repository would be best to use?

Is it possible to clone once and then do all further work from that one
clone? Unfortunately my internet connection is very slow :-(.


Thanks for your hint!

Regards,
Andreas
Bjorn Helgaas
2014-10-10 15:55:16 UTC
On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
<***@freenet.de> wrote:
> Bjorn Helgaas wrote:
>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
>> <***@freenet.de> wrote:
>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>>
>>> Alex Williamson wrote:
>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>>> Hello!
>>>>>
>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>>> enabled IOMMU with vfio-pci.
>>>>>
>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>>> .8 and .9, but I do not think they would have been problematic).
>>>>>
>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>>> hard and silent lock up of the complete machine when starting the VM
>>>>> with the PCIe card passed through.
>>
>> Since we're not really making any progress on this yet, would it be
>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
>> know that's still quite a bit of work, but at least it sounds like the
>> problem is easy to reproduce.
>
> Which git repository should I use best?

The linux-stable repository [1] contains both the v3.13.x and the
v3.14.x branches, but apparently you can't bisect directly between
v3.13.7 and v3.14.19:

$ git bisect start v3.14.19 v3.13.7
Bisecting: a merge base must be tested
[d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13

I'm not an expert at bisecting, but here's what I would try:

- Clone the repo from [1] (this same repo can be used for all your testing)
- Checkout, build, and test v3.14
- If v3.14 works (unlikely), bisect between v3.14 and v3.14.19 to
  see which change broke it
- If v3.14 fails, checkout, build, and test v3.13
- If v3.13 fails (very unlikely), bisect between v3.13 and v3.13.7
  to see which change fixed it
- If v3.13 works and v3.14 fails (most likely), bisect between v3.13 and v3.14

Bjorn

[1] git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
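
The most likely path through the steps above (v3.13 good, v3.14 bad) can be sketched as a few shell helpers. The function names are made up for illustration, the kernel build and the VM test itself are left out because they depend on the local setup, and nothing runs until a function is called by hand:

```shell
#!/bin/sh
# Repository from [1]; a single clone serves every later checkout.
REPO_URL=git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

clone_once() {
    # The slow download happens only here; bisecting afterwards is local.
    git clone "$REPO_URL" linux-stable
}

start_bisect() {
    # Run inside the clone. Argument order is <bad> <good>:
    # v3.14 is assumed to fail and v3.13 to work.
    git bisect start v3.14 v3.13
}

mark_result() {
    # Call after building and booting the checked-out tree and starting the VM.
    # "skip" is for trees that do not build or boot at all.
    case $1 in
        good|bad|skip) git bisect "$1" ;;
        *) echo "usage: mark_result good|bad|skip" >&2; return 2 ;;
    esac
}
```

After each mark_result call git checks out the next commit to test. When it prints "<sha> is the first bad commit", git bisect log shows the whole session and git bisect reset returns to the original checkout.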
Andreas Hartmann
2014-10-10 16:09:32 UTC
Bjorn Helgaas wrote:
> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
> <***@freenet.de> wrote:
>> Bjorn Helgaas wrote:
>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
>>> <***@freenet.de> wrote:
>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>>>
>>>> Alex Williamson wrote:
>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>>>> Hello!
>>>>>>
>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>>>> enabled IOMMU with vfio-pci.
>>>>>>
>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>>>> .8 and .9, but I do not think they would have been problematic).
>>>>>>
>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>>>> hard and silent lock up of the complete machine when starting the VM
>>>>>> with the PCIe card passed through.
>>>
>>> Since we're not really making any progress on this yet, would it be
>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
>>> know that's still quite a bit of work, but at least it sounds like the
>>> problem is easy to reproduce.
>>
>> Which git repository should I use best?
>
> The linux-stable repository [1] contains both the v3.13.x and the
> v3.14.x branches, but apparently you can't bisect directly between
> v3.13.7 and v3.14.19:

I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
is already broken. Therefore, it must be between 3.13.7 and
patch-v3.13-next-20140121.



Thanks,
regards,
Andreas
Bjorn Helgaas
2014-10-10 16:41:22 UTC
On Fri, Oct 10, 2014 at 10:09 AM, Andreas Hartmann
<***@freenet.de> wrote:
> Bjorn Helgaas wrote:
>> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
>> <***@freenet.de> wrote:
>>> Bjorn Helgaas wrote:
>>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
>>>> <***@freenet.de> wrote:
>>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>>>>
>>>>> Alex Williamson wrote:
>>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>>>>> enabled IOMMU with vfio-pci.
>>>>>>>
>>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>>>>> .8 and .9, but I do not think they would have been problematic).
>>>>>>>
>>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>>>>> hard and silent lock up of the complete machine when starting the VM
>>>>>>> with the PCIe card passed through.
>>>>
>>>> Since we're not really making any progress on this yet, would it be
>>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
>>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
>>>> know that's still quite a bit of work, but at least it sounds like the
>>>> problem is easy to reproduce.
>>>
>>> Which git repository should I use best?
>>
>> The linux-stable repository [1] contains both the v3.13.x and the
>> v3.14.x branches, but apparently you can't bisect directly between
>> v3.13.7 and v3.14.19:
>
> I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
> is already broken. Therefore, it must be between 3.13.7 and
> patch-v3.13-next-20140121.

I assume patch-v3.13-next-20140121 is the linux-next tree from
20140121. v3.13 was released on Jan 19, 2014, so 20140121 was during
the merge window, and the linux-next tree from that day would be
Linus' tree (v3.13 plus whatever he had merged during the first day or
two), plus all the remaining stuff in subsystem trees that had not yet
been merged. The result (patch-v3.13-next-20140121) should be a
fairly good approximation of v3.14.

v3.13.7 is a branch based on v3.13. patch-v3.13-next-20140121 would
essentially be a branch based on v3.13 also. So while they share a
common v3.13 ancestor, I don't think you can bisect directly between
them. And linux-next is rebuilt from scratch every day, so I don't
think there is a git tree with patch-v3.13-next-20140121 in it anyway.

Bjorn
Andreas Hartmann
2014-10-10 22:32:19 UTC
Bjorn Helgaas wrote:
> On Fri, Oct 10, 2014 at 10:09 AM, Andreas Hartmann
> <***@freenet.de> wrote:
>> Bjorn Helgaas wrote:
>>> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
>>> <***@freenet.de> wrote:
>>>> Bjorn Helgaas wrote:
>>>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
>>>>> <***@freenet.de> wrote:
>>>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>>>>>
>>>>>> Alex Williamson wrote:
>>>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>>>>>> enabled IOMMU with vfio-pci.
>>>>>>>>
>>>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>>>>>> .8 and .9, but I do not think they would have been problematic).
>>>>>>>>
>>>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>>>>>> hard and silent lock up of the complete machine when starting the VM
>>>>>>>> with the PCIe card passed through.
>>>>>
>>>>> Since we're not really making any progress on this yet, would it be
>>>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
>>>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
>>>>> know that's still quite a bit of work, but at least it sounds like the
>>>>> problem is easy to reproduce.
>>>>
>>>> Which git repository should I use best?
>>>
>>> The linux-stable repository [1] contains both the v3.13.x and the
>>> v3.14.x branches, but apparently you can't bisect directly between
>>> v3.13.7 and v3.14.19:
>>
>> I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
>> is already broken. Therefore, it must be between 3.13.7 and
>> patch-v3.13-next-20140121.


Ok, this is the result of git bisect:

425c1b223dac456d00a61fd6b451b6d1cf00d065 is the first bad commit
commit 425c1b223dac456d00a61fd6b451b6d1cf00d065
Author: Alex Williamson <***@redhat.com>
Date: Tue Dec 17 16:43:51 2013 -0700

PCI: Add Virtual Channel to save/restore support

While we don't really have any infrastructure for making use of VC
support, the system BIOS can configure the topology to non-default
VC values prior to boot. This may be due to silicon bugs, desire to
reserve traffic classes, or perhaps just BIOS bugs. When we reset
devices, the VC configuration may return to default values, which can
be incompatible with devices upstream. For instance, Nvidia GRID
cards provide a PCIe switch and some number of GPUs, all supporting
VC. The power-on default for VC is to support TC0-7 across VC0,
however some platforms will only enable TC0/VC0 mapping across the
topology. When we do a secondary bus reset on the downstream switch
port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
of the link only enables TC0/VC0. If the GPU attempts to use TC1-7,
it fails.

This patch attempts to provide complete support for VC save/restore,
even beyond the minimally required use case above. This includes
save/restore and reload of the arbitration table, save/restore and
reload of the port arbitration tables, and re-enabling of the
channels for VC, VC9, and MFVC capabilities.

Signed-off-by: Alex Williamson <***@redhat.com>
Signed-off-by: Bjorn Helgaas <***@google.com>


Kind regards,
Andreas
Bjorn Helgaas
2014-10-10 22:54:08 UTC
On Sat, Oct 11, 2014 at 12:32:19AM +0200, Andreas Hartmann wrote:
> Bjorn Helgaas wrote:
> > On Fri, Oct 10, 2014 at 10:09 AM, Andreas Hartmann
> > <***@freenet.de> wrote:
> >> Bjorn Helgaas wrote:
> >>> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
> >>> <***@freenet.de> wrote:
> >>>> Bjorn Helgaas wrote:
> >>>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
> >>>>> <***@freenet.de> wrote:
> >>>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
> >>>>>>
> >>>>>> Alex Williamson wrote:
> >>>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
> >>>>>>>> Hello!
> >>>>>>>>
> >>>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
> >>>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
> >>>>>>>> enabled IOMMU with vfio-pci.
> >>>>>>>>
> >>>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
> >>>>>>>> .8 and .9, but I do not think they would have been problematic).
> >>>>>>>>
> >>>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
> >>>>>>>> hard and silent lock up of the complete machine when starting the VM
> >>>>>>>> with the PCIe card passed through.
> >>>>>
> >>>>> Since we're not really making any progress on this yet, would it be
> >>>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
> >>>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
> >>>>> know that's still quite a bit of work, but at least it sounds like the
> >>>>> problem is easy to reproduce.
> >>>>
> >>>> Which git repository should I use best?
> >>>
> >>> The linux-stable repository [1] contains both the v3.13.x and the
> >>> v3.14.x branches, but apparently you can't bisect directly between
> >>> v3.13.7 and v3.14.19:
> >>
> >> I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
> >> is already broken. Therefore, it must be between 3.13.7 and
> >> patch-v3.13-next-20140121.
>
>
> Ok, this is the result of git bisect:
>
> 425c1b223dac456d00a61fd6b451b6d1cf00d065 is the first bad commit
> commit 425c1b223dac456d00a61fd6b451b6d1cf00d065
> Author: Alex Williamson <***@redhat.com>
> Date: Tue Dec 17 16:43:51 2013 -0700
>
> PCI: Add Virtual Channel to save/restore support
>
> While we don't really have any infrastructure for making use of VC
> support, the system BIOS can configure the topology to non-default
> VC values prior to boot. This may be due to silicon bugs, desire to
> reserve traffic classes, or perhaps just BIOS bugs. When we reset
> devices, the VC configuration may return to default values, which can
> be incompatible with devices upstream. For instance, Nvidia GRID
> cards provide a PCIe switch and some number of GPUs, all supporting
> VC. The power-on default for VC is to support TC0-7 across VC0,
> however some platforms will only enable TC0/VC0 mapping across the
> topology. When we do a secondary bus reset on the downstream switch
> port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
> of the link only enables TC0/VC0. If the GPU attempts to use TC1-7,
> it fails.
>
> This patch attempts to provide complete support for VC save/restore,
> even beyond the minimally required use case above. This includes
> save/restore and reload of the arbitration table, save/restore and
> reload of the port arbitration tables, and re-enabling of the
> channels for VC, VC9, and MFVC capabilities.
>
> Signed-off-by: Alex Williamson <***@redhat.com>
> Signed-off-by: Bjorn Helgaas <***@google.com>

Wow, I'm amazed that you could get that done so fast... you must have spent
your whole day working on this!

To double-check this, can you try applying the patch below? It should be
enough to make things work if 425c1b223dac is really what's causing the
trouble.

This patch is based on v3.17, but 425c1b223dac appeared in v3.14, so you
should be able to apply it to v3.14 or any later kernel.

Bjorn


diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c9ac70254e2..8ef8bc56a584 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1007,8 +1007,6 @@ int pci_save_state(struct pci_dev *dev)
 		return i;
 	if ((i = pci_save_pcix_state(dev)) != 0)
 		return i;
-	if ((i = pci_save_vc_state(dev)) != 0)
-		return i;
 	return 0;
 }
 EXPORT_SYMBOL(pci_save_state);
@@ -1072,7 +1070,6 @@ void pci_restore_state(struct pci_dev *dev)
 	/* PCI Express register must be restored first */
 	pci_restore_pcie_state(dev);
 	pci_restore_ats_state(dev);
-	pci_restore_vc_state(dev);

 	pci_restore_config_space(dev);

@@ -2170,8 +2167,6 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
 	if (error)
 		dev_err(&dev->dev,
 			"unable to preallocate PCI-X save buffer\n");
-
-	pci_allocate_vc_save_buffers(dev);
 }

 void pci_free_cap_save_buffers(struct pci_dev *dev)
Andreas Hartmann
2014-10-11 06:20:14 UTC
Bjorn Helgaas wrote:
> On Sat, Oct 11, 2014 at 12:32:19AM +0200, Andreas Hartmann wrote:
>> Bjorn Helgaas wrote:
>>> On Fri, Oct 10, 2014 at 10:09 AM, Andreas Hartmann
>>> <***@freenet.de> wrote:
>>>> Bjorn Helgaas wrote:
>>>>> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
>>>>> <***@freenet.de> wrote:
>>>>>> Bjorn Helgaas wrote:
>>>>>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
>>>>>>> <***@freenet.de> wrote:
>>>>>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
>>>>>>>>
>>>>>>>> Alex Williamson wrote:
>>>>>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
>>>>>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
>>>>>>>>>> enabled IOMMU with vfio-pci.
>>>>>>>>>>
>>>>>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
>>>>>>>>>> .8 and .9, but I do not think they would have been problematic).
>>>>>>>>>>
>>>>>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
>>>>>>>>>> hard and silent lock up of the complete machine when starting the VM
>>>>>>>>>> with the PCIe card passed through.
>>>>>>>
>>>>>>> Since we're not really making any progress on this yet, would it be
>>>>>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
>>>>>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
>>>>>>> know that's still quite a bit of work, but at least it sounds like the
>>>>>>> problem is easy to reproduce.
>>>>>>
>>>>>> Which git repository should I use best?
>>>>>
>>>>> The linux-stable repository [1] contains both the v3.13.x and the
>>>>> v3.14.x branches, but apparently you can't bisect directly between
>>>>> v3.13.7 and v3.14.19:
>>>>
>>>> I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
>>>> is already broken. Therefore, it must be between 3.13.7 and
>>>> patch-v3.13-next-20140121.
>>
>>
>> Ok, this is the result of git bisect:
>>
>> 425c1b223dac456d00a61fd6b451b6d1cf00d065 is the first bad commit
>> commit 425c1b223dac456d00a61fd6b451b6d1cf00d065
>> Author: Alex Williamson <***@redhat.com>
>> Date: Tue Dec 17 16:43:51 2013 -0700
>>
>> PCI: Add Virtual Channel to save/restore support
>>
>> While we don't really have any infrastructure for making use of VC
>> support, the system BIOS can configure the topology to non-default
>> VC values prior to boot. This may be due to silicon bugs, desire to
>> reserve traffic classes, or perhaps just BIOS bugs. When we reset
>> devices, the VC configuration may return to default values, which can
>> be incompatible with devices upstream. For instance, Nvidia GRID
>> cards provide a PCIe switch and some number of GPUs, all supporting
>> VC. The power-on default for VC is to support TC0-7 across VC0,
>> however some platforms will only enable TC0/VC0 mapping across the
>> topology. When we do a secondary bus reset on the downstream switch
>> port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
>> of the link only enables TC0/VC0. If the GPU attempts to use TC1-7,
>> it fails.
>>
>> This patch attempts to provide complete support for VC save/restore,
>> even beyond the minimally required use case above. This includes
>> save/restore and reload of the arbitration table, save/restore and
>> reload of the port arbitration tables, and re-enabling of the
>> channels for VC, VC9, and MFVC capabilities.
>>
>> Signed-off-by: Alex Williamson <***@redhat.com>
>> Signed-off-by: Bjorn Helgaas <***@google.com>
>
> Wow, I'm amazed that you could get that done so fast... you must have spent
> your whole day working on this!

If I had been more familiar with the kernel versioning, had a faster
internet connection, and hadn't been bitten by another bug in systemd
when booting with a broken fs (but I found a cool workaround now :-)),
I would have been much faster. My 8-core machine with 8 GB of RAM, on
which I compile the kernel, and my special kernel config (which I've
been using since 3.10), containing only what I need, together with
partial automation of the process, give a turnaround of ~7 minutes :-).
I also had no problem with reproducibility, because the problem always
comes up 1 or 2 seconds after the VM starts.

>
> To double-check this, can you try applying the patch below? It should be
> enough to make things work if 425c1b223dac is really what's causing the
> trouble.
>
> This patch is based on v3.17, but 425c1b223dac appeared in v3.14, so you
> should be able to apply it to v3.14 or any later kernel.
>
> Bjorn
>
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2c9ac70254e2..8ef8bc56a584 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1007,8 +1007,6 @@ int pci_save_state(struct pci_dev *dev)
> return i;
> if ((i = pci_save_pcix_state(dev)) != 0)
> return i;
> - if ((i = pci_save_vc_state(dev)) != 0)
> - return i;
> return 0;
> }
> EXPORT_SYMBOL(pci_save_state);
> @@ -1072,7 +1070,6 @@ void pci_restore_state(struct pci_dev *dev)
> /* PCI Express register must be restored first */
> pci_restore_pcie_state(dev);
> pci_restore_ats_state(dev);
> - pci_restore_vc_state(dev);
>
> pci_restore_config_space(dev);
>
> @@ -2170,8 +2167,6 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> if (error)
> dev_err(&dev->dev,
> "unable to preallocate PCI-X save buffer\n");
> -
> - pci_allocate_vc_save_buffers(dev);
> }
>
> void pci_free_cap_save_buffers(struct pci_dev *dev)
>

This patch confirmed the git bisect result. I applied it to
patch-v3.13-next-20140122 and the machine worked just fine :-).


Thanks,
Regards,
Andreas
Alex Williamson
2014-10-15 08:04:27 UTC
On Sat, 2014-10-11 at 08:20 +0200, Andreas Hartmann wrote:
> Bjorn Helgaas wrote:
> > On Sat, Oct 11, 2014 at 12:32:19AM +0200, Andreas Hartmann wrote:
> >> Bjorn Helgaas wrote:
> >>> On Fri, Oct 10, 2014 at 10:09 AM, Andreas Hartmann
> >>> <***@freenet.de> wrote:
> >>>> Bjorn Helgaas wrote:
> >>>>> On Fri, Oct 10, 2014 at 8:49 AM, Andreas Hartmann
> >>>>> <***@freenet.de> wrote:
> >>>>>> Bjorn Helgaas wrote:
> >>>>>>> On Fri, Oct 10, 2014 at 3:39 AM, Andreas Hartmann
> >>>>>>> <***@freenet.de> wrote:
> >>>>>>>> shortly: I retested w/ qemu 2.1.0 and Linux 3.17.0 - no change in behaviour.
> >>>>>>>>
> >>>>>>>> Alex Williamson wrote:
> >>>>>>>>> On Tue, 2014-09-23 at 21:03 +0200, Andreas Hartmann wrote:
> >>>>>>>>>> Hello!
> >>>>>>>>>>
> >>>>>>>>>> Since long time now, I'm using w/o any problem PCIe pass through with a
> >>>>>>>>>> Gigabyte GA-990XA-UD3/GA-990XA-UD3 mainboard (AMD 990X chipset) and
> >>>>>>>>>> enabled IOMMU with vfio-pci.
> >>>>>>>>>>
> >>>>>>>>>> The last kernel working w/o any problem is kernel 3.13.7 (I didn't use
> >>>>>>>>>> .8 and .9, but I do not think they would have been problematic).
> >>>>>>>>>>
> >>>>>>>>>> Since 3.14.19 (I didn't test any 3.14 kernel before) I'm encountering a
> >>>>>>>>>> hard and silent lock up of the complete machine when starting the VM
> >>>>>>>>>> with the PCIe card passed through.
> >>>>>>>
> >>>>>>> Since we're not really making any progress on this yet, would it be
> >>>>>>> possible to bisect it? We already know that 3.13.7 works and 3.14.19
> >>>>>>> fails, and "git bisect start v3.14 v3.13" says it's about 13 steps. I
> >>>>>>> know that's still quite a bit of work, but at least it sounds like the
> >>>>>>> problem is easy to reproduce.
> >>>>>>
> >>>>>> Which git repository should I use best?
> >>>>>
> >>>>> The linux-stable repository [1] contains both the v3.13.x and the
> >>>>> v3.14.x branches, but apparently you can't bisect directly between
> >>>>> v3.13.7 and v3.14.19:
> >>>>
> >>>> I know that the first version after 3.13.0 (patch-v3.13-next-20140121)
> >>>> is already broken. Therefore, it must be between 3.13.7 and
> >>>> patch-v3.13-next-20140121.
> >>
> >>
> >> Ok, this is the result of git bisect:
> >>
> >> 425c1b223dac456d00a61fd6b451b6d1cf00d065 is the first bad commit
> >> commit 425c1b223dac456d00a61fd6b451b6d1cf00d065
> >> Author: Alex Williamson <***@redhat.com>
> >> Date: Tue Dec 17 16:43:51 2013 -0700
> >>
> >> PCI: Add Virtual Channel to save/restore support
> >>
> >> While we don't really have any infrastructure for making use of VC
> >> support, the system BIOS can configure the topology to non-default
> >> VC values prior to boot. This may be due to silicon bugs, desire to
> >> reserve traffic classes, or perhaps just BIOS bugs. When we reset
> >> devices, the VC configuration may return to default values, which can
> >> be incompatible with devices upstream. For instance, Nvidia GRID
> >> cards provide a PCIe switch and some number of GPUs, all supporting
> >> VC. The power-on default for VC is to support TC0-7 across VC0,
> >> however some platforms will only enable TC0/VC0 mapping across the
> >> topology. When we do a secondary bus reset on the downstream switch
> >> port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
> >> of the link only enables TC0/VC0. If the GPU attempts to use TC1-7,
> >> it fails.
> >>
> >> This patch attempts to provide complete support for VC save/restore,
> >> even beyond the minimally required use case above. This includes
> >> save/restore and reload of the arbitration table, save/restore and
> >> reload of the port arbitration tables, and re-enabling of the
> >> channels for VC, VC9, and MFVC capabilities.
> >>
> >> Signed-off-by: Alex Williamson <***@redhat.com>
> >> Signed-off-by: Bjorn Helgaas <***@google.com>
> >
> > Wow, I'm amazed that you could get that done so fast... you must have spent
> > your whole day working on this!
>
> If I had been more familiar with the kernel versioning, had a faster
> internet connection, and hadn't been bitten by another bug in systemd
> when booting with a broken fs (but I found a cool workaround now :-)),
> I would have been much faster. My 8-core machine with 8 GB of RAM, on
> which I compile the kernel, and my special kernel config (which I've
> been using since 3.10), containing only what I need, together with
> partial automation of the process, give a turnaround of ~7 minutes :-).
> I also had no problem with reproducibility, because the problem always
> comes up 1 or 2 seconds after the VM starts.

Hi Andreas,

Sorry for the breakage. Is it possible to run lspci on the device in a
loop from the host and capture whether we're failing to restore some of
the VC bits to their previous state? Does the problem also occur if you
unbind from host driver, echo 1 > reset in pci-sysfs, and re-bind to the
host? I'll also try to reproduce on my 990fx system, but I won't be
able to do that until next week due to travel. Thanks,

Alex

> > To double-check this, can you try applying the patch below? It should be
> > enough to make things work if 425c1b223dac is really what's causing the
> > trouble.
> >
> > This patch is based on v3.17, but 425c1b223dac appeared in v3.14, so you
> > should be able to apply it to v3.14 or any later kernel.
> >
> > Bjorn
> >
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 2c9ac70254e2..8ef8bc56a584 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1007,8 +1007,6 @@ int pci_save_state(struct pci_dev *dev)
> > return i;
> > if ((i = pci_save_pcix_state(dev)) != 0)
> > return i;
> > - if ((i = pci_save_vc_state(dev)) != 0)
> > - return i;
> > return 0;
> > }
> > EXPORT_SYMBOL(pci_save_state);
> > @@ -1072,7 +1070,6 @@ void pci_restore_state(struct pci_dev *dev)
> > /* PCI Express register must be restored first */
> > pci_restore_pcie_state(dev);
> > pci_restore_ats_state(dev);
> > - pci_restore_vc_state(dev);
> >
> > pci_restore_config_space(dev);
> >
> > @@ -2170,8 +2167,6 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> > if (error)
> > dev_err(&dev->dev,
> > "unable to preallocate PCI-X save buffer\n");
> > -
> > - pci_allocate_vc_save_buffers(dev);
> > }
> >
> > void pci_free_cap_save_buffers(struct pci_dev *dev)
> >
>
> This patch confirmed the git bisect result. I applied it to
> patch-v3.13-next-20140122 and the machine worked just fine :-).
>
>
> Thanks,
> Regards,
> Andreas
Andreas Hartmann
2014-10-17 01:04:52 UTC
Hello Alex,

Alex Williamson wrote:
> Hi Andreas,
[...]
> Sorry for the breakage. Is it possible to run lspci on the device in a
> loop from the host and capture whether we're failing to restore some of
> the VC bits to their previous state?

> Does the problem also occur if you
> unbind from host driver,

The machine is booted w/ blacklisted ath9k. Then, the device is bound to
vfio:

echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

afterwards the VM is started -> hang.

W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
any problem.
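
The binding steps above can be wrapped in a small helper. This is a minimal sketch, not something posted in the thread: the function name vfio_bind and the SYSFS_ROOT override are hypothetical, and the override exists only so the logic can be tried against a scratch directory instead of the real /sys.

```shell
# Hypothetical helper mirroring the three echo commands above.
vfio_bind() {
    bdf=$1                      # PCI address, e.g. 0000:03:00.0
    ids=$2                      # "vendor device", e.g. "168c 0030"
    root=${SYSFS_ROOT:-/sys}    # assumption: overridable for dry runs

    # Tell vfio-pci that it may claim devices with this vendor/device ID.
    echo "$ids" > "$root/bus/pci/drivers/vfio-pci/new_id"

    # Detach the device from whatever driver currently owns it, if any.
    if [ -e "$root/bus/pci/devices/$bdf/driver/unbind" ]; then
        echo "$bdf" > "$root/bus/pci/devices/$bdf/driver/unbind"
    fi

    # Bind the device to vfio-pci explicitly.
    echo "$bdf" > "$root/bus/pci/drivers/vfio-pci/bind"
}
```

On the real system this would be called as root, for example: vfio_bind 0000:03:00.0 "168c 0030"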

> echo 1 > reset in pci-sysfs,

echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
bound to vfio. Even after unbinding from vfio and rebinding to vfio
again ... .
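
For repeated testing, the sysfs reset can be sketched as a small function. The name pci_sysfs_reset and the SYSFS_ROOT override are hypothetical and not part of the thread. The check for the attribute reflects that the kernel only exposes "reset" when it found a usable reset method for the device.

```shell
# Hypothetical helper: reset one PCI device through its sysfs attribute.
pci_sysfs_reset() {
    bdf=$1                      # PCI address, e.g. 0000:03:00.0
    root=${SYSFS_ROOT:-/sys}    # assumption: overridable for dry runs
    attr="$root/bus/pci/devices/$bdf/reset"

    # Without a supported reset method the attribute does not exist.
    if [ ! -e "$attr" ]; then
        echo "no reset attribute for $bdf" >&2
        return 1
    fi

    echo 1 > "$attr"
}
```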

> and re-bind to the

Do you mean loading ath9k in host system after unbinding from vfio? If
yes: Works w/o any problem. It's even possible to reset it or do a
ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
again and reset it, ....

Looks like the hang is only triggered by qemu-system-x86_64 when
starting the VM.

> host? I'll also try to reproduce on my 990fx system, but I won't be
> able to do that until next week due to travel. Thanks,


Regards,
Andreas
Alex Williamson
2014-10-21 21:06:20 UTC
Hi Andreas,

On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
> Hello Alex,
>
> Alex Williamson wrote:
> > Hi Andreas,
> [...]
> > Sorry for the breakage. Is it possible to run lspci on the device in a
> > loop from the host and capture whether we're failing to restore some of
> > the VC bits to their previous state?
>
> > Does the problem also occur if you
> > unbind from host driver,
>
> The machine is booted w/ blacklisted ath9k. Then, the device is bound to
> vfio:
>
> echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
> echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
>
> afterwards the VM is started -> hang.
>
> W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
> any problem.
>
> > echo 1 > reset in pci-sysfs,
>
> echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
> bound to vfio. Even after unbinding from vfio and rebinding to vfio
> again ... .
>
> > and re-bind to the
>
> Do you mean loading ath9k in host system after unbinding from vfio? If
> yes: Works w/o any problem. It's even possible to reset it or do a
> ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
> again and reset it, ....
>
> Looks like the hang is only triggered by qemu-system-x86_64 when
> starting the VM.
>
> > host? I'll also try to reproduce on my 990fx system, but I won't be
> > able to do that until next week due to travel. Thanks,

Could you send me the lspci -vvvxxxx for the device and parent root
port? Thanks,

Alex
Alex Williamson
2014-10-21 21:32:32 UTC
On Tue, 2014-10-21 at 15:06 -0600, Alex Williamson wrote:
> Hi Andreas,
>
> On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
> > Hello Alex,
> >
> > Alex Williamson wrote:
> > > Hi Andreas,
> > [...]
> > > Sorry for the breakage. Is it possible to run lspci on the device in a
> > > loop from the host and capture whether we're failing to restore some of
> > > the VC bits to their previous state?
> >
> > > Does the problem also occur if you
> > > unbind from host driver,
> >
> > The machine is booted w/ blacklisted ath9k. Then, the device is bound to
> > vfio:
> >
> > echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> >
> > afterwards the VM is started -> hang.
> >
> > W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
> > any problem.
> >
> > > echo 1 > reset in pci-sysfs,
> >
> > echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
> > bound to vfio. Even after unbinding from vfio and rebinding to vfio
> > again ... .
> >
> > > and re-bind to the
> >
> > Do you mean loading ath9k in host system after unbinding from vfio? If
> > yes: Works w/o any problem. It's even possible to reset it or do a
> > ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
> > again and reset it, ....
> >
> > Looks like the hang is only triggered by qemu-system-x86_64 when
> > starting the VM.

Also, this might be because QEMU since 1.7 will favor doing a bus reset
for a device over PM reset while the sysfs reset interface will only do
a bus reset if there are no other methods available and there are no
other devices on the bus. Can you reproduce the hang using the sysfs
reset interface without QEMU if you modify the kernel like this:

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
 	if (rc != -ENOTTY)
 		goto done;

-	rc = pci_pm_reset(dev, probe);
+	rc = pci_dev_reset_slot_function(dev, probe);
 	if (rc != -ENOTTY)
 		goto done;

-	rc = pci_dev_reset_slot_function(dev, probe);
+	rc = pci_parent_bus_reset(dev, probe);
 	if (rc != -ENOTTY)
 		goto done;

-	rc = pci_parent_bus_reset(dev, probe);
+	rc = pci_pm_reset(dev, probe);
 done:
 	return rc;
 }
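
To illustrate why the ordering matters, here is a toy model of the fall-through in shell: each method either handles the reset or reports "not supported", and the first method that does not report "not supported" wins. Everything in it is a stand-in invented for this sketch (the function names, the use of exit status 25 in place of -ENOTTY, and the assumption that the device supports PM reset and bus reset but not slot reset). It is not kernel code.

```shell
ENOTTY=25   # stand-in for the kernel's -ENOTTY ("method not supported")

# Try each named reset method in order. The first one that does not
# report ENOTTY decides the outcome, and its name is printed.
try_resets() {
    for method in "$@"; do
        rc=0
        "$method" || rc=$?
        if [ "$rc" -ne "$ENOTTY" ]; then
            echo "$method"
            return "$rc"
        fi
    done
    return "$ENOTTY"
}

# Hypothetical device: PM reset and bus reset work, slot reset does not.
fake_pm_reset()   { return 0; }
fake_slot_reset() { return "$ENOTTY"; }
fake_bus_reset()  { return 0; }

# Original order picks the PM reset:
#   try_resets fake_pm_reset fake_slot_reset fake_bus_reset
# Reordered as in the patch, the bus reset is picked instead:
#   try_resets fake_slot_reset fake_bus_reset fake_pm_reset
```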



> > > host? I'll also try to reproduce on my 990fx system, but I won't be
> > > able to do that until next week due to travel. Thanks,
>
> Could you send me the lspci -vvvxxxx for the device and parent root
> port? Thanks,
>
> Alex
>
Andreas Hartmann
2014-10-22 16:22:49 UTC
Alex Williamson wrote:
> On Tue, 2014-10-21 at 15:06 -0600, Alex Williamson wrote:
>> Hi Andreas,
>>
>> On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
>>> Hello Alex,
>>>
>>> Alex Williamson wrote:
>>>> Hi Andreas,
>>> [...]
>>>> Sorry for the breakage. Is it possible to run lspci on the device in a
>>>> loop from the host and capture whether we're failing to restore some of
>>>> the VC bits to their previous state?
>>>
>>>> Does the problem also occur if you
>>>> unbind from host driver,
>>>
>>> The machine is booted w/ blacklisted ath9k. Then, the device is bound to
>>> vfio:
>>>
>>> echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
>>> echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
>>> echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
>>>
>>> afterwards the VM is started -> hang.
>>>
>>> W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
>>> any problem.
>>>
>>>> echo 1 > reset in pci-sysfs,
>>>
>>> echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
>>> bound to vfio. Even after unbinding from vfio and rebinding to vfio
>>> again ... .
>>>
>>>> and re-bind to the
>>>
>>> Do you mean loading ath9k in host system after unbinding from vfio? If
>>> yes: Works w/o any problem. It's even possible to reset it or do a
>>> ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
>>> again and reset it, ....
>>>
>>> Looks like the hang is only triggered by qemu-system-x86_64 when
>>> starting the VM.
>
> Also, this might be because QEMU since 1.7 will favor doing a bus reset
> for a device over PM reset while the sysfs reset interface will only do
> a bus reset if there are no other methods available and there are no
> other devices on the bus. Can you reproduce the hang using the sysfs
> reset interface without QEMU if you modify the kernel like this:
>
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> if (rc != -ENOTTY)
> goto done;
>
> - rc = pci_pm_reset(dev, probe);
> + rc = pci_dev_reset_slot_function(dev, probe);
> if (rc != -ENOTTY)
> goto done;
>
> - rc = pci_dev_reset_slot_function(dev, probe);
> + rc = pci_parent_bus_reset(dev, probe);
> if (rc != -ENOTTY)
> goto done;
>
> - rc = pci_parent_bus_reset(dev, probe);
> + rc = pci_pm_reset(dev, probe);
> done:
> return rc;
> }

This way it's crashing with echo 1 > reset, too.


Regards,
Andreas
Alex Williamson
2014-10-22 20:36:55 UTC
On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> > if (rc != -ENOTTY)
> > goto done;
> >
> > - rc = pci_pm_reset(dev, probe);
> > + rc = pci_dev_reset_slot_function(dev, probe);
> > if (rc != -ENOTTY)
> > goto done;
> >
> > - rc = pci_dev_reset_slot_function(dev, probe);
> > + rc = pci_parent_bus_reset(dev, probe);
> > if (rc != -ENOTTY)
> > goto done;
> >
> > - rc = pci_parent_bus_reset(dev, probe);
> > + rc = pci_pm_reset(dev, probe);
> > done:
> > return rc;
> > }
>
> This way it's crashing with echo 1 > reset, too.

Ok, so it's somehow related to doing a bus reset with virtual channel
save/restore while PM reset with VC save/restore works ok as apparently
does bus reset without VC save/restore. Let's try to do a manual bus
reset so we can look at the post reset state of the device before the
kernel tries to restore it.

First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
know it's not being used.

Next capture lspci -xxxx -s 3:00.0 so we have the starting state.

Then we'll do a bus reset using setpci:
# setpci -s 00:05.0 3e.w=40:40
<if you script this, wait at least 2ms here>
# setpci -s 00:05.0 3e.w=00:40
<wait 1 second here>

Now re-capture lspci -xxxx -s 3:00.0

The interesting lines for your device are 140: and 150:, so if you want
to avoid sending massive emails you can just send those for the before
and after. You'll need to reboot the system before you do anything else
with this device since it's now in an uninitialized state. Based on
what the lspci output reports (or whether you experience a hang simply
from this), we may want to try writing additional bits with setpci to
mimic the VC restore behavior. Thanks,
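
If scripted, the sequence above might look roughly like the sketch below. The function names and the before.txt/after.txt file names are hypothetical. It assumes a sleep that accepts fractional seconds (GNU coreutils). On an affected machine, running it may hang the host.

```shell
# Keep only the config-space rows at offsets 0x140 and 0x150 from an
# "lspci -xxxx" dump read on stdin.
vc_rows() {
    grep -E '^1[45]0:'
}

# Pulse the Secondary Bus Reset bit (0x40 in the Bridge Control
# register at offset 0x3e) on the given bridge.
secondary_bus_reset() {
    bridge=$1
    setpci -s "$bridge" 3e.w=40:40
    # Hold reset for at least 2 ms. With usleep the argument is in
    # microseconds, so 2 ms would be "usleep 2000".
    sleep 0.01
    setpci -s "$bridge" 3e.w=00:40
    sleep 1     # give the device time to come back
}

# Snapshot the two interesting rows before and after the reset.
capture_around_reset() {
    bridge=$1   # e.g. 00:05.0
    dev=$2      # e.g. 3:00.0
    lspci -xxxx -s "$dev" | vc_rows > before.txt
    secondary_bus_reset "$bridge"
    lspci -xxxx -s "$dev" | vc_rows > after.txt
    diff before.txt after.txt
}
```

For the device in this thread the call would be: capture_around_reset 00:05.0 3:00.0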

Alex
Andreas Hartmann
2014-10-23 16:00:06 UTC
Alex Williamson wrote:
> On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
>>> if (rc != -ENOTTY)
>>> goto done;
>>>
>>> - rc = pci_pm_reset(dev, probe);
>>> + rc = pci_dev_reset_slot_function(dev, probe);
>>> if (rc != -ENOTTY)
>>> goto done;
>>>
>>> - rc = pci_dev_reset_slot_function(dev, probe);
>>> + rc = pci_parent_bus_reset(dev, probe);
>>> if (rc != -ENOTTY)
>>> goto done;
>>>
>>> - rc = pci_parent_bus_reset(dev, probe);
>>> + rc = pci_pm_reset(dev, probe);
>>> done:
>>> return rc;
>>> }
>>
>> This way it's crashing with echo 1 > reset, too.
>
> Ok, so it's somehow related to doing a bus reset with virtual channel
> save/restore while PM reset with VC save/restore works ok as apparently
> does bus reset without VC save/restore. Let's try to do a manual bus
> reset so we can look at the post reset state of the device before the
> kernel tries to restore it.
>
> First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
> know it's not being used.
>
> Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
>
> Then we'll do a bus reset using setpci:
> # setpci -s 00:05.0 3e.w=40:40
> <if you script this, wait at least 2ms here>
> # setpci -s 00:05.0 3e.w=00:40
> <wait 1 second here>
>
> Now re-capture lspci -xxxx -s 3:00.0

The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
linux 3.14)

lspci -xxxx -s 3:00.0
setpci -s 00:05.0 3e.w=40:40
usleep 10
setpci -s 00:05.0 3e.w=00:40
sleep 1
lspci -xxxx -s 3:00.0

I didn't get the second lspci because the machine already was hanging.
The first output is attached completely.



Hope this helps,
thanks,
regards,
Andreas
Alex Williamson
2014-10-23 16:33:42 UTC
On Thu, 2014-10-23 at 18:00 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
> >> Alex Williamson wrote:
> >>> --- a/drivers/pci/pci.c
> >>> +++ b/drivers/pci/pci.c
> >>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> >>>  	if (rc != -ENOTTY)
> >>>  		goto done;
> >>>
> >>> -	rc = pci_pm_reset(dev, probe);
> >>> +	rc = pci_dev_reset_slot_function(dev, probe);
> >>>  	if (rc != -ENOTTY)
> >>>  		goto done;
> >>>
> >>> -	rc = pci_dev_reset_slot_function(dev, probe);
> >>> +	rc = pci_parent_bus_reset(dev, probe);
> >>>  	if (rc != -ENOTTY)
> >>>  		goto done;
> >>>
> >>> -	rc = pci_parent_bus_reset(dev, probe);
> >>> +	rc = pci_pm_reset(dev, probe);
> >>>  done:
> >>>  	return rc;
> >>>  }
> >>
> >> This way it's crashing with echo 1 > reset, too.
> >
> > Ok, so it's somehow related to doing a bus reset with virtual channel
> > save/restore while PM reset with VC save/restore works ok as apparently
> > does bus reset without VC save/restore. Let's try to do a manual bus
> > reset so we can look at the post reset state of the device before the
> > kernel tries to restore it.
> >
> > First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
> > know it's not being used.
> >
> > Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
> >
> > Then we'll do a bus reset using setpci:
> > # setpci -s 00:05.0 3e.w=40:40
> > <if you script this, wait at least 2ms here>
> > # setpci -s 00:05.0 3e.w=00:40
> > <wait 1 second here>
> >
> > Now re-capture lspci -xxxx -s 3:00.0
>
> The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
> linux 3.14)
>
> lspci -xxxx -s 3:00.0
> setpci -s 00:05.0 3e.w=40:40
> usleep 10
> setpci -s 00:05.0 3e.w=00:40
> sleep 1
> lspci -xxxx -s 3:00.0
>
> I didn't get the second lspci because the machine already was hanging.
> The first output is attached completely.

Hmm, that doesn't make much sense. You had found that if you disabled
the VC save/restore then QEMU works. That should have still been using
secondary bus reset as we're trying to do here, so I don't understand
why we can't do a manual secondary bus reset now.

If you use Bjorn's previous patch to disable VC save/restore and my
patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
entry for the device also still cause a hang?
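
For reference, the sysfs test asked about here could be scripted along
these lines. This is only a sketch: the SYSFS_ROOT override and both
function names are hypothetical helpers, and it relies on the assumption
that the kernel exposes the "reset" attribute only for devices that have
a usable reset method.

```shell
#!/bin/sh
# Sketch: trigger a function reset through pci-sysfs.
# SYSFS_ROOT is a hypothetical override so the helper can be tried
# against a scratch directory instead of the real /sys.
SYSFS_ROOT="${SYSFS_ROOT-/sys}"

# Print the path of the reset attribute for a full address (0000:03:00.0).
pci_reset_path() {
    printf '%s/bus/pci/devices/%s/reset\n' "$SYSFS_ROOT" "$1"
}

# Write 1 to the reset attribute, or fail if it is missing or read-only.
pci_function_reset() {
    path=$(pci_reset_path "$1")
    if [ ! -w "$path" ]; then
        echo "no writable reset attribute for $1" >&2
        return 1
    fi
    echo 1 > "$path"
}
```

Usage as root would be: pci_function_reset 0000:03:00.0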

Can you provide a link to the specific model for this card? Thanks,

Alex
Andreas Hartmann
2014-10-23 17:12:04 UTC
Alex Williamson wrote:
> On Thu, 2014-10-23 at 18:00 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
>>>> Alex Williamson wrote:
>>>>> --- a/drivers/pci/pci.c
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
>>>>>  	if (rc != -ENOTTY)
>>>>>  		goto done;
>>>>>
>>>>> -	rc = pci_pm_reset(dev, probe);
>>>>> +	rc = pci_dev_reset_slot_function(dev, probe);
>>>>>  	if (rc != -ENOTTY)
>>>>>  		goto done;
>>>>>
>>>>> -	rc = pci_dev_reset_slot_function(dev, probe);
>>>>> +	rc = pci_parent_bus_reset(dev, probe);
>>>>>  	if (rc != -ENOTTY)
>>>>>  		goto done;
>>>>>
>>>>> -	rc = pci_parent_bus_reset(dev, probe);
>>>>> +	rc = pci_pm_reset(dev, probe);
>>>>>  done:
>>>>>  	return rc;
>>>>>  }
>>>>
>>>> This way it's crashing with echo 1 > reset, too.
>>>
>>> Ok, so it's somehow related to doing a bus reset with virtual channel
>>> save/restore while PM reset with VC save/restore works ok as apparently
>>> does bus reset without VC save/restore. Let's try to do a manual bus
>>> reset so we can look at the post reset state of the device before the
>>> kernel tries to restore it.
>>>
>>> First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
>>> know it's not being used.
>>>
>>> Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
>>>
>>> Then we'll do a bus reset using setpci:
>>> # setpci -s 00:05.0 3e.w=40:40
>>> <if you script this, wait at least 2ms here>
>>> # setpci -s 00:05.0 3e.w=00:40
>>> <wait 1 second here>
>>>
>>> Now re-capture lspci -xxxx -s 3:00.0
>>
>> The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
>> linux 3.14)
>>
>> lspci -xxxx -s 3:00.0
>> setpci -s 00:05.0 3e.w=40:40
>> usleep 10
>> setpci -s 00:05.0 3e.w=00:40
>> sleep 1
>> lspci -xxxx -s 3:00.0
>>
>> I didn't get the second lspci because the machine already was hanging.
>> The first output is attached completely.
>
> Hmm, that doesn't make much sense. You had found that if you disabled
> the VC save/restore then QEMU works. That should have still been using
> secondary bus reset as we're trying to do here, so I don't understand
> why we can't do a manual secondary bus reset now.
>
> If you use Bjorn's previous patch to disable VC save/restore and my
> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> entry for the device also still cause a hang?

I will test it.

> Can you provide a link to the specific model for this card? Thanks,

http://www.tp-link.com.de/support/download/?model=TL-WDN4800&version=V1


Regards,
Andreas
Andreas Hartmann
2014-10-23 17:33:46 UTC
Alex Williamson wrote:
[...]
> If you use Bjorn's previous patch to disable VC save/restore and my
> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> entry for the device also still cause a hang?

Yes - it's hanging too (w/ vfio bound to the device - didn't test other
possibilities).


Regards,
Andreas
Alex Williamson
2014-10-23 19:37:03 UTC
On Thu, 2014-10-23 at 19:33 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> [...]
> > If you use Bjorn's previous patch to disable VC save/restore and my
> > patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> > entry for the device also still cause a hang?
>
> Yes - it's hanging too (w/ vfio bound to the device - didn't test other
> possibilities).

Does it happen regardless of the slot the card is plugged into? Thanks,

Alex
Andreas Hartmann
2014-10-22 15:34:05 UTC
Alex Williamson
2014-10-22 16:02:29 UTC
On Wed, 2014-10-22 at 17:34 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > Hi Andreas,
> >
> > On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
> >> Hello Alex,
> >>
> >> Alex Williamson wrote:
> >>> Hi Andreas,
> >> [...]
> >>> Sorry for the breakage. Is it possible to run lspci on the device in a
> >>> loop from the host and capture whether we're failing to restore some of
> >>> the VC bits to their previous state?
> >>
> >>> Does the problem also occur if you
> >>> unbind from host driver,
> >>
> >> The machine is booted w/ blacklisted ath9k. Then, the device is bound to
> >> vfio:
> >>
> >> echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
> >> echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
> >> echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> >>
> >> afterwards the VM is started -> hang.
> >>
> >> W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
> >> any problem.
> >>
> >>> echo 1 > reset in pci-sysfs,
> >>
> >> echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
> >> bound to vfio. Even after unbinding from vfio and rebinding to vfio
> >> again ... .
> >>
> >>> and re-bind to the
> >>
> >> Do you mean loading ath9k in host system after unbinding from vfio? If
> >> yes: Works w/o any problem. It's even possible to reset it or do a
> >> ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
> >> again and reset it, ....
> >>
> >> Looks like the hang is only triggered by qemu-system-x86_64 when starting
> >> the VM.
> >>
> >>> host? I'll also try to reproduce on my 990fx system, but I won't be
> >>> able to do that until next week due to travel. Thanks,
> >
> > Could you send me the lspci -vvvxxxx for the device and parent root
> > port? Thanks,
>
>
> Done with kernel 3.12.28 in host while the device was used in VM:
>
> # lspci -vt
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B)
>            +-00.2  Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
>            +-02.0-[01]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570]
>            |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
>            +-04.0-[02]----00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller
>            +-05.0-[03]----00.0  Qualcomm Atheros AR93xx Wireless Network Adapter
>            +-09.0-[04]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>            +-0a.0-[05]----00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller
>            +-11.0  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
>            +-12.0  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
>            +-12.2  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
>            +-13.0  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
>            +-13.2  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
>            +-14.0  Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller
>            +-14.2  Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA)
>            +-14.3  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller
>            +-14.4-[06]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro 100
>            |            \-0e.0  VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller
>            +-14.5  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
>            +-15.0-[07]--
>            +-16.0  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
>            +-16.2  Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
>            +-18.0  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
>            +-18.1  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
>            +-18.2  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
>            +-18.3  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
>            +-18.4  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
>            \-18.5  Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
>
>
> # lspci -s 03:00 -vvvxxxx
> 03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
[snip]
>
>
> I'm not sure what you mean with "parent root port". Could it be this:

No, it's 00:05.0
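
The upstream port of a device can also be read from sysfs, because the
resolved device path nests each device under its parent bridge. A small
sketch, with pci_parent and SYSFS_ROOT as hypothetical names chosen for
illustration:

```shell
#!/bin/sh
# Sketch: find the parent bridge of a PCI device via sysfs.
# SYSFS_ROOT is a hypothetical override for trying this on a scratch tree.
SYSFS_ROOT="${SYSFS_ROOT-/sys}"

# Takes a full address such as 0000:03:00.0 and prints the name of the
# directory it is nested in, which is the upstream bridge. For a device
# directly on the root bus this is the host bridge directory
# (e.g. pci0000:00), not a device address.
pci_parent() {
    dev_path=$(readlink -f "$SYSFS_ROOT/bus/pci/devices/$1") || return 1
    basename "$(dirname "$dev_path")"
}
```

With the topology in the lspci -vt output quoted above, pci_parent
0000:03:00.0 should print 0000:00:05.0, and the requested dump of the root
port would then be: lspci -vvvxxxx -s "$(pci_parent 0000:03:00.0)"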
Andreas Hartmann
2014-10-22 16:20:46 UTC