[PATCH] PCI: update device mps when doing pci hotplug

--
1.7.1

Alex Williamson

2014-07-29 16:18:07 UTC

Apologies if we rehash some previously discussed topics while I try to
cover for Bjorn while he's out. By "normally", do you mean "optimally"?
The device should be functional with a lower mps setting, right?

Post by Yijing Wang
This issue was found in huawei 5885 server
and Dell R620 server. And if we run the platform with windows,
this problem is gone. This patch try to update the hot added
device mps equal to its parent mps, if device mpss < parent mps,
print warning.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
---
drivers/pci/probe.c | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
dev_err(&dev->dev, "MRRS was unable to be configured with a safe value. If problems are experienced, try running with pci=pcie_bus_safe\n");
}
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ *
+ * After device hot add, mps will be set to default(128B), But the
+ * upstream port device's mps may be larger than 128B which was set
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+
+ parent = dev->bus->self;
+ mps = pcie_get_mps(dev);
+ p_mps = pcie_get_mps(parent);
+
+ if (mps >= p_mps)
+ return;
+
+ mpss = 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+ "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}

So if the device mps is less than the parent mps and the device supports
the parent mps, we update the device. If the device cannot support the
parent mps, warn. Why do we bypass the opportunity to reduce the device
mps if it exceeds the parent mps?

Post by Yijing Wang
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}

pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
redundancy, can't we merge this new functionality into the existing
function? Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
be adding code which would imply PCIE_BUS_PERFORMANCE since we're
bringing the device up to an optimal mps to match the parent. Is there
a simpler solution to simply downgrade the dev_warn in
pcie_bus_detect_mps() to dev_info and change the text? Thanks,

Alex

Keith Busch

2014-07-29 16:30:58 UTC

You'd think so, but some platforms don't work. A pci-e trace showed
TLPs exceeding MPS when parent device at 256B and the end device left
at 128B. Even if that's a platform bug, I think we still want it to work.

Alex Williamson

2014-07-29 16:42:58 UTC

But if it's a platform bug for a non-compliant device, should it be
handled as a quirk rather than standard configuration? Thanks,

Alex

Keith Busch

2014-07-29 19:04:19 UTC

But if it's a platform bug for a non-compliant device, should it be
handled as a quirk rather than standard configuration? Thanks,

I'm not even sure it is a platform bug, but that was just a guess. The
way the devices are configured, it appears they behave inline with the
spec (from table 7-13):

"
Max_Payload_size -- This field sets maximum TLP payload size for the
Function. As a Receiver, the Function must handle TLPs as large as the set
value. As a Transmitter, the Function must not generate TLPs exceeding the
set value.
"

It sounds like it allows a transmitter to generate a TLP that the receiver
can't handle, but I don't know if that was the intent. :)

Yijing Wang

2014-07-30 03:35:59 UTC

Apologies if we rehash some previously discussed topics while I try=

cover for Bjorn while he's out. By "normally", do you mean "optima=

lly"?

The device should be functional with a lower mps setting, right?

You'd think so, but some platforms don't work. A pci-e trace showed
TLPs exceeding MPS when parent device at 256B and the end device lef=

Post by Keith Busch
at 128B. Even if that's a platform bug, I think we still want it to =

work.

Post by Alex Williamson
=20
But if it's a platform bug for a non-compliant device, should it be
handled as a quirk rather than standard configuration? Thanks,

This is not a platform bug, in PCIe 3.0 spec 7.8.4, page 618, it said
Software can control this to achieve some purposes.

IMPLEMENTATION NOTE
Use of Max_Payload_Size
The Max_Payload_Size mechanism allows software to control the maximum p=
ayload in packets sent
by Endpoints to balance latency versus bandwidth trade-offs, particular=
ly for isochronous traffic.
If software chooses to program the Max_Payload_Size of various System E=
lements to non-default
values, it must take care to ensure that each packet does not exceed th=
e Max_Payload_Size
parameter of any System Element along the packet's path. Otherwise, th=
e packet will be rejected by 20
the System Element whose Max_Payload_Size parameter is too small.
Discussion of specific algorithms used to configure Max_Payload_Size to=
meet this requirement is
beyond the scope of this specification, but software should base its al=
gorithm upon factors such as
the following:
=C2=89 the Max_Payload_Size capability of each System Element within a=
hierarchy 25
=C2=89 awareness of when System Elements are added or removed through =
Hot-Plug operations
=C2=89 knowing which System Elements send packets to each other, what =
type of traffic is carried, what
type of transactions are used, or if packet sizes are constrained by ot=
her mechanisms

=46or the case of firmware that configures System Elements in preparati=
on for running legacy
operating system environments, the firmware may need to avoid programmi=
ng a Max_Payload_Size
above the default of 128 bytes, which is the minimum supported by Endpo=
ints.
=46or example, if the operating system environment does not comprehend =
PCI Express, firmware
probably should not program a non-default Max_Payload_Size for a hierar=
chy that supports Hot- 5
Plug operations. Otherwise, if no software is present to manage Max_Pa=
yload_Size settings when a
new element is added, improper operation may result. Note that a newly=
added element may not
even support a Max_Payload_Size setting as large as the rest of the hie=
rarchy, in which case software
may need to deny enabling the new element or reduce the Max_Payload_Siz=
e settings of other
elements.

Post by Alex Williamson
=20
Alex
=20
=20
=20
.
=20

--=20
Thanks!
Yijing

Yijing Wang

2014-07-30 03:27:56 UTC

=20
Apologies if we rehash some previously discussed topics while I try t=

cover for Bjorn while he's out. By "normally", do you mean "optimall=

y"?

The device should be functional with a lower mps setting, right?

No, the device can not work, because some pcie tlp packets will be disc=
arded.
Sorry for my poor English.

=20

ev)

Post by Yijing Wang
dev_err(&dev->dev, "MRRS was unable to be configured with a safe =

value. If problems are experienced, try running with pci=3Dpcie_bus_sa=
fe\n");

Post by Yijing Wang
}
=20
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-ad=

Post by Yijing Wang
+ *=20
+ * After device hot add, mps will be set to default(128B), But the=20
+ * upstream port device's mps may be larger than 128B which was set=

=20

Post by Yijing Wang
+ * by firmware during system bootup. Then we should update the devi=

Post by Yijing Wang
+ * mps to equal to its parent mps, Or the device can not work norma=

lly.

Post by Yijing Wang
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self=20
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+=09
+ parent =3D dev->bus->self;
+ mps =3D pcie_get_mps(dev);
+ p_mps =3D pcie_get_mps(parent);
+
+ if (mps >=3D p_mps)
+ return;
+
+ mpss =3D 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+ "If necessary, use \"pci=3Dpcie_bus_safe\" boot parameter to av=

oid this problem\n",

Post by Yijing Wang
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",=

=20

Post by Yijing Wang
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}

=20
So if the device mps is less than the parent mps and the device suppo=

rts

the parent mps, we update the device. If the device cannot support t=

parent mps, warn. Why do we bypass the opportunity to reduce the dev=

ice

mps if it exceeds the parent mps?

Exactly, the device's mps will never larger than its parent's. That's u=
nexpected, so we leave it.

=20

Post by Yijing Wang
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge =3D dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_d=

ev *dev, void *data)

Post by Yijing Wang
return 0;
=20
if (pcie_bus_config =3D=3D PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}

=20
pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
redundancy, can't we merge this new functionality into the existing
function?=20

OK=EF=BC=8C will do.

Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
be adding code which would imply PCIE_BUS_PERFORMANCE since we're
bringing the device up to an optimal mps to match the parent. Is the=

a simpler solution to simply downgrade the dev_warn in
pcie_bus_detect_mps() to dev_info and change the text? Thanks,

We just to adjust the device's mps to make the device can work, not for=
an
optimal mps.

=20
Alex
=20
=20
.
=20

--=20
Thanks!
Yijing

Ethan Zhao

2014-07-30 03:33:38 UTC

Yijing,
Seems you want to workaround platform bug in generic PCIe hotplug
code, while not specific to some platforms or devices, that is not
safe/fair to other vendor's platform.
and the updating to MPS of device is out of PCIe specification.
So the best way is to fix within platform, at least, any platform
specific way in Linux code.

Thanks,
Ethan

Post by Yijing Wang
Currently we don't update device's mps value when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally. This issue was found in huawei 5885 server
and Dell R620 server. And if we run the platform with windows,
this problem is gone. This patch try to update the hot added
device mps equal to its parent mps, if device mpss < parent mps,
print warning.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
---
drivers/pci/probe.c | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
dev_err(&dev->dev, "MRRS was unable to be configured with a safe value. If problems are experienced, try running with pci=pcie_bus_safe\n");
}
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ *
+ * After device hot add, mps will be set to default(128B), But the
+ * upstream port device's mps may be larger than 128B which was set
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+
+ parent = dev->bus->self;
+ mps = pcie_get_mps(dev);
+ p_mps = pcie_get_mps(parent);
+
+ if (mps >= p_mps)
+ return;
+
+ mpss = 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+ "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

Yijing Wang

2014-07-30 03:42:11 UTC

Post by Ethan Zhao
Yijing,
Seems you want to workaround platform bug in generic PCIe hotplug
code, while not specific to some platforms or devices, that is not
safe/fair to other vendor's platform.
and the updating to MPS of device is out of PCIe specification.
So the best way is to fix within platform, at least, any platform
specific way in Linux code.

No, this is not a platform bug, and this is safe, you can refer to my last reply
to Alex.

Post by Ethan Zhao
Thanks,
Ethan

--
Thanks!
Yijing

Ethan Zhao

2014-07-30 03:58:08 UTC

=D4=DA 2014=C4=EA7=D4=C230=C8=D5=A3=AC=C9=CF=CE=E711:42=A3=ACYijing W=
=20

=20
No, this is not a platform bug, and this is safe, you can refer to my=

last reply

to Alex.

if is not a platform bug, why Didn't other platforms that can do hotplu=
g report such issue ?
If it was safe, is it possible that you have all other vendors verify i=
t ?=20

=20

Post by Ethan Zhao
=20
Thanks,
Ethan
=20

dev)

Post by Yijing Wang
dev_err(&dev->dev, "MRRS was unable to be configured=

with a safe value. If problems are experienced, try running with pci=3D=
pcie_bus_safe\n");

Post by Yijing Wang
}
=20
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-a=

Post by Yijing Wang
+ *
+ * After device hot add, mps will be set to default(128B), But the
+ * upstream port device's mps may be larger than 128B which was se=

Post by Yijing Wang
+ * by firmware during system bootup. Then we should update the dev=

ice

Post by Yijing Wang
+ * mps to equal to its parent mps, Or the device can not work norm=

ally.

Post by Yijing Wang
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+
+ parent =3D dev->bus->self;
+ mps =3D pcie_get_mps(dev);
+ p_mps =3D pcie_get_mps(parent);
+
+ if (mps >=3D p_mps)
+ return;
+
+ mpss =3D 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream =

MPS %d\n"

Post by Yijing Wang
+ "If necessary, use \"pci=3Dpcie_bus=

_safe\" boot parameter to avoid this problem\n",

Post by Yijing Wang
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %=

4d)\n",

Post by Yijing Wang
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, m=

ps);

Post by Yijing Wang
+}
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge =3D dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_=

dev *dev, void *data)

Post by Yijing Wang
return 0;
=20
if (pcie_bus_config =3D=3D PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}
--
1.7.1
=20
--
To unsubscribe from this list: send the line "unsubscribe linux-pci=

" in

Post by Yijing Wang
More majordomo info at http://vger.kernel.org/majordomo-info.html

=20
.

=20
=20
--=20
Thanks!
Yijing
=20

Yijing Wang

2014-07-30 04:42:01 UTC

=20
=20

=D4=DA 2014=C4=EA7=D4=C230=C8=D5=A3=AC=C9=CF=CE=E711:42=A3=ACYijing =

Post by Ethan Zhao
Yijing,
Seems you want to workaround platform bug in generic PCIe hotplu=

Post by Ethan Zhao
code, while not specific to some platforms or devices, that is not
safe/fair to other vendor's platform.
and the updating to MPS of device is out of PCIe specification.
So the best way is to fix within platform, at least, any platfor=

Post by Ethan Zhao
specific way in Linux code.

No, this is not a platform bug, and this is safe, you can refer to m=

y last reply

to Alex.

if is not a platform bug, why Didn't other platforms that can do hotp=

lug report such issue ?

If it was safe, is it possible that you have all other vendors verify=

it ?=20

This issue was triggered by firmware set the device mps to 256B. It's n=
ot related to platform
hardware.

I think this is safe, because I only modified the device newly hot-adde=
d, this device is not enabled yet.

Thanks!
Yijing.

*dev)

Post by Yijing Wang
dev_err(&dev->dev, "MRRS was unable to be configure=

d with a safe value. If problems are experienced, try running with pci=
=3Dpcie_bus_safe\n");

Post by Yijing Wang
}
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-=

add

Post by Yijing Wang
+ *
+ * After device hot add, mps will be set to default(128B), But th=

Post by Yijing Wang
+ * upstream port device's mps may be larger than 128B which was s=

Post by Yijing Wang
+ * by firmware during system bootup. Then we should update the de=

vice

Post by Yijing Wang
+ * mps to equal to its parent mps, Or the device can not work nor=

mally.

Post by Yijing Wang
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+
+ parent =3D dev->bus->self;
+ mps =3D pcie_get_mps(dev);
+ p_mps =3D pcie_get_mps(parent);
+
+ if (mps >=3D p_mps)
+ return;
+
+ mpss =3D 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream=

MPS %d\n"

Post by Yijing Wang
+ "If necessary, use \"pci=3Dpcie_bu=

s_safe\" boot parameter to avoid this problem\n",

Post by Yijing Wang
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was =

%4d)\n",

Post by Yijing Wang
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, =

mps);

Post by Yijing Wang
+}
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge =3D dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci=

_dev *dev, void *data)

Post by Yijing Wang
return 0;
if (pcie_bus_config =3D=3D PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-pc=

i" in

Post by Yijing Wang
More majordomo info at http://vger.kernel.org/majordomo-info.html

--=20
Thanks!
Yijing

=20
.
=20

--=20
Thanks!
Yijing

Ethan Zhao

2014-07-30 06:26:30 UTC

http://marc.info/?l=e1000-devel&m=134182518500774&w=2

The bug was reported by ***@oracle.com. the related hardware
is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
and the the issue is definitely a BIOS bug. not related to the hotplug
code path. was fixed by BIOS.

Thanks,
Ethan

Post by Yijing Wang
---
drivers/pci/probe.c | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
dev_err(&dev->dev, "MRRS was unable to be configured with a safe value. If problems are experienced, try running with pci=pcie_bus_safe\n");
}
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ *
+ * After device hot add, mps will be set to default(128B), But the
+ * upstream port device's mps may be larger than 128B which was set
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)
+ return;
+
+ parent = dev->bus->self;
+ mps = pcie_get_mps(dev);
+ p_mps = pcie_get_mps(parent);
+
+ if (mps >= p_mps)
+ return;
+
+ mpss = 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+ "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+ mpss, p_mps);
+ return;
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);
pcie_bus_detect_mps(dev);
return 0;
}
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

Yijing Wang

2014-07-30 06:57:24 UTC

http://marc.info/?l=e1000-devel&m=134182518500774&w=2
is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
and the the issue is definitely a BIOS bug. not related to the hotplug
code path. was fixed by BIOS.

Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
But that issue is not same as this one, Keith and Jordan found the issue
after hot-plug. And my patch only modify the hotplug slot connected device.

In my idea, make the device work is important, because these platforms with windows
can run happy, why linux leave this issue to BIOS.

Anyway, this is my personal opinion.

Thanks!
Yijing.

--
Thanks!
Yijing

Ethan Zhao

2014-07-30 07:17:01 UTC

Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
But that issue is not same as this one, Keith and Jordan found the issue
after hot-plug. And my patch only modify the hotplug slot connected device.
In my idea, make the device work is important, because these platforms with windows
can run happy, why linux leave this issue to BIOS.

That is a reason to make it works with Linux, but does your
platform have _HPX from ACPI for those hot-added back devices ? if it
has, maybe windows could apply _HPX to configure those devices and
work well.

Maybe you should check with _HPX . That is a generic way to import
configuration from firmware when hot-add device.

Please see also
chapter 6.2.8 _HPX (Hot Plug Parameter Extensions) of ACPI spec 4&5

Thanks,
Ethan

Post by Yijing Wang
Anyway, this is my personal opinion.
Thanks!
Yijing.

--
Thanks!
Yijing

Yijing Wang

2014-07-30 08:13:54 UTC

Post by Yijing Wang
Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
But that issue is not same as this one, Keith and Jordan found the issue
after hot-plug. And my patch only modify the hotplug slot connected device.
In my idea, make the device work is important, because these platforms with windows
can run happy, why linux leave this issue to BIOS.

I checked DSDT table exported from my server, but no "_HPX" found.
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

Thanks!
Yijing.

--
Thanks!
Yijing

Ethan Zhao

2014-07-30 08:38:43 UTC

I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

Post by Yijing Wang
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c

pciehp_configure_device()
pci_configure_slot()
pci_get_hp_params()
acpi_run_hpx()
acpi_evaluate_object(handle, "_HPX",...)
...

Native pciehp also does acpi parameter importing ... ...

Thank,s
Ethan

Post by Yijing Wang
Thanks!
Yijing.

--
Thanks!
Yijing

Yijing Wang

2014-07-30 09:17:13 UTC

I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

Post by Yijing Wang
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c
pciehp_configure_device()
pci_configure_slot()
pci_get_hp_params()
acpi_run_hpx()
acpi_evaluate_object(handle, "_HPX",...)
...

Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.

Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
if we find the device mps is not equal to the cached value, and the device is disabled, restore it.

/* PCI Express Setting Record (Type 2) */
struct hpp_type2 {
u32 revision;
u32 unc_err_mask_and;
u32 unc_err_mask_or;
u32 unc_err_sever_and;
u32 unc_err_sever_or;
u32 cor_err_mask_and;
u32 cor_err_mask_or;
u32 adv_err_cap_and;
u32 adv_err_cap_or;
u16 pci_exp_devctl_and;
u16 pci_exp_devctl_or;
u16 pci_exp_lnkctl_and;
u16 pci_exp_lnkctl_or;
u32 sec_unc_err_sever_and;
u32 sec_unc_err_sever_or;
u32 sec_unc_err_mask_and;
u32 sec_unc_err_mask_or;
};

Post by Ethan Zhao
Native pciehp also does acpi parameter importing ... ...
Thank,s
Ethan

Post by Yijing Wang
Thanks!
Yijing.

--
Thanks!
Yijing

J***@Dell.com

2014-07-30 19:41:41 UTC

One concern with using _HPX is there doesn't appear to be a failure mechanism...
What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.

The _HPX AML would have to read the parent device MPS which would be pretty ugly.

--jordan hargrave
Dell Enterprise Linux Engineering
________________________________________
From: Yijing Wang [***@huawei.com]
Sent: Wednesday, July 30, 2014 4:17 AM
To: Ethan Zhao
Cc: Bjorn Helgaas; linux-pci; Hargrave, Jordan; <***@intel.com>; Jon Mason
Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug

I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

Post by Yijing Wang
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c
pciehp_configure_device()
pci_configure_slot()
pci_get_hp_params()
acpi_run_hpx()
acpi_evaluate_object(handle, "_HPX",...)
...

Post by Ethan Zhao
Native pciehp also does acpi parameter importing ... ...
Thank,s
Ethan

Post by Yijing Wang
Thanks!
Yijing.

--
Thanks!
Yijing

Bjorn Helgaas

2014-09-03 19:20:15 UTC

Post by J***@Dell.com
One concern with using _HPX is there doesn't appear to be a failure mechanism...
What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.
The _HPX AML would have to read the parent device MPS which would be pretty ugly.

_HPX should be applied when a device is added via any hotplug driver,
including pciehp, but I don't think it is a reasonable way to fix
this. The spec (ACPI r5.0, sec 6.2.8.3) allows a Type 2 record to set
arbitrary values in the Device Control register (including MPS), and
that's what the current Linux implementation does.

However, the spec allows OSPM to override any bits deemed necessary.
MPS is a property that is controlled by OSPM and it depends on other
devices in the system, so I don't think it is safe to put an MPS value
from _HPX directly into Device Control. OSPM should first ensure that
the value is compatible with how it has configured the rest of the
system.

Post by J***@Dell.com
________________________________________
Sent: Wednesday, July 30, 2014 4:17 AM
To: Ethan Zhao
Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug

I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

Post by Yijing Wang
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c
pciehp_configure_device()
pci_configure_slot()
pci_get_hp_params()
acpi_run_hpx()
acpi_evaluate_object(handle, "_HPX",...)
...

Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.
Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
if we find the device mps is not equal to the cached value, and the device is disabled, restore it.
/* PCI Express Setting Record (Type 2) */
struct hpp_type2 {
u32 revision;
u32 unc_err_mask_and;
u32 unc_err_mask_or;
u32 unc_err_sever_and;
u32 unc_err_sever_or;
u32 cor_err_mask_and;
u32 cor_err_mask_or;
u32 adv_err_cap_and;
u32 adv_err_cap_or;
u16 pci_exp_devctl_and;
u16 pci_exp_devctl_or;
u16 pci_exp_lnkctl_and;
u16 pci_exp_lnkctl_or;
u32 sec_unc_err_sever_and;
u32 sec_unc_err_sever_or;
u32 sec_unc_err_mask_and;
u32 sec_unc_err_mask_or;
};

Post by Ethan Zhao
Native pciehp also does acpi parameter importing ... ...
Thank,s
Ethan

Post by Yijing Wang
Thanks!
Yijing.

--
Thanks!
Yijing

Bjorn Helgaas

2014-09-03 22:42:01 UTC

Part of this looks redundant because pcie_bus_configure_set() already
checks pci_is_pcie(). And I don't know why we need to test
is_hotplug_bridge here; MPS settings need to be consistent regardless of
whether the upstream bridge supports hotplug.

Post by Yijing Wang
+ return;
+
+ parent = dev->bus->self;
+ mps = pcie_get_mps(dev);
+ p_mps = pcie_get_mps(parent);
+
+ if (mps >= p_mps)
+ return;
+
+ mpss = 128 << dev->pcie_mpss;
+ if (mpss < p_mps) {
+ dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+ "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+ mpss, p_mps);
+ return;

Since we can't configure the new device correctly, we really shouldn't
allow a driver to bind to it. The current design doesn't have much
provision for doing that, so warning is probably all we can do.

Post by Yijing Wang
+ }
+
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
static void pcie_bus_detect_mps(struct pci_dev *dev)
{
struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

I have some long-term ideas here (below), but to make progress in the short
term, I think we just need to make sure this handles all pcie_bus_config
settings.

Bjorn

Stepping back a long ways, I think the current design is hard to use.
It's set up with the idea that we (1) enumerate all the devices in the
system, and then (2) configure MPS for everything all at once.

That's not a very good fit when we start hotplugging devices, and it's
part of the reason MPS configuration is not well integrated into the PCI
core and doesn't get done at all for most architectures.

What I'd prefer is something that could be done in the core as each device
is enumerated, e.g., in or near pci_device_add(). I know there's tension
between the need to do this before drivers bind to the device and the
desire to enumerate the whole hierarchy before committing to MPS settings.
But we need to handle that tension anyway for hot-added devices, so we
might as well deal with it at boot-time and use the same code path for
both boot-time and hot-add time.

I have in mind something like this:

pcie_configure_mps(struct pci_dev *dev)
{
int ret;

if (!pci_is_pci(dev))
return;

if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
/* set my MPS to dev->pcie_mpss (max supported size) */
return;
}

if (dev->pcie_mpss >= upstream bridge MPS) {
/* set my MPS to upstream bridge MPS */
return;
}

ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
if (ret == failure)
/* emit warning, can't enable this device */
}

struct pci_dev *pcie_root_port(struct pci_dev *dev)
{
if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
return dev;

return pcie_root_port(dev->bus->self);
}

pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
{
struct pci_bus *secondary;
struct pci_dev *dev;
int ret;

if (root->driver)
return -EINVAL;

secondary = root->subordinate;
if (secondary) {
list_for_each_entry(dev, &secondary->devices, bus_list) {
ret = pcie_set_hierarchy(dev, mpss);
if (ret)
return ret;
}
}

/* set my MPS to mpss */
return 0;
}

Yijing Wang

2014-09-04 06:12:41 UTC

Post by Yijing Wang
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ *
+ * After device hot add, mps will be set to default(128B), But the
+ * upstream port device's mps may be larger than 128B which was set
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+ int mps, p_mps, mpss;
+ struct pci_dev *parent;
+
+ if (!pci_is_pcie(dev) || !dev->bus->self
+ || !dev->bus->self->is_hotplug_bridge)

Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
It was more like a temporary solution and not perfect one.

Yes, bind a driver to the device which mps is not correctly set will cause another problem.

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

Agree, arch code should not be involved the MPS setting. It's arch independent.

Post by Bjorn Helgaas
What I'd prefer is something that could be done in the core as each device
is enumerated, e.g., in or near pci_device_add(). I know there's tension
between the need to do this before drivers bind to the device and the
desire to enumerate the whole hierarchy before committing to MPS settings.
But we need to handle that tension anyway for hot-added devices, so we
might as well deal with it at boot-time and use the same code path for
both boot-time and hot-add time.
pcie_configure_mps(struct pci_dev *dev)
{
int ret;
if (!pci_is_pci(dev))
return;
if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
/* set my MPS to dev->pcie_mpss (max supported size) */
return;
}
if (dev->pcie_mpss >= upstream bridge MPS) {
/* set my MPS to upstream bridge MPS */
return;
}
ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
if (ret == failure)
/* emit warning, can't enable this device */

If got failure here, should roll back ? What about set hierarchy mps in reverse order(down to top).

Post by Bjorn Helgaas
}
struct pci_dev *pcie_root_port(struct pci_dev *dev)
{
if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
return dev;
return pcie_root_port(dev->bus->self);
}
pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
{
struct pci_bus *secondary;
struct pci_dev *dev;
int ret;
if (root->driver)
return -EINVAL;

Maybe it's not safe enough, change device's mps has risk unless all its children devices have no driver bound(disabled).
A root port may has no pcieport driver bound, if pcieport driver probe failed. But its children device can work normally.

Post by Bjorn Helgaas
secondary = root->subordinate;
if (secondary) {
list_for_each_entry(dev, &secondary->devices, bus_list) {
ret = pcie_set_hierarchy(dev, mpss);
if (ret)
return ret;
}
}
/* set my MPS to mpss */
return 0;
}
.

--
Thanks!
Yijing

Bjorn Helgaas

2014-09-04 13:16:15 UTC

Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
It was more like a temporary solution and not perfect one.

Yes, bind a driver to the device which mps is not correctly set will cause another problem.

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Sorry, I can't parse this. Are you saying the problem won't happen in
the other modes? Why not?

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

Yijing Wang

2014-09-05 01:27:26 UTC

Post by Yijing Wang
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Sorry, I can't parse this. Are you saying the problem won't happen in
the other modes? Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),

/* For "Performance", the assumption is made that
* downstream communication will never be larger than
* the MRRS. So, the MPS only needs to be configured
* for the upstream communication. This being the case, <------
* walk from the top down and set the MPS of the child
* to that of the parent bus.

So I think the problem won't happen in other modes.

Thanks!
Yijing.

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

--
Thanks!
Yijing

Keith Busch

2014-09-05 14:37:11 UTC

Post by Yijing Wang
So I think the problem won't happen in other modes.

Yes, you're right; we currently recommend end-users set one of the other
bus tuning options because the problem doesn't happen there.

Keith Busch

2014-09-24 22:41:09 UTC

Just poking this thread to make sure it's not dead. :)

I tested Yijing's proposal and it is successful on our Intel server
platforms; hoping either this or something that derives similar behavior
will be applied so we can remove bus tuning kernel parameters.

Post by Yijing Wang
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Sorry, I can't parse this. Are you saying the problem won't happen in
the other modes? Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),
/* For "Performance", the assumption is made that
* downstream communication will never be larger than
* the MRRS. So, the MPS only needs to be configured
* for the upstream communication. This being the case, <------
* walk from the top down and set the MPS of the child
* to that of the parent bus.
So I think the problem won't happen in other modes.
Thanks!
Yijing.

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

--
Thanks!
Yijing

Bjorn Helgaas

2014-09-24 23:30:39 UTC

Post by Keith Busch
Just poking this thread to make sure it's not dead. :)
I tested Yijing's proposal and it is successful on our Intel server
platforms; hoping either this or something that derives similar behavior
will be applied so we can remove bus tuning kernel parameters.

Oops, thanks for poking me, because this was indeed dead.

My main objection was to testing "is_hotplug_bridge". That doesn't
seem right, because this issue really isn't specific to hotplug. I
didn't see a resolution of that, but let me know if I missed it.

Bjorn

Post by Yijing Wang
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Sorry, I can't parse this. Are you saying the problem won't happen in
the other modes? Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest
available mps in this path, so
all devices in the path will have the same mps. When in
PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
pcie_write_mps(),
/* For "Performance", the assumption is made that
* downstream communication will never be larger than
* the MRRS. So, the MPS only needs to be configured
* for the upstream communication. This being the case, <------
* walk from the top down and set the MPS of the child
* to that of the parent bus.
So I think the problem won't happen in other modes.
Thanks!
Yijing.

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

--
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

Yijing Wang

2014-09-25 01:23:07 UTC

Oops, thanks for poking me, because this was indeed dead.
My main objection was to testing "is_hotplug_bridge". That doesn't
seem right, because this issue really isn't specific to hotplug. I
didn't see a resolution of that, but let me know if I missed it.

Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.

It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.

I'd like to refactor current MPS framework, but now there are still some puzzles
to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
log from git, I found he turn off all this MPS config, because some issues were found
in some platforms, but no platforms detailed info and no bugzilla records.

Post by Bjorn Helgaas
Bjorn

Post by Yijing Wang
if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+ pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path. Can't the same
problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

Sorry, I can't parse this. Are you saying the problem won't happen in
the other modes? Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest
available mps in this path, so
all devices in the path will have the same mps. When in
PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
pcie_write_mps(),
/* For "Performance", the assumption is made that
* downstream communication will never be larger than
* the MRRS. So, the MPS only needs to be configured
* for the upstream communication. This being the
case, <------
* walk from the top down and set the MPS of the child
* to that of the parent bus.
So I think the problem won't happen in other modes.
Thanks!
Yijing.

Post by Yijing Wang
pcie_bus_detect_mps(dev);
return 0;
}

--
Thanks!
Yijing

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Thanks!
Yijing

Keith Busch

2014-09-25 16:46:58 UTC

Oops, thanks for poking me, because this was indeed dead.
My main objection was to testing "is_hotplug_bridge". That doesn't
seem right, because this issue really isn't specific to hotplug. I
didn't see a resolution of that, but let me know if I missed it.

Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
I'd like to refactor current MPS framework, but now there are still some puzzles
to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
log from git, I found he turn off all this MPS config, because some issues were found
in some platforms, but no platforms detailed info and no bugzilla records.

Just my opinion, I thought the hotplug check was a good idea: it addresses
a known issue, and does not mess with current unknowns. Outside a hotplug
scenario, I think we expect platform f/w to handle MPS settings and the
kernel can stay out of the way because of the unknown platform issues. If
it is hotplug, having the kernel set device's MPS to match the parent
couldn't make things worse off than doing nothing, right?

On the side, I'll see if I can ping some comrades on PCI-SIG to propose
an ECN to clarify configuring MPS. They usually ignore me though, so no
promises. :)

Yijing Wang

2014-09-26 03:22:41 UTC

Oops, thanks for poking me, because this was indeed dead.
My main objection was to testing "is_hotplug_bridge". That doesn't
seem right, because this issue really isn't specific to hotplug. I
didn't see a resolution of that, but let me know if I missed it.

Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
I'd like to refactor current MPS framework, but now there are still some puzzles
to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
log from git, I found he turn off all this MPS config, because some issues were found
in some platforms, but no platforms detailed info and no bugzilla records.

Yes, I think so, but it all decided by Bjorn. I think he want a better solution,
and current patch just still is a temporary fix.

Post by Keith Busch
On the side, I'll see if I can ping some comrades on PCI-SIG to propose
an ECN to clarify configuring MPS. They usually ignore me though, so no
promises. :)

Thanks in advance for your help :)

Post by Keith Busch
.

--
Thanks!
Yijing

J***@Dell.com

2014-10-02 15:31:39 UTC

\My original proposed patch to handle MPS only lived in the hotplug code path, it wasn't part of the initial PCI configuration.
This is the code that we have had to insert into our drivers (MTIP32XX, NVME) as the current default setting doesn't work for hotplug.

http://www.spinics.net/lists/hotplug/msg04728.html

--jordan hargrave
Dell Enterprise Linux Engineering
________________________________________
From: Yijing Wang [***@huawei.com]
Sent: Thursday, September 25, 2014 10:22 PM
To: Keith Busch
Cc: Bjorn Helgaas; linux-***@vger.kernel.org; Hargrave, Jordan; Jon Mason
Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug