
netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

Message ID 1497840533-4894-1-git-send-email-baijiaju1990@163.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Jia-Ju Bai June 19, 2017, 2:48 a.m. UTC
The driver may sleep under a spin lock, and the function call path is:
netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
  ioremap --> may sleep

To fix it, the lock is released before "ioremap" and acquired again
after ioremap returns.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

David Miller June 20, 2017, 5:35 p.m. UTC | #1
From: Jia-Ju Bai <baijiaju1990@163.com>
Date: Mon, 19 Jun 2017 10:48:53 +0800

> The driver may sleep under a spin lock, and the function call path is:
> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>   ioremap --> may sleep
> 
> To fix it, the lock is released before "ioremap" and acquired again
> after ioremap returns.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>

This style of change you are making is really starting to be a
problem.

You can't just drop locks like this, especially without explaining
why it's ok, and why the mutual exclusion this code was trying to
achieve is still going to be OK afterwards.

In fact, I see zero analysis of the locking situation here, why
it was needed in the first place, and why your change is OK in
that context.

Any locking change is delicate, and you must put the greatest of
care and consideration into it.

Just putting "unlock/lock" around the sleeping operation shows a
very low level of consideration for the implications of the change
you are making.

This isn't like making whitespace fixes, sorry...
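David's objection can be made concrete with a small userspace model. It assumes, purely for illustration, that the lock serializes selecting a memory window and the access that follows. The struct, the functions, and the single-threaded callback standing in for "another CPU" are all hypothetical and are not netxen code. Dropping the lock around the sleeping call lets the window change underneath the access:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not netxen code: `window` stands in for
 * state that is assumed to be protected by the adapter's spinlock. */
struct adapter_model {
	int locked;		/* 1 while the modeled spinlock is held */
	unsigned window;	/* currently selected memory window */
	unsigned mem[4][4];	/* 4 windows x 4 words of fake device memory */
};

void model_lock(struct adapter_model *a)
{
	assert(!a->locked);	/* re-taking a held spinlock would deadlock */
	a->locked = 1;
}

void model_unlock(struct adapter_model *a)
{
	assert(a->locked);	/* unlocking a lock we do not hold is a bug */
	a->locked = 0;
}

/* Runs in the unlocked gap; simulates another CPU taking the lock. */
typedef void (*gap_fn)(struct adapter_model *a);

void other_cpu_selects_window_2(struct adapter_model *a)
{
	model_lock(a);
	a->window = 2;
	model_unlock(a);
}

/* Read word `off` from window `win`. If `gap` is non-NULL, the lock is
 * dropped where the sleeping call would be, as in the rejected patch. */
unsigned read_word(struct adapter_model *a, unsigned win, unsigned off,
		   gap_fn gap)
{
	unsigned val;

	model_lock(a);
	a->window = win;		/* select the window under the lock */
	if (gap) {
		model_unlock(a);	/* spin_unlock() before "ioremap" */
		gap(a);			/* someone else runs while we sleep */
		model_lock(a);		/* spin_lock() after it returns */
	}
	val = a->mem[a->window][off];	/* assumes the window is still `win` */
	model_unlock(a);
	return val;
}

/* Fill memory so that each word encodes its window and offset. */
void model_init(struct adapter_model *a)
{
	unsigned w, o;

	a->locked = 0;
	a->window = 0;
	for (w = 0; w < 4; w++)
		for (o = 0; o < 4; o++)
			a->mem[w][o] = w * 100 + o;
}
```

After `model_init()`, `read_word(&a, 1, 3, NULL)` returns 103 (word 3 of window 1), while `read_word(&a, 1, 3, other_cpu_selects_window_2)` returns 203: the relocked access silently reads window 2. Relocking keeps each critical section exclusive, but not the sequence as a whole across the gap, and that is the analysis the commit message does not provide.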
Kalle Valo June 21, 2017, 6:11 a.m. UTC | #2
David Miller <davem@davemloft.net> writes:

> This style of change you are making is really starting to be a
> problem.
>
> You can't just drop locks like this, especially without explaining
> why it's ok, and why the mutual exclusion this code was trying to
> achieve is still going to be OK afterwards.
>
> In fact, I see zero analysis of the locking situation here, why
> it was needed in the first place, and why your change is OK in
> that context.
>
> Any locking change is delicate, and you must put the greatest of
> care and consideration into it.
>
> Just putting "unlock/lock" around the sleeping operation shows a
> very low level of consideration for the implications of the change
> you are making.
>
> This isn't like making whitespace fixes, sorry...

We already tried to explain this to Jia-Ju during review of a wireless
patch:

https://patchwork.kernel.org/patch/9756585/

Jia-Ju, you should listen to feedback. If you continue submitting random
patches like this, maintainers will find it hard to trust your patches.
Jia-Ju Bai June 21, 2017, 6:33 a.m. UTC | #3
On 06/21/2017 02:11 PM, Kalle Valo wrote:
> We already tried to explain this to Jia-Ju during review of a wireless
> patch:
>
> https://patchwork.kernel.org/patch/9756585/
>
> Jia-Ju, you should listen to feedback. If you continue submitting random
> patches like this, maintainers will find it hard to trust your patches.
>
Hi,

I am quite sorry for my incorrect patches, and I will listen carefully 
to your advice.
In fact, I had not received any feedback on some of the bugs and patches
I reported earlier, so I resent them a few days ago, including this patch.
Sorry again for my mistake.

Thanks,
Jia-Ju Bai
Kalle Valo June 21, 2017, 1:40 p.m. UTC | #4
Jia-Ju Bai <baijiaju1990@163.com> writes:

> Hi,
>
> I am quite sorry for my incorrect patches, and I will listen carefully
> to your advice. In fact, I had not received any feedback on some of the
> bugs and patches I reported earlier, so I resent them a few days ago,
> including this patch.

Yeah, it is likely that some of your reports will not get any response.
For that I only suggest being persistent and providing more information
about the issue and suggestions on how it might be fixed. Also
Dan Carpenter (Cced) might have some suggestions.

But trying to "fix" it by just silencing the warning without proper
analysis is totally the wrong approach; you do more harm than good.

What tool do you use to find these issues? Is it publicly available?
Jia-Ju Bai June 21, 2017, 2:32 p.m. UTC | #5
On 2017/6/21 21:40, Kalle Valo wrote:

> Yeah, it is likely that some of your reports will not get any response.
> For that I only suggest being persistent and providing more information
> about the issue and suggestions on how it might be fixed. Also
> Dan Carpenter (Cced) might have some suggestions.
>
> But trying to "fix" it by just silencing the warning without proper
> analysis is totally the wrong approach; you do more harm than good.
>
> What tool do you use to find these issues? Is it publicly available?
>

Hi,

Thanks a lot for your advice, and I am very glad that you are interested
in my work :)
I wrote this static analysis tool myself rather than using or extending
an existing one. One reason I wrote it is that I ran into several
sleep-in-atomic bugs during my own driver development :(.
However, because the implementation is still preliminary, the tool has
some limitations that can produce false positives and false negatives,
and it may not be very easy to use. So I am still improving it, checking
more code, and collecting results. By the way, I apologize again for my
incorrect patches that tried to "fix" the detected bugs.
I would be glad to make the tool available so that more system code can
be checked easily and effectively. After I finish these improvements and
evaluate it further, I will make it publicly available.
If you have any suggestions or comments on my work, please feel free to
contact me :)

Thanks,
Jia-Ju Bai
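Jia-Ju does not describe how his tool works, so the following is not his tool. It is a minimal sketch of one common static approach, under stated assumptions: start from a seed set of functions known to sleep, propagate "may sleep" to their callers over the call graph until nothing changes, then flag each call made with a spinlock held whose callee may sleep. The function names form a hypothetical toy graph, not real kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical functions in a toy call graph (not real kernel symbols). */
enum { DRIVER_ACCESS, MAP_REGION, ALLOC_AREA, READ_REG, NFUNCS };

/* calls[f][g] is true when f calls g. */
static const bool calls[NFUNCS][NFUNCS] = {
	[DRIVER_ACCESS] = { [MAP_REGION] = true, [READ_REG] = true },
	[MAP_REGION]    = { [ALLOC_AREA] = true },
};

/* Seed set: functions assumed to sleep directly (e.g. a blocking allocator). */
static const bool sleeps_directly[NFUNCS] = { [ALLOC_AREA] = true };

/* Propagate "may sleep" from callees to callers until a fixpoint. */
void compute_may_sleep(bool may_sleep[NFUNCS])
{
	bool changed = true;
	int f, g;

	for (f = 0; f < NFUNCS; f++)
		may_sleep[f] = sleeps_directly[f];
	while (changed) {
		changed = false;
		for (f = 0; f < NFUNCS; f++)
			for (g = 0; g < NFUNCS && !may_sleep[f]; g++)
				if (calls[f][g] && may_sleep[g]) {
					may_sleep[f] = true;
					changed = true;
				}
	}
}

/* A call site that executes while a spinlock is held. */
struct locked_call {
	int caller;
	int callee;
};

/* Count locked call sites whose callee may sleep (candidate bugs). */
int count_sleep_in_atomic(const struct locked_call *sites, int n)
{
	bool may_sleep[NFUNCS];
	int i, reports = 0;

	compute_may_sleep(may_sleep);
	for (i = 0; i < n; i++)
		if (may_sleep[sites[i].callee])
			reports++;
	return reports;
}
```

With the locked call sites `{DRIVER_ACCESS, MAP_REGION}` and `{DRIVER_ACCESS, READ_REG}`, only the first is reported. Because this kind of propagation ignores arguments and run-time conditions (for example, an allocator that sleeps only for some flags), it over-approximates and can give false positives, while calls through function pointers that are missing from the graph give false negatives. Either way, a report only locates a candidate; deciding what the lock protects still needs the manual analysis the reviewers asked for.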
Bo YU June 21, 2017, 5:44 p.m. UTC | #6
Hi,
On Wed, Jun 21, 2017 at 02:33:03PM +0800, Jia-Ju Bai wrote:
>Hi,
>
>I am quite sorry for my incorrect patches, and I will listen carefully
>to your advice.
>In fact, I had not received any feedback on some of the bugs and
>patches I reported earlier, so I resent them a few days ago, including
>this patch.
>Sorry again for my mistake.

Once your patch is accepted, the maintainer will reply to you, either
with an automatic email or personally. But I think most of your patches
will be dropped silently, because (un)lock-related operations are very
critical, especially in kernel code. Maintainers will not accept
unsafe (un)lock code.

Best Regards
Dan Carpenter June 22, 2017, 6:08 a.m. UTC | #7
We should probably add a might_sleep() to ioremap() to prevent these
bugs in the future.

This bug is eight years old.  You can report it, but it's going to be hard
to get anyone to fix it.  I sometimes ignore ancient bugs.  On the other
hand, netxen is fairly well supported so it doesn't hurt to try.

I try to report bugs as soon as they are introduced.  I report it to
the author and CC the relevant list.  If people don't respond to my
email after a month then I complain again.

regards,
dan carpenter
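Dan's suggestion works because might_sleep() is a run-time debug check: with CONFIG_DEBUG_ATOMIC_SLEEP enabled, calling it in atomic context (for example with a spinlock held) prints a "sleeping function called from invalid context" warning even when the function happens not to sleep on that particular call. The sketch below is a userspace model of that idea, assuming a plain counter in place of the kernel's preempt count; `model_ioremap()` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: a counter stands in for the kernel's preempt count. */
static int preempt_count_model;
static int atomic_sleep_reports;

/* Taking a spinlock enters atomic context. */
void model_spin_lock(void)
{
	preempt_count_model++;
}

void model_spin_unlock(void)
{
	assert(preempt_count_model > 0);	/* unbalanced unlock is a bug */
	preempt_count_model--;
}

/* Like might_sleep(): complain if called in atomic context. */
void model_might_sleep(void)
{
	if (preempt_count_model != 0)
		atomic_sleep_reports++;	/* the kernel would print a warning */
}

/* Hypothetical stand-in for ioremap(), annotated as Dan suggests. */
void *model_ioremap(size_t size)
{
	model_might_sleep();		/* check first, even if we never sleep */
	return malloc(size);		/* placeholder for the real mapping */
}
```

Called outside the lock, `model_ioremap()` leaves `atomic_sleep_reports` at 0; called between `model_spin_lock()` and `model_spin_unlock()`, it raises it to 1. Putting the check at the top of the function means every caller in atomic context is reported on a debug build, not only the rare calls that actually sleep.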
Jia-Ju Bai June 22, 2017, 10:52 a.m. UTC | #8
On 2017/6/22 14:08, Dan Carpenter wrote:
> We should probably add a might_sleep() to ioremap() to prevent these
> bugs in the future.
I agree that this is the right thing to do.
It would also be very useful to compile a list of common kernel interface
functions that may sleep. When writing a new driver, a developer could
refer to this list to reduce or avoid sleep-in-atomic bugs.

>
> This bug is eight years old.  You can report it, but it's going to be hard
> to get anyone to fix it.  I sometimes ignore ancient bugs.  On the other
> hand, netxen is fairly well supported so it doesn't hurt to try.
>
> I try to report bugs as soon as they are introduced.  I report it to
> the author and CC the relevant list.  If people don't respond to my
> email after a month then I complain again.
>
> regards,
> dan carpenter
>

Thanks for your helpful advice.

Thanks,
Jia-Ju Bai

Patch

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
index a996801..5ea553e 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
@@ -1419,7 +1419,9 @@  static u32 netxen_nic_io_read_2M(struct netxen_adapter *adapter,
 
 		mem_base = pci_resource_start(adapter->pdev, 0) +
 					(start & PAGE_MASK);
+		spin_unlock(&adapter->ahw.mem_lock);
 		mem_ptr = ioremap(mem_base, PAGE_SIZE);
+		spin_lock(&adapter->ahw.mem_lock);
 		if (mem_ptr == NULL) {
 			ret = -EIO;
 			goto unlock;