[Linux Kernel] PM - Runtime PM 1

Linux/kernel 2023. 8. 7. 19:28

References

- https://lwn.net/Articles/347573/

- https://elinux.org/images/0/08/ELC-2010-Hilman-Runtime-PM.pdf

- https://www.kernel.org/doc/html/v4.14/driver-api/pm/devices.html

- https://www.kernel.org/doc/Documentation/power/runtime_pm.txt

- http://www.wowotech.net/pm_subsystem/rpm_overview.html

- https://elixir.bootlin.com/linux/latest/source/include/linux/pm_runtime.h

- https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c

- Mastering Linux Device Driver Development: Write custom device drivers to support computer peripherals in Linux operating systems by John Madieu

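Notes on this post

- Questions that came up while I was writing are marked in blue bold. They are things I do not know yet and need to look into later.

- Underlined text needs a longer explanation than fits here. If an underlined passage does not make sense, follow its link and study the related material.

Contents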

- Overview

" Runtime Power Management (RPM) gives software far more flexibility than System Power Management (SPM). With RPM the whole system does not go to sleep; instead, each device in the system goes to sleep individually once its own conditions are met. In that sense RPM fits the basic software goals of high cohesion and low coupling, which is why many driver engineers prefer RPM to SPM.

- Two Models for Device Power Management

: A driver uses one or both of the following two models to put a device into a low-power state.

1. System Sleep model :
" The whole system enters a sleep state. Examples are system suspend (`suspend-to-RAM`), hibernation (`suspend-to-disk`), and system idle (`suspend-to-idle`).

" In this model the suspend/resume callbacks of every device driver, bus driver, and class driver are called. Each driver has its own role, so what each one does inside suspend/resume differs, and all of them must cooperate for the system to sleep. The key requirement is a clean power-down and reactivation without data loss: save context on suspend and restore context on resume, so that earlier work continues exactly where it stopped.

" Some devices are able to wake the system up. For such a device the `/sys/devices/.../power/wakeup` file reads `enabled` when the device is currently allowed to wake the system.

2. Runtime Power Management model:

" While the system keeps running, a single device can be moved into a low-power state independently of the system power state. This is `Runtime PM`. Devices that depend on each other, however, cannot always be switched individually: by default a parent device does not enter a low-power state unless all of its children are already in one. Besides the parent, the bus the device sits on is also involved in the low-power decision.

" During a system-wide power transition (suspend or hibernation), extra code may be needed if a device is already in a low-power state because of runtime PM. For example, runtime PM changes a device's power state based on its `usage count`, so what should happen if the `usage count` is not 0 at `system suspend` time?

" As in System PM, each driver (device, bus, class) performs its own suspend/resume in Runtime PM. The same requirement applies: power-down and reactivation must be clean and lose no data, so context is saved on suspend and restored on resume.

" Interestingly, if many devices enter low-power states at run time, the system can draw about as much current as when it is fully suspended. In other words, overall power consumption depends heavily on how finely each device's power is managed.

" In general, a `suspended` device does not receive external interrupts at all (wakeup events excepted). Going to sleep is not as simple as it sounds, though. A leaf device can sleep as soon as it is asked to, but a device such as a `bridge` or a `bus` does not sleep until every device below it has gone to sleep.

" Examples of hardware wakeup events:

1" alarm from a real time clock. -> For example, a phone alarm. The RTC must keep tracking time even when the system has no power, so it runs from a backup battery, which as far as I know typically lasts about 3 to 5 years.

2" network wake-on-LAN packets. -> For example, remote start of a car. Why `LAN` for remote start? Ethernet is used as a high-speed in-vehicle protocol. The vehicle modem receives the remote packet over the air and wakes up the surrounding electronic units over the in-vehicle Ethernet lines. That is, it starts the car.

3" keyboard or mouse activity -> For example, a desktop in power-saving mode turns its screen back on when you move the mouse or press Enter.

4" media insertion or removal (for PCMCIA, MMC/SD, USB, and so on). -> For example, a phone that has been left alone turns its screen back on when you plug in USB to charge it. Strictly speaking, a phone never fully sleeps. In a phone the AP and the MODEM usually sit in the same SoC, and the SoC as a whole probably never sleeps. The AP goes to sleep quickly, but the MODEM can be considered to almost never sleep. What counts as sleep for a MODEM is a bit fuzzy, but it is safe to assume it is never fully SUSPENDED; instead it has something like a `LOW-POWER mode`. Search for `DRX cycle` for details.

- Each sysfs control file on two models

1. /sys/devices/.../power/wakeup with System Power Management

" Every device object has fields that control how system wakeup events (hardware signals that can force the system out of a sleep state) are handled. Let's look at a few important functions and fields. `device_set_wakeup_capable()` sets up the `/sys/devices/.../power/wakeup` file.

" struct dev_pm_info.can_wakeup : Tells whether the device is physically able to wake the system. `device_set_wakeup_capable()` sets the value of `struct dev_pm_info.can_wakeup`.

" struct dev_pm_info.wakeup : A pointer to a `struct wakeup_source`. Through it the device reports wakeup events to the PM core so that the system can be woken up. It is allocated only for devices that can wake the system.

" device_may_wakeup : Returns true only when `struct dev_pm_info.wakeup` exists and the `power/wakeup` file reads `enabled`.

" Whether a `wakeup-capable device` actually wakes the system is a policy decision, and that policy is usually managed from user space through sysfs. This is a more indirect mechanism than `wake_lock`. A wake lock lets user space act directly on whether the system stays awake, whereas `echo enabled > power/wakeup` does not wake the system by itself. In other words, the code that actually wakes the system lives in the device driver, and user space is only responsible for whether that code is enabled.
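" The split between capability (driver) and policy (user space) can be sketched with a small userspace model. This is not kernel code: `struct model_dev` and every `model_*` function are hypothetical names made up for illustration, and `wakeup_enabled` simply stands in for "`power.wakeup` is allocated".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of the wakeup fields; not kernel code. */
struct model_dev {
	bool can_wakeup;      /* models dev->power.can_wakeup (capability) */
	bool wakeup_enabled;  /* models "dev->power.wakeup is allocated" (policy) */
};

/* Models device_set_wakeup_capable(): the driver states a hardware capability. */
static void model_set_wakeup_capable(struct model_dev *d, bool capable)
{
	d->can_wakeup = capable;
}

/* Models a user-space write to power/wakeup; 0 on success, -1 on rejection. */
static int model_wakeup_store(struct model_dev *d, const char *buf)
{
	if (!d->can_wakeup)
		return -1;   /* the device cannot wake the system at all */
	if (strcmp(buf, "enabled") == 0)
		d->wakeup_enabled = true;
	else if (strcmp(buf, "disabled") == 0)
		d->wakeup_enabled = false;
	else
		return -1;   /* unknown keyword */
	return 0;
}

/* Models device_may_wakeup(): capability and policy must both hold. */
static bool model_may_wakeup(const struct model_dev *d)
{
	return d->can_wakeup && d->wakeup_enabled;
}
```

" In this model a driver would consult `model_may_wakeup()` in its suspend path before arming its wakeup hardware, which mirrors the point above: user space only flips the policy bit, and the driver code does the actual work.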

2. /sys/devices/.../power/control with Runtime Power Management

" This file lets the user decide whether Runtime PM may operate on a particular device. For example, when the user writes `auto` to this file, `pm_runtime_allow()` is called internally and runtime PM is allowed for the device. The details of `rpm_drop_usage_count()` are covered in a separate post. If it returns a positive value, the device is still in use and stays running. If it returns 0, `dev->power.usage_count` has reached 0, nobody is using the device any more, and it may go runtime idle (`rpm_idle()` is called).

// drivers/base/power/runtime.c - v6.5
/**
 * pm_runtime_allow - Unblock runtime PM of a device.
 * @dev: Device to handle.
 *
 * Decrease the device's usage count and set its power.runtime_auto flag.
 */
void pm_runtime_allow(struct device *dev)
{
	int ret;

	....

	dev->power.runtime_auto = true;
	ret = rpm_drop_usage_count(dev);
	if (ret == 0)
		rpm_idle(dev, RPM_AUTO | RPM_ASYNC);
	else if (ret > 0)
		trace_rpm_usage(dev, RPM_AUTO | RPM_ASYNC);

	....
}

" When `on` is written, `pm_runtime_forbid()` is called internally and the device is kept in full-power mode (if the device is in a low-power mode at that moment, it is resumed to full power). Runtime PM is then blocked for the device (`dev->power.runtime_auto = false`).

// drivers/base/power/runtime.c - v6.5
/**
 * pm_runtime_forbid - Block runtime PM of a device.
 * @dev: Device to handle.
 *
 * Increase the device's usage count and clear its power.runtime_auto flag,
 * so that it cannot be suspended at run time until pm_runtime_allow() is called
 * for it.
 */
void pm_runtime_forbid(struct device *dev)
{
	spin_lock_irq(&dev->power.lock);
	if (!dev->power.runtime_auto)
		goto out;

	dev->power.runtime_auto = false;
	atomic_inc(&dev->power.usage_count);
	rpm_resume(dev, 0);

 out:
	spin_unlock_irq(&dev->power.lock);
}

" The `power/control` file maps to the device's `struct dev_pm_info.runtime_auto` flag. Calling `pm_runtime_allow()` sets `runtime_auto` to 1, and calling `pm_runtime_forbid()` clears it to 0.

unsigned int runtime_auto;

- if set, indicates that the user space has allowed the device driver to power manage the device at run time via the `/sys/devices/.../power/control` interface; it may only be modified with the help of the `pm_runtime_allow()` and `pm_runtime_forbid()` helper functions

- Reference : https://www.kernel.org/doc/Documentation/power/runtime_pm.txt

" We said that a device runs in full-power mode when `runtime_auto` is clear. Does such a device also stay at full power, without sleeping, during a system-wide sleep? No. The `runtime_auto` flag has no effect on system-wide sleep. So a user who thinks "I don't want the system to sleep, so I'll set device A's control file to `on`" will not get the expected result.

The device’s `runtime_auto` flag has no effect on the handling of system-wide power transitions. In particular, the device can (and in the majority of cases should and will) be put into a low-power state during a system-wide transition to a sleep state even though its `runtime_auto` flag is clear.

- Reference : https://www.kernel.org/doc/html/v4.19/driver-api/pm/devices.html
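" How `power/control` interacts with `runtime_auto` and the usage count can be sketched as a userspace model. It is a simplification and not kernel code: `struct model_dev` and the `model_*` names are hypothetical, locking and the actual resume/idle calls are left out, and the early return in `model_allow()` is an assumption of this model about the part elided from the `pm_runtime_allow()` excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; not kernel code. */
struct model_dev {
	bool runtime_auto;  /* models dev->power.runtime_auto */
	int usage_count;    /* models dev->power.usage_count */
};

/* Models pm_runtime_forbid(): clear the flag once and pin the device. */
static void model_forbid(struct model_dev *d)
{
	if (!d->runtime_auto)
		return;          /* already forbidden: do not pin twice */
	d->runtime_auto = false;
	d->usage_count++;    /* the real function also resumes the device */
}

/* Models pm_runtime_allow(): set the flag once and drop the pin. */
static void model_allow(struct model_dev *d)
{
	if (d->runtime_auto)
		return;          /* assumed guard: already allowed */
	d->runtime_auto = true;
	d->usage_count--;    /* at 0 the real function requests an idle check */
}

/* Models a write to power/control; 0 on success, -1 on an unknown keyword. */
static int model_control_store(struct model_dev *d, const char *buf)
{
	if (strcmp(buf, "auto") == 0)
		model_allow(d);
	else if (strcmp(buf, "on") == 0)
		model_forbid(d);
	else
		return -1;
	return 0;
}
```

" The model shows why writing `on` twice is harmless: the flag guards the counter, so the usage count is raised only once and a single `auto` is enough to undo it.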

- Calling Sequence Guarantees

" `System PM suspend` proceeds in `bottom-up` order. Why? Suppose three devices are attached to a PCI bus and the PCI bus driver is `suspended` first. The three devices can then no longer be reached, because nothing can talk to a device across a sleeping bus. `resume` runs in the opposite, `top-down` order for the same reason: the bus has to be working before the devices attached to it can be reached.

When the system goes into a sleep state, each device’s driver is asked to suspend the device by putting it into a state compatible with the target system state. That’s usually some version of “off”, but the details are system-specific. Also, wakeup-enabled devices will usually stay partly functional in order to wake the system.
...

To ensure that bridges and similar links needing to talk to a device are available when the device is suspended or resumed, the device hierarchy is walked in a bottom-up order to suspend devices. A top-down order is used to resume those devices.

The ordering of the device hierarchy is defined by the order in which devices get registered: a child can never be registered, probed or resumed before its parent; and can’t be removed or suspended after that parent.

The policy is that the device hierarchy should match hardware bus topology. [Or at least the control bus, for devices which use multiple busses.] In particular, this means that a device registration may fail if the parent of the device is suspending (i.e. has been chosen by the PM core as the next device to suspend) or has already suspended, as well as after all of the other devices have been suspended. Device drivers must be prepared to cope with such situations.

- Reference : https://www.kernel.org/doc/html/v4.19/driver-api/pm/devices.html

" It follows that a child device can never be `registered`, `probed`, or `resumed` before its parent. A child cannot be alive before its parent is. Those three operations bring a device into (or back into) an active state, so the parent must go first. `remove` and `suspend`, on the other hand, must happen to the child first; if the parent were `removed` or `suspended` first, the child would lose its chance to run its own `remove` or `suspend`.

" The Linux kernel recommends that the logical `device hierarchy` match the physical `bus topology`. This may sound hard, but when the hardware layout is described in the device tree, the kernel parses it at boot and builds the logical `device hierarchy` by itself.

- RPM Callbacks

" The RPM callbacks are also declared in `struct dev_pm_ops`. A power-managed device only implements the callbacks this structure provides and leaves it to the kernel to decide when and in what order they are called.

// include/linux/pm.h - v6.2.2
struct dev_pm_ops {
    ...
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

" To make registering the runtime callbacks easier, the kernel provides helper macros (`SET_RUNTIME_PM_OPS`, `DEFINE_RUNTIME_DEV_PM_OPS`).

//drivers/crypto/gemini/sl3516-ce-core.c - v6.5
static const struct dev_pm_ops sl3516_ce_pm_ops = {
	SET_RUNTIME_PM_OPS(sl3516_ce_pm_suspend, sl3516_ce_pm_resume, NULL)
};
...
//include/linux/pm.h - v6.5
#define RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	.runtime_suspend = suspend_fn, \
	.runtime_resume = resume_fn, \
	.runtime_idle = idle_fn,
...
...

#define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn)

" A brief look at the three RPM callbacks.

1. runtime_suspend
" The runtime_suspend() callback does not necessarily mean that the hardware itself must end up in a suspended state. Its real purpose is to make sure the device no longer communicates with the rest of the system (e.g. the CPU or memory) [Reference 1].

2. runtime_resume
" Think of it as undoing what runtime_suspend() did: powering the device up, restoring registers, and so on, so that it works normally again.

3. runtime_idle
" Called when the device's usage counter is 0 and it has no active children. In practice this callback is rarely implemented. When it is, it usually picks one of `pm_runtime_suspend`, `pm_schedule_suspend`, or `pm_runtime_autosuspend` depending on the situation. If no runtime_idle callback is registered, the PM core goes straight on to the runtime_suspend callback.

- RPM Operation

" The PM core keeps track of two counters for each device.

power.usage_count" How many users currently hold the device. Typically it is raised while the device is held open through a `file open API`, or while the driver itself wants to keep the device active for some period.

power.child_count" The number of the device's children that are currently active.

" The PM core checks these two counters to decide whether the device should stay `active` or may go `idle`. When both are 0, an `idle notification` is generated and the PM core calls the `runtime_idle` callback. The idle notification works as follows.

1" When an `idle notification` is due, the PM core sets `dev->power.idle_notification` to true, calls the .runtime_idle callback of the `bus type/class/device`, and then resets `dev->power.idle_notification` to false. The main purpose of .runtime_idle is to decide whether the device may enter the suspended state now.

2" If the device has no .runtime_idle callback, or the callback returns `0`, the PM core goes on to call .runtime_suspend to suspend the device. Afterwards `dev->power.runtime_status` is set to RPM_SUSPENDED, which marks the device as SUSPENDED.
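" The two steps above can be sketched as a userspace model of the decision. It is not kernel code: `struct model_dev`, `model_idle()` and the return codes are hypothetical, and locking, bus/class callback lookup, and the `idle_notification` flag are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical userspace model; not kernel code. */
struct model_dev {
	int usage_count;                              /* models power.usage_count */
	int child_count;                              /* models power.child_count */
	enum model_status status;                     /* models power.runtime_status */
	int (*runtime_idle)(struct model_dev *d);     /* may be NULL */
	int (*runtime_suspend)(struct model_dev *d);  /* may be NULL */
};

/* Sample .runtime_idle callback that refuses the suspend. */
static int model_idle_busy(struct model_dev *d)
{
	(void)d;
	return -1;
}

/* An idle notification is due only when both counters are zero. */
static bool model_idle_due(const struct model_dev *d)
{
	return d->status == MODEL_ACTIVE &&
	       d->usage_count == 0 && d->child_count == 0;
}

/*
 * Models the idle notification path.
 * Returns 0 if the device ended up suspended, -1 if no notification was due,
 * -2 if .runtime_idle refused, -3 if .runtime_suspend failed.
 */
static int model_idle(struct model_dev *d)
{
	if (!model_idle_due(d))
		return -1;
	if (d->runtime_idle && d->runtime_idle(d) != 0)
		return -2;  /* callback said "not now": stay active */
	if (d->runtime_suspend && d->runtime_suspend(d) != 0)
		return -3;  /* suspend failed: stay active */
	d->status = MODEL_SUSPENDED;
	return 0;
}
```

" Note how a missing `.runtime_idle` and one that returns 0 take the same path in the model, which matches step 2 above.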

- RPM Initialization

" Initially RPM is disabled for every device; it has to be enabled per device. A driver that wants RPM must call `pm_runtime_enable`. There are a few things to watch out for when enabling RPM for the first time.

// drivers/base/power/runtime.c - v6.2.2
void pm_runtime_enable(struct device *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (!dev->power.disable_depth) {
		dev_warn(dev, "Unbalanced %s!\n", __func__);
		goto out;
	}

	if (--dev->power.disable_depth > 0)
		goto out;

	dev->power.last_status = RPM_INVALID;
	dev->power.accounting_timestamp = ktime_get_mono_fast_ns();

	if (dev->power.runtime_status == RPM_SUSPENDED &&
	    !dev->power.ignore_children &&
	    atomic_read(&dev->power.child_count) > 0)
		dev_warn(dev, "Enabling runtime PM for inactive device with active children\n");

out:
	spin_unlock_irqrestore(&dev->power.lock, flags);
}
EXPORT_SYMBOL_GPL(pm_runtime_enable);

" First, `pm_runtime_set_active` marks the device's RPM status as RPM_ACTIVE. Next, `pm_runtime_get_noresume` increments the usage count. Finally, `pm_runtime_enable` enables RPM for the device.

" Why use `pm_runtime_get_noresume` at this early stage instead of `pm_runtime_get`? Because `pm_runtime_get` may do nothing useful here. Most RPM helpers check `power.disable_depth`, and while it is 1 or more they return an error immediately. `pm_runtime_enable` is what brings the value down to 0. But if `pm_runtime_enable` is called before `pm_runtime_get`, the device may be suspended in the window before the usage count goes up. `pm_runtime_get_noresume` solves this: as the code below shows, it increments the usage count without looking at `power.disable_depth`. That is why it is used during RPM initialization (`__pm_runtime_resume` is also subject to `power.disable_depth` internally).

//include/linux/pm_runtime.h - v6.5
/**
 * pm_runtime_get - Bump up usage counter and queue up resume of a device.
 * @dev: Target device.
 *
 * Bump up the runtime PM usage counter of @dev and queue up a work item to
 * carry out runtime-resume of it.
 */
static inline int pm_runtime_get(struct device *dev)
{
	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_ASYNC);
}

/**
 * pm_runtime_get_sync - Bump up usage counter of a device and resume it.
 * @dev: Target device.
 *
 * Bump up the runtime PM usage counter of @dev and carry out runtime-resume of
 * it synchronously.
 *
 * The possible return values of this function are the same as for
 * pm_runtime_resume() and the runtime PM usage counter of @dev remains
 * incremented in all cases, even if it returns an error code.
 * Consider using pm_runtime_resume_and_get() instead of it, especially
 * if its return value is checked by the caller, as this is likely to result
 * in cleaner code.
 */
static inline int pm_runtime_get_sync(struct device *dev)
{
	return __pm_runtime_resume(dev, RPM_GET_PUT);
}

...
...

/**
 * pm_runtime_get_noresume - Bump up runtime PM usage counter of a device.
 * @dev: Target device.
 */
static inline void pm_runtime_get_noresume(struct device *dev)
{
	atomic_inc(&dev->power.usage_count);
}

" The differences between the three functions:

pm_runtime_get" Asynchronous. It checks the resume conditions and, if they hold, asks the PM core to resume the device and returns without waiting.

pm_runtime_get_sync" Synchronous. It checks the resume conditions and, if they hold, completes the resume before returning.

pm_runtime_get_noresume" The two functions above trigger an actual resume. This one does not check any resume condition and only increments the usage count; it neither performs nor requests a resume.
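" Why the initialization order matters can be sketched with a userspace model of `disable_depth`. This is not kernel code: all `model_*` names are hypothetical, the `-EACCES` and `-EAGAIN` return values are assumptions of the model about what the real helpers return, and corner cases of the real `rpm_resume()` are ignored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; not kernel code. */
struct model_dev {
	int disable_depth;  /* models power.disable_depth, 1 right after init */
	int usage_count;    /* models power.usage_count */
	int active;         /* 1 models RPM_ACTIVE, 0 models RPM_SUSPENDED */
};

static void model_init(struct model_dev *d)
{
	d->disable_depth = 1;  /* runtime PM starts out disabled */
	d->usage_count = 0;
	d->active = 0;
}

/* Models pm_runtime_get_noresume(): counter only, cannot fail. */
static void model_get_noresume(struct model_dev *d)
{
	d->usage_count++;
}

/* Models pm_runtime_get_sync(): the counter stays raised even on error. */
static int model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	if (d->disable_depth > 0)
		return -EACCES;  /* assumed error while runtime PM is disabled */
	d->active = 1;
	return 0;
}

/* Models pm_runtime_set_active(): only meaningful while still disabled. */
static int model_set_active(struct model_dev *d)
{
	if (d->disable_depth == 0)
		return -EAGAIN;  /* assumed error once runtime PM is enabled */
	d->active = 1;
	return 0;
}

/* Models pm_runtime_enable(): one level of disabling is removed. */
static void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}
```

" In the model, calling `model_get_sync()` before `model_enable()` fails and leaves the device marked inactive, while the sequence get_noresume, set_active, enable ends with runtime PM enabled, the device active, and a usage count of 1 that keeps it from being suspended.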

- RPM Parent / Child

" Like the usage count, active children strongly affect whether an RPM idle notification is generated. In general a child device can only be reached through its parent, so putting the parent into a low-power state just because it looks idle, while a child is being accessed, is not desirable.

" `pm_suspend_ignore_children` stops active children from keeping the parent out of a low-power state. With it, the parent may enter the SUSPENDED state even while some of its children are active.

//include/linux/pm_runtime.h - v6.5
/**
 * pm_suspend_ignore_children - Set runtime PM behavior regarding children.
 * @dev: Target device.
 * @enable: Whether or not to ignore possible dependencies on children.
 *
 * The dependencies of @dev on its children will not be taken into account by
 * the runtime PM framework going forward if @enable is %true, or they will
 * be taken into account otherwise.
 */
static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
{
	dev->power.ignore_children = enable;
}

" Internally, `pm_suspend_ignore_children` sets the `power.ignore_children` field, described below.

unsigned int ignore_children" When set to 1, the parent and its children can run Runtime PM independently. In code terms, the parent's dev->power.child_count is ignored (although child_count itself keeps being updated). Note the `!dev->power.ignore_children` check in the `pm_runtime_enable` code shown earlier. Remember that any device can be a child and a parent at the same time.
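" The effect of `ignore_children` on a parent can be sketched with a two-level userspace model. It is not kernel code: `struct model_dev`, `model_resume()` and `model_suspend()` are hypothetical, and callbacks, locking and error paths are left out. The model only assumes what the text above states: `child_count` is always maintained, and `ignore_children` decides whether it is consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct model_dev {
	struct model_dev *parent;
	int usage_count;
	int child_count;       /* active children, kept up to date in all cases */
	bool ignore_children;  /* models power.ignore_children */
	bool active;
};

/* Resume a device; its parent is resumed first unless it ignores children. */
static void model_resume(struct model_dev *d)
{
	if (d->active)
		return;
	if (d->parent && !d->parent->ignore_children)
		model_resume(d->parent);
	d->active = true;
	if (d->parent)
		d->parent->child_count++;
}

/* Try to suspend a device; 0 on success, -1 if it is still needed. */
static int model_suspend(struct model_dev *d)
{
	if (!d->active)
		return 0;
	if (d->usage_count > 0)
		return -1;
	if (!d->ignore_children && d->child_count > 0)
		return -1;  /* active children block the suspend */
	d->active = false;
	if (d->parent && d->parent->child_count > 0)
		d->parent->child_count--;
	return 0;
}
```

" With `ignore_children` clear, the parent cannot be suspended while the leaf is active; once it is set, the parent suspends even though its `child_count` is still 1.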

- RPM Synchronous / Asynchronous

" The PM core supports both `synchronous` and `asynchronous` RPM operations. The synchronous helpers carry out the operation right away. The asynchronous helpers only submit a request to the PM core, which schedules it to run at some point in the near future, so the caller cannot know exactly when it runs. For that reason this section concentrates on the asynchronous path.

: The PM core handles an asynchronous RPM request as follows.

1" On receiving a request, the PM core first sets `dev->power.request`, which is of type `enum rpm_request`: `RPM_REQ_IDLE` for an idle notification, `RPM_REQ_SUSPEND` for a suspend request, `RPM_REQ_AUTOSUSPEND` for an autosuspend request, and `RPM_REQ_RESUME` for a resume request. Later, when the request can be processed, the PM core reads `dev->power.request` to decide which handler to run.

2" Because the request is not processed right away, the PM core sets `dev->power.request_pending` to true.

3" Once the request has been recorded, it has to be scheduled. The PM core queues `dev->power.work` on its global workqueue, `pm_wq`, and scheduling starts from there. (`pm_runtime_init` binds the `pm_runtime_work` handler to each device's `dev->power.work` with `INIT_WORK`.)

4" When it is time to process the request, the work item is taken off `pm_wq` and passed to `pm_runtime_work`.

" `rpm_idle`, `rpm_suspend`, and `rpm_resume` are the core RPM functions; every request (`idle, suspend, resume`) ends up in one of them. As the code below shows, for an asynchronous request they set `dev->power.request` and `dev->power.request_pending` and then call `queue_work` to push the work item (`dev->power.work`) onto the PM core's global workqueue (`pm_wq`).

// drivers/base/power/runtime.c - v6.5
static int rpm_idle(struct device *dev, int rpmflags)
{
	....
	/* Carry out an asynchronous or a synchronous idle notification. */
	if (rpmflags & RPM_ASYNC) {
		dev->power.request = RPM_REQ_IDLE;
		if (!dev->power.request_pending) {
			dev->power.request_pending = true;
			queue_work(pm_wq, &dev->power.work);
		}
		trace_rpm_return_int(dev, _THIS_IP_, 0);
		return 0;
	}
    ....
}
....

static int rpm_suspend(struct device *dev, int rpmflags)
	__releases(&dev->power.lock) __acquires(&dev->power.lock)
{
	....
    /* Carry out an asynchronous or a synchronous suspend. */
	if (rpmflags & RPM_ASYNC) {
		dev->power.request = (rpmflags & RPM_AUTO) ?
		    RPM_REQ_AUTOSUSPEND : RPM_REQ_SUSPEND;
		if (!dev->power.request_pending) {
			dev->power.request_pending = true;
			queue_work(pm_wq, &dev->power.work);
		}
		goto out;
	}
    ....
}
....

static int rpm_resume(struct device *dev, int rpmflags)
	__releases(&dev->power.lock) __acquires(&dev->power.lock)
{
	....
    /* Carry out an asynchronous or a synchronous resume. */
	if (rpmflags & RPM_ASYNC) {
		dev->power.request = RPM_REQ_RESUME;
		if (!dev->power.request_pending) {
			dev->power.request_pending = true;
			queue_work(pm_wq, &dev->power.work);
		}
		retval = 0;
		goto out;
	}
    ....
}

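: `pm_runtime_work` is the function that runs when an asynchronous Runtime PM request is processed. It is the work function of each device's `dev->power.work`, which is queued on the workqueue used by the Runtime PM core.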

// include/linux/pm_runtime.h - v6.5
#ifdef CONFIG_PM
extern struct workqueue_struct *pm_wq;
....
#endif /* CONFIG_PM */ 

//drivers/base/power/runtime.c - v6.5
/**
 * pm_runtime_work - Universal runtime PM work function.
 * @work: Work structure used for scheduling the execution of this function.
 *
 * Use @work to get the device object the work is to be done for, determine what
 * is to be done and execute the appropriate runtime PM function.
 */
static void pm_runtime_work(struct work_struct *work)
{
	struct device *dev = container_of(work, struct device, power.work);
	enum rpm_request req;

	spin_lock_irq(&dev->power.lock);

	if (!dev->power.request_pending) // --- 1
		goto out;

	req = dev->power.request; // --- 2
	dev->power.request = RPM_REQ_NONE;
	dev->power.request_pending = false;

	switch (req) { // --- 3
	case RPM_REQ_NONE:
		break;
	case RPM_REQ_IDLE:
		rpm_idle(dev, RPM_NOWAIT);
		break;
	case RPM_REQ_SUSPEND:
		rpm_suspend(dev, RPM_NOWAIT);
		break;
	case RPM_REQ_AUTOSUSPEND:
		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
		break;
	case RPM_REQ_RESUME:
		rpm_resume(dev, RPM_NOWAIT);
		break;
	}

 out:
	spin_unlock_irq(&dev->power.lock);
}
....

/**
 * pm_runtime_init - Initialize runtime PM fields in given device object.
 * @dev: Device object to initialize.
 */
void pm_runtime_init(struct device *dev)
{
	....
	INIT_WORK(&dev->power.work, pm_runtime_work);
	....
}

1. An asynchronous request sets `dev->power.request_pending = true`. If `dev->power.request_pending` is `false` at this point, the request has already been handled, so the function returns.

2. `dev->power.request` and `dev->power.request_pending` are meaningful only while an asynchronous RPM request exists; when both are `0`, there is no asynchronous request for the device. They are reset here because the request is about to be processed.

3. The asynchronous request was submitted as `rpm_xxx(yyy, RPM_ASYNC)`. Once `pm_runtime_work` runs, the operation really has to be carried out, and the same function is used again, this time as `rpm_xxx(yyy, RPM_NOWAIT)`. Only the flag in the second argument differs. Without `RPM_ASYNC` the operation is performed right here in the worker, and `RPM_NOWAIT` tells it not to wait for a state change that is already in progress on the device.
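: The request bookkeeping described above can be sketched as a userspace model. It is not kernel code: `struct model_dev`, `model_request()` and `model_work()` are hypothetical, the `queued` counter stands in for `queue_work(pm_wq, &dev->power.work)`, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum model_req {
	MODEL_REQ_NONE,
	MODEL_REQ_IDLE,
	MODEL_REQ_SUSPEND,
	MODEL_REQ_AUTOSUSPEND,
	MODEL_REQ_RESUME,
};

/* Hypothetical userspace model; not kernel code. */
struct model_dev {
	enum model_req request;       /* models power.request */
	bool request_pending;         /* models power.request_pending */
	int queued;                   /* times the work item was queued */
	enum model_req last_handled;  /* what the worker dispatched last */
};

/* Models the RPM_ASYNC branch of rpm_idle()/rpm_suspend()/rpm_resume(). */
static void model_request(struct model_dev *d, enum model_req req)
{
	d->request = req;              /* a newer request replaces an older one */
	if (!d->request_pending) {
		d->request_pending = true;
		d->queued++;               /* the work item is queued only once */
	}
}

/* Models pm_runtime_work(); returns true if a request was dispatched. */
static bool model_work(struct model_dev *d)
{
	enum model_req req;

	if (!d->request_pending)
		return false;              /* nothing left to do */

	req = d->request;
	d->request = MODEL_REQ_NONE;   /* reset before handling */
	d->request_pending = false;
	d->last_handled = req;         /* the real code calls rpm_xxx(dev, RPM_NOWAIT) */
	return true;
}
```

: One consequence visible in the model: two asynchronous requests submitted back to back queue the work item once, and the worker handles only the most recent request.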

: `pm_runtime_barrier` does three things for the device passed to it.

1. If the device has a `pending request` and that request is `RPM_REQ_RESUME`, the device is first resumed `synchronously` (right away).

2. All `pending requests` for the device are then removed from `pm_wq`.

3. Even when no `pending request` remains, an RPM operation may still be in progress. In that case the function waits on a `wait queue` until it finishes.

// drivers/base/power/runtime.c - v6.5 
/**
 * __pm_runtime_barrier - Cancel pending requests and wait for completions.
 * @dev: Device to handle.
 *
 * Flush all pending requests for the device from pm_wq and wait for all
 * runtime PM operations involving the device in progress to complete.
 *
 * Should be called under dev->power.lock with interrupts disabled.
 */
static void __pm_runtime_barrier(struct device *dev)
{
	pm_runtime_deactivate_timer(dev);

	if (dev->power.request_pending) { // --- 2
		dev->power.request = RPM_REQ_NONE;
		spin_unlock_irq(&dev->power.lock);

		cancel_work_sync(&dev->power.work);

		spin_lock_irq(&dev->power.lock);
		dev->power.request_pending = false;
	}

	if (dev->power.runtime_status == RPM_SUSPENDING || // --- 3
	    dev->power.runtime_status == RPM_RESUMING ||
	    dev->power.idle_notification) {
		DEFINE_WAIT(wait);

		/* Suspend, wake-up or idle notification in progress. */
		for (;;) {
			prepare_to_wait(&dev->power.wait_queue, &wait,
					TASK_UNINTERRUPTIBLE);
			if (dev->power.runtime_status != RPM_SUSPENDING
			    && dev->power.runtime_status != RPM_RESUMING
			    && !dev->power.idle_notification)
				break;
			spin_unlock_irq(&dev->power.lock);

			schedule();

			spin_lock_irq(&dev->power.lock);
		}
		finish_wait(&dev->power.wait_queue, &wait);
	}
}


/**
 * pm_runtime_barrier - Flush pending requests and wait for completions.
 * @dev: Device to handle.
 *
 * Prevent the device from being suspended by incrementing its usage counter and
 * if there's a pending resume request for the device, wake the device up.
 * Next, make sure that all pending requests for the device have been flushed
 * from pm_wq and wait for all runtime PM operations involving the device in
 * progress to complete.
 *
 * Return value:
 * 1, if there was a resume request pending and the device had to be woken up,
 * 0, otherwise
 */
int pm_runtime_barrier(struct device *dev)
{
	int retval = 0;

	pm_runtime_get_noresume(dev);
	spin_lock_irq(&dev->power.lock);

	if (dev->power.request_pending // --- 1
	    && dev->power.request == RPM_REQ_RESUME) {
		rpm_resume(dev, 0);
		retval = 1;
	}

	__pm_runtime_barrier(dev);

	spin_unlock_irq(&dev->power.lock);
	pm_runtime_put_noidle(dev);

	return retval;
}
EXPORT_SYMBOL_GPL(pm_runtime_barrier);

: When and why is `pm_runtime_barrier` used? It synchronizes a device with a `system-wide power transition`. Suppose an `RPM resume` request is pending for a device while the system is entering `system-wide suspend`. `System PM` and `Runtime PM` would then both try to perform a `power operation` on the same device, which can cause a `race condition`. To avoid this synchronization problem between `System PM` and `Runtime PM`, the pending RPM requests are flushed.

- RPM Autosuspend

: In general a device should be put into a low-power state only when there is reason to believe it will stay there for a fairly long time. A device that is unused only briefly is left as it is instead of being moved to low power. Why not enter low power eagerly? To keep the device from `bouncing` rapidly between the low-power and full-power states, since every transition costs time and energy.

: In RPM, `autosuspend` does not mean "sleep automatically". It means that the device is suspended not immediately but only after it has stayed inactive for a certain time. More precisely, when the timer in `dev->power.suspend_timer` expires, a suspend request is sent to the PM core (`asynchronously`). Autosuspend is very effective for devices whose power on/off transitions are expensive.

: To use autosuspend, call `pm_runtime_use_autosuspend` first. It sets the `dev->power.use_autosuspend` field to true.

//drivers/base/power/runtime.c - v6.5
/**
 * pm_suspend_timer_fn - Timer function for pm_schedule_suspend().
 * @timer: hrtimer used by pm_schedule_suspend().
 *
 * Check if the time is right and queue a suspend request.
 */
static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
{
	struct device *dev = container_of(timer, struct device, power.suspend_timer);
	unsigned long flags;
	u64 expires;

	spin_lock_irqsave(&dev->power.lock, flags);

	expires = dev->power.timer_expires;
	/*
	 * If 'expires' is after the current time, we've been called
	 * too early.
	 */
	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
		dev->power.timer_expires = 0;
		rpm_suspend(dev, dev->power.timer_autosuspends ?
		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
	}

	spin_unlock_irqrestore(&dev->power.lock, flags);

	return HRTIMER_NORESTART;
}

...
...

/**
 * __pm_runtime_use_autosuspend - Set a device's use_autosuspend flag.
 * @dev: Device to handle.
 * @use: New value for use_autosuspend.
 *
 * Set the device's power.use_autosuspend flag, and allow or prevent runtime
 * suspends as needed.
 */
void __pm_runtime_use_autosuspend(struct device *dev, bool use)
{
	int old_delay, old_use;

	spin_lock_irq(&dev->power.lock);
	old_delay = dev->power.autosuspend_delay;
	old_use = dev->power.use_autosuspend;
	dev->power.use_autosuspend = use;
	update_autosuspend(dev, old_delay, old_use);
	spin_unlock_irq(&dev->power.lock);
}
EXPORT_SYMBOL_GPL(__pm_runtime_use_autosuspend);

...
...

/**
 * pm_runtime_init - Initialize runtime PM fields in given device object.
 * @dev: Device object to initialize.
 */
void pm_runtime_init(struct device *dev)
{
    ...
	dev->power.timer_expires = 0;
	hrtimer_init(&dev->power.suspend_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	dev->power.suspend_timer.function = pm_suspend_timer_fn;
	...
}

: The RPM suspend timer is set up when a device's runtime PM fields are initialized in `pm_runtime_init`. Its handler is `pm_suspend_timer_fn`, which calls `rpm_suspend` internally.

: Calling `pm_runtime_put_autosuspend` starts the timer once the usage count drops to 0, counting down the timeout configured with `pm_runtime_set_autosuspend_delay`. That value is stored in `dev->power.autosuspend_delay`. Separately, `pm_schedule_suspend` also takes a delay argument and uses the same timer. Note the difference: `dev->power.suspend_timer` normally uses `dev->power.autosuspend_delay`, but for `pm_schedule_suspend` the second argument, `delay`, is the timeout.

//include/linux/pm_runtime.h - v6.5
/**
 * pm_runtime_put_autosuspend - Drop device usage counter and queue autosuspend if 0.
 * @dev: Target device.
 *
 * Decrement the runtime PM usage counter of @dev and if it turns out to be
 * equal to 0, queue up a work item for @dev like in pm_request_autosuspend().
 */
static inline int pm_runtime_put_autosuspend(struct device *dev)
{
	return __pm_runtime_suspend(dev,
	    RPM_GET_PUT | RPM_ASYNC | RPM_AUTO);
}
...
...

//drivers/base/power/runtime.c - v6.5
/**
 * __pm_runtime_suspend - Entry point for runtime put/suspend operations.
 * @dev: Device to suspend.
 * @rpmflags: Flag bits.
 *
 * If the RPM_GET_PUT flag is set, decrement the device's usage count and
 * return immediately if it is larger than zero (if it becomes negative, log a
 * warning, increment it, and return an error).  Then carry out a suspend,
 * either synchronous or asynchronous.
 *
 * This routine may be called in atomic context if the RPM_ASYNC flag is set,
 * or if pm_runtime_irq_safe() has been called.
 */
int __pm_runtime_suspend(struct device *dev, int rpmflags)
{
	unsigned long flags;
	int retval;

	if (rpmflags & RPM_GET_PUT) {
		retval = rpm_drop_usage_count(dev);
		if (retval < 0) {
			return retval;
		} else if (retval > 0) {
			trace_rpm_usage(dev, rpmflags);
			return 0;
		}
	}

	might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);

	spin_lock_irqsave(&dev->power.lock, flags);
	retval = rpm_suspend(dev, rpmflags);
	spin_unlock_irqrestore(&dev->power.lock, flags);

	return retval;
}
EXPORT_SYMBOL_GPL(__pm_runtime_suspend);

: `__pm_runtime_suspend` also calls `rpm_suspend` internally.

: `pm_schedule_suspend` converts its second argument, `delay`, into a new expires value and starts the timer with it.

//drivers/base/power/runtime.c - v6.5
/**
 * pm_schedule_suspend - Set up a timer to submit a suspend request in future.
 * @dev: Device to suspend.
 * @delay: Time to wait before submitting a suspend request, in milliseconds.
 */
int pm_schedule_suspend(struct device *dev, unsigned int delay)
{
	unsigned long flags;
	u64 expires;
	int retval;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (!delay) {
		retval = rpm_suspend(dev, RPM_ASYNC);
		goto out;
	}

	retval = rpm_check_suspend_allowed(dev);
	if (retval)
		goto out;

	/* Other scheduled or pending requests need to be canceled. */
	pm_runtime_cancel_pending(dev);

	expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC;
	dev->power.timer_expires = expires;
	dev->power.timer_autosuspends = 0;
	hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);

 out:
	spin_unlock_irqrestore(&dev->power.lock, flags);

	return retval;
}
EXPORT_SYMBOL_GPL(pm_schedule_suspend);

: With autosuspend, the reference time of the timer has to be refreshed every time the device is used. The timeout is `timeout = last busy time + autosuspend delay`. Suppose the autosuspend delay is 2 seconds, and the device is marked busy and `pm_runtime_put_autosuspend` is called 1000 seconds after boot; the RPM suspend then happens at about 1002 seconds. If the same happens again at 2432 seconds, the suspend happens at about 2434 seconds. The 2 seconds are fixed, so where does the first term come from? It is the time of the last use, stored in `dev->power.last_busy`, which `pm_runtime_mark_last_busy` updates to the current time.

//drivers/base/power/runtime.c - v6.5
/*
 * pm_runtime_autosuspend_expiration - Get a device's autosuspend-delay expiration time.
 * @dev: Device to handle.
 *
 * Compute the autosuspend-delay expiration time based on the device's
 * power.last_busy time.  If the delay has already expired or is disabled
 * (negative) or the power.use_autosuspend flag isn't set, return 0.
 * Otherwise return the expiration time in nanoseconds (adjusted to be nonzero).
 *
 * This function may be called either with or without dev->power.lock held.
 * Either way it can be racy, since power.last_busy may be updated at any time.
 */
u64 pm_runtime_autosuspend_expiration(struct device *dev)
{
	int autosuspend_delay;
	u64 expires;

	if (!dev->power.use_autosuspend)
		return 0;

	autosuspend_delay = READ_ONCE(dev->power.autosuspend_delay);
	if (autosuspend_delay < 0)
		return 0;

	expires  = READ_ONCE(dev->power.last_busy);
	expires += (u64)autosuspend_delay * NSEC_PER_MSEC;
	if (expires > ktime_get_mono_fast_ns())
		return expires;	/* Expires in the future */

	return 0;
}
EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);

...
...

//include/linux/pm_runtime.h - v6.5
/**
 * pm_runtime_mark_last_busy - Update the last access time of a device.
 * @dev: Target device.
 *
 * Update the last access time of @dev used by the runtime PM autosuspend
 * mechanism to the current time as returned by ktime_get_mono_fast_ns().
 */
static inline void pm_runtime_mark_last_busy(struct device *dev)
{
	WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
}

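: For that reason, a driver should always call `pm_runtime_mark_last_busy` right before calling `pm_runtime_put_autosuspend`. Otherwise the expiration is computed from a stale `last_busy` value.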
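: The expiration arithmetic can be checked with a small userspace model. This is only a sketch under my own assumptions: `struct rpm_model`, `model_mark_last_busy`, and `model_autosuspend_expiration` are hypothetical names that mimic the kernel fields and helpers, and the current time is passed in as a parameter instead of being read from `ktime_get_mono_fast_ns()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Hypothetical userspace stand-in for the autosuspend fields of dev->power. */
struct rpm_model {
	bool use_autosuspend;
	int autosuspend_delay_ms;	/* negative means autosuspend is disabled */
	uint64_t last_busy_ns;		/* updated by model_mark_last_busy() */
};

/* Mimics pm_runtime_mark_last_busy(): remember "now" as the last busy time. */
static inline void model_mark_last_busy(struct rpm_model *m, uint64_t now_ns)
{
	assert(m != NULL);
	m->last_busy_ns = now_ns;
}

/*
 * Mimics the logic of pm_runtime_autosuspend_expiration(): return the
 * absolute expiration time in nanoseconds, or 0 if autosuspend is not in
 * use, the delay is negative, or the delay has already expired.
 */
static inline uint64_t model_autosuspend_expiration(const struct rpm_model *m,
						    uint64_t now_ns)
{
	uint64_t expires;

	assert(m != NULL);
	if (!m->use_autosuspend || m->autosuspend_delay_ms < 0)
		return 0;

	expires = m->last_busy_ns +
		  (uint64_t)m->autosuspend_delay_ms * NSEC_PER_MSEC;
	return expires > now_ns ? expires : 0;
}
```

: With `last_busy` marked at 1000 s and a 2000 ms delay, the model returns 1002 s while the clock is still before that point, and 0 once that point has passed.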

- Case Study

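: Let's look at how RPM is used in a real driver. The example is the `bh1780` ambient light sensor driver. First, how does the bh1780 driver initialize RPM? It calls the `get_noresume`, `set_active`, and `enable` helpers in that order to enable RPM for the device. The important point is that once initialization is complete, probe drops its reference with `pm_runtime_put`, so the device is allowed to go to SUSPEND right away.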

static int bh1780_probe(struct i2c_client *client)
{
	...
	...

	/* Power up the device */
	ret = bh1780_write(bh1780, BH1780_REG_CONTROL, BH1780_PON);
	if (ret < 0)
		return ret;
	msleep(BH1780_PON_DELAY);
	pm_runtime_get_noresume(&client->dev);
	pm_runtime_set_active(&client->dev);
	pm_runtime_enable(&client->dev);

	ret = bh1780_read(bh1780, BH1780_REG_PARTID);
	if (ret < 0)
		goto out_disable_pm;
	dev_info(&client->dev,
		 "Ambient Light Sensor, Rev : %lu\n",
		 (ret & BH1780_REVMASK));

	/*
	 * As the device takes 250 ms to even come up with a fresh
	 * measurement after power-on, do not shut it down unnecessarily.
	 * Set autosuspend to a five seconds.
	 */
	pm_runtime_set_autosuspend_delay(&client->dev, 5000);
	pm_runtime_use_autosuspend(&client->dev);
	pm_runtime_put(&client->dev);

	...
	...

	return 0;

out_disable_pm:
	pm_runtime_put_noidle(&client->dev);
	pm_runtime_disable(&client->dev);
	return ret;
}

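: You cannot tell from the code below alone, but `bh1780_read_raw` is exposed through sysfs by the IIO framework. It reads data from a given bh1780 channel. The problem is that when this function is called, the actual hardware may be powered off or in a low-power mode. The function is software, so it can always be called, but the physical device may not be ready to respond. That is why code that needs to talk to the device first calls `pm_runtime_get_sync` to physically wake it up. When the work is done, it calls `pm_runtime_mark_last_busy` to refresh the reference time used by the RPM autosuspend timer, and then `pm_runtime_put_autosuspend` to send a suspend request to the PM core.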

//drivers/iio/light/bh1780.c - v6.5
static int bh1780_read_raw(struct iio_dev *indio_dev,
			   struct iio_chan_spec const *chan,
			   int *val, int *val2, long mask)
{
	struct bh1780_data *bh1780 = iio_priv(indio_dev);
	int value;

	switch (mask) {
	case IIO_CHAN_INFO_RAW:
		switch (chan->type) {
		case IIO_LIGHT:
			pm_runtime_get_sync(&bh1780->client->dev);
			value = bh1780_read_word(bh1780, BH1780_REG_DLOW);
			if (value < 0)
				return value;
			pm_runtime_mark_last_busy(&bh1780->client->dev);
			pm_runtime_put_autosuspend(&bh1780->client->dev);
			*val = value;

			return IIO_VAL_INT;
		default:
			return -EINVAL;
		}
	case IIO_CHAN_INFO_INT_TIME:
		*val = 0;
		*val2 = BH1780_INTERVAL * 1000;
		return IIO_VAL_INT_PLUS_MICRO;
	default:
		return -EINVAL;
	}
}

: A driver's remove callback (for example `platform_driver.remove`, or `i2c_driver.remove` in this case) is called when the driver is unbound from the device. In other words, the driver will no longer control the device, so the device should be turned off completely. `bh1780_remove` has to talk to the device, so it first wakes it up with `pm_runtime_get_sync`. It then calls `pm_runtime_put_noidle`, which decrements the bh1780 usage count by 1. Why not `pm_runtime_put`? Because `bh1780_remove` is itself the routine that shuts the bh1780 down, so there is no need to run a separate shutdown path (`pm_runtime_put` could end up running `.runtime_suspend`). Next, `pm_runtime_disable` disables RPM for the device. This function increments `power.disable_depth`. If `power.disable_depth` was 0 before the increment, it cancels all pending runtime PM requests and, if a runtime PM operation is in progress, waits for it to complete.

//drivers/iio/light/bh1780.c - v6.5
static void bh1780_remove(struct i2c_client *client)
{
	struct iio_dev *indio_dev = i2c_get_clientdata(client);
	struct bh1780_data *bh1780 = iio_priv(indio_dev);
	int ret;

	iio_device_unregister(indio_dev);
	pm_runtime_get_sync(&client->dev);
	pm_runtime_put_noidle(&client->dev);
	pm_runtime_disable(&client->dev);
	ret = bh1780_write(bh1780, BH1780_REG_CONTROL, BH1780_POFF);
	if (ret < 0)
		dev_err(&client->dev, "failed to power off (%pe)\n",
			ERR_PTR(ret));
}

2. RPM Helper function

: As the title of the kernel document `Runtime Power Management Framework for I/O Devices` suggests, runtime PM is a power management framework for `I/O` devices.

: To keep `runtime PM` and `system PM` synchronized, Linux power management uses a single global workqueue, `pm_wq`. Work items related to both `runtime PM` and `system PM` are queued on `pm_wq`.

: `runtime PM` is not defined by a single structure. It is split across two structures, plus a set of helper functions.

" runtime PM callbacks : struct dev_pm_ops
" runtime PM variables : struct dev_pm_info
" runtime PM helper functions : drivers/base/power/runtime.c

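: runtime PM has the notion of helper functions, and they play two roles.

" Synchronization : runtime PM overlaps with system PM, so synchronization is mandatory. Having driver developers do this by hand is too risky, so runtime PM provides helper functions that delegate synchronization to the framework level.

" Invoking callbacks : runtime PM itself does not implement the actual power control. The real work is done by the three callbacks (runtime_suspend, runtime_resume, runtime_idle), and those callbacks can only be invoked through the helper functions. runtime PM relies on a `usage counter`, and the helper functions are what manipulate this `usage counter`.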

- Runtime PM limitation

: Even though `Runtime PM` is much more flexible than `System PM`, it clearly has drawbacks. ARM's `Power Control System Architecture` document lists only two device-level `power states`. Runtime PM works best for devices whose power behavior can be described with two states like this. So how should power management be handled for high-performance devices such as GPUs and NICs? For those, `Devfreq` can be a better fit than `Runtime PM`.

Power Control System Architecture

: If you want to learn more about device power management, ACPI has more useful material than PSCI or SCMI (Arm also uses ACPI in the server market, so ACPI should not be seen as x86-only).

2. Device Runtime PM Callbacks

: `struct dev_pm_ops` contains three runtime PM callbacks.

struct dev_pm_ops {
	...
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
	...
};

: The PM core calls the runtime_suspend(), runtime_resume(), and runtime_idle() callbacks of the device's subsystem, where "subsystem" means whichever of the following cases applies to the device. Wait, why the subsystem the device belongs to rather than the device itself? Runtime PM does not force power management onto the device directly. Instead, it asks the subsystem that contains the device to manage its power. This is covered in more detail later.

PM domain of the device, if the device's PM domain object, dev->pm_domain, is present.
Device type of the device, if both dev->type and dev->type->pm, are present.
Device class of the device, if both dev->class and dev->class->pm, are present.
Bus type of the device, if both dev->bus and dev->bus->pm are present.

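: Suppose one of the rules above selects a subsystem. If that subsystem does not implement the runtime PM callback, only then is the callback implemented by the device driver (dev->driver->pm) called. The PM core assigns a priority to each subsystem level. When a runtime PM callback has to be invoked for a device, the PM core checks the device's subsystem levels in priority order, which is as follows.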

PM domain
device type
class
bus type

: A higher-priority callback always takes precedence over a lower-priority one.
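: The lookup order can be modeled in a few lines of userspace C. This is a simplified sketch, not the PM core's code: `struct model_device`, `struct model_pm_ops`, `model_pick_runtime_suspend`, and the two demo callbacks are hypothetical names. The model assumes that the first level that provides a PM ops table is selected, and that the driver's callback is used only when the selected level has no `runtime_suspend` entry.

```c
#include <assert.h>
#include <stddef.h>

struct model_device;
typedef int (*rpm_cb_t)(struct model_device *dev);

/* Hypothetical stand-in for struct dev_pm_ops, reduced to one callback. */
struct model_pm_ops {
	rpm_cb_t runtime_suspend;
};

/* Hypothetical stand-in for struct device; NULL means "level not present". */
struct model_device {
	const struct model_pm_ops *pm_domain;
	const struct model_pm_ops *type;
	const struct model_pm_ops *dev_class;
	const struct model_pm_ops *bus;
	const struct model_pm_ops *driver;
};

/* Demo callbacks, used only so that the chosen pointer can be compared. */
static inline int demo_bus_suspend(struct model_device *dev)
{
	(void)dev;
	return 0;
}

static inline int demo_driver_suspend(struct model_device *dev)
{
	(void)dev;
	return 0;
}

/* Pick the runtime_suspend callback: PM domain > type > class > bus > driver. */
static inline rpm_cb_t model_pick_runtime_suspend(const struct model_device *dev)
{
	const struct model_pm_ops *ops;
	rpm_cb_t cb;

	assert(dev != NULL);
	if (dev->pm_domain)
		ops = dev->pm_domain;
	else if (dev->type)
		ops = dev->type;
	else if (dev->dev_class)
		ops = dev->dev_class;
	else if (dev->bus)
		ops = dev->bus;
	else
		ops = NULL;

	cb = ops ? ops->runtime_suspend : NULL;
	/* Fall back to the driver only if the selected level has no callback. */
	if (!cb && dev->driver)
		cb = dev->driver->runtime_suspend;
	return cb;
}
```

: Note that in this model a device type without a callback does not fall through to the bus. It falls straight back to the driver, which is how I read the "only then" rule in the text.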

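: By default, runtime PM callbacks are invoked in `process context`, that is, with interrupts enabled. However, after calling the `pm_runtime_irq_safe()` helper, the PM core may invoke the runtime PM callbacks for that device in atomic context with interrupts disabled.

: If a `subsystem-level runtime suspend` callback exists, it is entirely responsible for suspending the device. From the PM core's point of view, as long as the `subsystem-level runtime suspend` callback handles the suspend of the device properly, the device driver does not need a runtime_suspend callback of its own.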

The subsystem-level suspend callback, if present, is `_entirely_ _responsible_` for handling the suspend of the device as appropriate, which may, but need not include executing the device driver's own `->runtime_suspend()` callback (from the PM core's point of view it is not necessary to implement a `->runtime_suspend()` callback in a device driver as long as the subsystem-level suspend callback knows what to do to handle the device).

Once the subsystem-level suspend callback (or the driver's suspend callback, if it is called directly) completes successfully for a device, the PM core regards the device as `suspended`. This does not necessarily mean that the device has been put into a low-power state. What it does mean is that the device will not process data and will not communicate with the CPU(s) and RAM until the resume callback is executed. The runtime PM status of the device becomes `suspended` after the suspend callback has run successfully.
If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM status stays `active`, which means that the device must be fully operational afterwards.
If the suspend callback returns an error code other than -EBUSY and -EAGAIN, the PM core treats this as a fatal error and refuses to run the runtime PM helper functions for the device. This lasts until the device's runtime PM status is explicitly set to either `active` or `suspended` (the PM core provides special runtime PM helper functions for that purpose).
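The three outcomes of a suspend callback can be summarized with a small userspace model. This is a sketch under my own assumptions, not the PM core's implementation: `struct model_rpm`, `model_finish_suspend`, and `model_helpers_blocked` are hypothetical names, and the model only tracks the status and the stored error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical stand-in for runtime_status and runtime_error. */
struct model_rpm {
	enum model_status status;
	int runtime_error;
};

/* Apply the return value of a suspend callback to the model state. */
static inline void model_finish_suspend(struct model_rpm *m, int cb_ret)
{
	assert(m != NULL);
	assert(m->status == MODEL_ACTIVE);	/* only active devices suspend */

	if (cb_ret == 0) {
		m->status = MODEL_SUSPENDED;	/* success */
		return;
	}

	m->status = MODEL_ACTIVE;		/* device must stay operational */
	if (cb_ret == -EBUSY || cb_ret == -EAGAIN)
		m->runtime_error = 0;		/* "try again later", not fatal */
	else
		m->runtime_error = cb_ret;	/* fatal: remember the error */
}

/* Helpers refuse to run while a fatal error is recorded. */
static inline bool model_helpers_blocked(const struct model_rpm *m)
{
	assert(m != NULL);
	return m->runtime_error != 0;
}
```

In this model, -EBUSY and -EAGAIN leave the device active and usable, while any other error, such as -EIO, leaves it active but blocks further helper calls until the error is cleared.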

In particular, if the driver requires remote wakeup capability (i.e. hardware mechanism allowing the device to request a change of its power state, such as PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the device, then ->runtime_suspend() should return -EBUSY. On the other hand, if device_can_wakeup() returns 'true' for the device and the device is put into a low-power state during the execution of the suspend callback, it is expected that remote wakeup will be enabled for the device. Generally, remote wakeup should be enabled for all input devices put into low-power states at run time.

If a subsystem-level resume callback exists, it is entirely responsible for resuming the device. From the PM core's point of view, as long as the subsystem-level resume callback handles the resume of the device properly, the device driver does not need a runtime_resume callback of its own.
- Once the subsystem-level resume callback (or the driver's resume callback, if it is called directly) completes successfully, the PM core regards the device as `fully operational`, and the runtime PM status of the device becomes `active`.
- If the resume callback returns an error code, the PM core treats this as a fatal error and refuses to run the runtime PM helper functions for the device. This lasts until the device's runtime PM status is explicitly set to either `active` or `suspended` (the PM core provides special runtime PM helper functions for that purpose).
The PM core executes a device's idle callback whenever the device appears to be idle. But how does it decide that a device is idle? It uses two counters (as with the other callbacks, the subsystem-level idle callback is called if it exists; otherwise the driver's runtime_idle callback may be called directly).
- The device's usage counter
- The counter of `active` children of the device: the number of child devices that are currently in the `active` state
These counters are decremented through the runtime PM helper functions. When one of them drops to zero, the other one is checked, and the idle callback is executed only if both are zero.
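The idle condition can be sketched as a userspace model. The names `struct model_dev`, `model_idle_allowed`, and `model_put` are hypothetical, and the `ignore_children` flag is included as an assumption based on the `power.ignore_children` field of the same name: when it is set, the child counter is not taken into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters kept in dev->power. */
struct model_dev {
	int usage_count;	/* number of users holding a reference */
	int child_count;	/* number of active children */
	bool ignore_children;	/* if set, child_count is not considered */
};

/* The device looks idle only if nobody uses it and no child is active. */
static inline bool model_idle_allowed(const struct model_dev *d)
{
	assert(d != NULL);
	if (d->usage_count > 0)
		return false;
	if (!d->ignore_children && d->child_count > 0)
		return false;
	return true;
}

/*
 * Drop one reference. Returns true when the idle callback would be
 * executed, i.e. the usage counter hit zero and the other counter
 * allows it.
 */
static inline bool model_put(struct model_dev *d)
{
	assert(d != NULL);
	assert(d->usage_count > 0);
	d->usage_count--;
	return model_idle_allowed(d);
}
```

A device with two users and no active children becomes idle only on the second `model_put()`. With one active child, it does not become idle at all unless `ignore_children` is set.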

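What the idle callback does is up to the subsystem or driver, but the expected and recommended action is the following.
- Check whether the device can be suspended (that is, whether all conditions for suspending it are satisfied) and, if so, queue up a suspend request.
If there is no idle callback, or if the idle callback returns 0, the PM core attempts a runtime suspend of the device, respecting devices configured for autosuspend. In essence this means a call to pm_runtime_autosuspend() (do note that drivers need to update the device last busy mark, pm_runtime_mark_last_busy(), to control the delay under this circumstance).
To prevent this (for example, when the idle callback has already started a delayed suspend itself), the routine must return a non-zero value. At first I was not sure whether `the routine` meant the idle callback or autosuspend. The kernel document says "callback routine" in the same sentence, so it refers to the idle callback.
Negative error codes returned by the idle callback are ignored by the PM core.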

The helper functions provided by the PM core, described in Section 4, guarantee that the following constraints are met with respect to runtime PM callbacks for one device:
1. The RPM callbacks are mutually exclusive (for example, it is forbidden to execute runtime_suspend() in parallel with runtime_resume() or with another instance of runtime_suspend() for the same device). There is one exception: runtime_suspend() or runtime_resume() can be executed in parallel with runtime_idle(), although runtime_idle() will not be started while any of the other callbacks is being executed for the same device.
2. runtime_idle() and runtime_suspend() can only be executed for `active` devices.
3. runtime_idle() and runtime_suspend() can only be executed for a device whose usage counter is 0 and which either has no `active` children or has `power.ignore_children` set to 1.
4. runtime_resume() can only be executed for `suspended` devices.
Additionally, the helper functions provided by the PM core obey the following rules:
- If runtime_suspend() is about to be executed or there's a pending request to execute it, runtime_idle() will not be executed for the same device. -> In short, runtime_suspend() beats runtime_idle().
- A request to execute (or to schedule the execution of) runtime_suspend() cancels any pending requests to execute runtime_idle() for the same device.
- If runtime_resume() is about to be executed or there's a pending request to execute it, the other callbacks will not be executed for the same device.
- A request to execute runtime_resume() cancels any pending or scheduled requests to execute the other RPM callbacks for the same device, except for scheduled autosuspends.
So the priority is `runtime_resume > runtime_suspend > runtime_idle`.
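The priority rule can be pictured with a toy model of a single pending-request slot. This is a deliberately simplified sketch under my own assumptions: `enum model_req`, `struct model_queue`, and `model_submit` are hypothetical names, and scheduled autosuspends, timers, and in-progress callbacks are not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordered so that a larger value means a higher priority. */
enum model_req {
	REQ_NONE = 0,
	REQ_IDLE,
	REQ_SUSPEND,
	REQ_RESUME,
};

/* Hypothetical stand-in: one pending request per device. */
struct model_queue {
	enum model_req pending;
};

/*
 * Submit a request. A pending request of higher priority wins and the
 * new request is rejected. Otherwise the new request replaces (cancels)
 * whatever was pending.
 */
static inline bool model_submit(struct model_queue *q, enum model_req req)
{
	assert(q != NULL);
	assert(req != REQ_NONE);

	if (q->pending > req)
		return false;
	q->pending = req;
	return true;
}
```

In this model a suspend request replaces a pending idle request, a later idle request is rejected while the suspend is pending, and a resume request replaces the pending suspend.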

3. Runtime PM Device Fields

struct hrtimer suspend_timer - timer used for scheduling (delayed) suspend and autosuspend requests
u64 timer_expires - timer expiration time, in nanoseconds (if this is different from zero, the timer is running and will expire at that time, otherwise the timer is not running)
unsigned int disable_depth - the RPM helper functions work only while this value is 0. For example, if disable_depth is 1 or greater, the suspend/resume/get/put helpers return an error immediately. pm_runtime_init() sets the initial value to 1, so the driver developer must explicitly bring it back to 0 at an appropriate point. In the code below, pm_runtime_init() initializes disable_depth to 1 at RPM initialization time, and pm_runtime_enable() decrements disable_depth by 1.

//drivers/base/power/runtime.c - v6.2.2
void pm_runtime_enable(struct device *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (!dev->power.disable_depth) {
		dev_warn(dev, "Unbalanced %s!\n", __func__);
		goto out;
	}

	if (--dev->power.disable_depth > 0)
		goto out;

	dev->power.last_status = RPM_INVALID;
	dev->power.accounting_timestamp = ktime_get_mono_fast_ns();

	if (dev->power.runtime_status == RPM_SUSPENDED &&
	    !dev->power.ignore_children &&
	    atomic_read(&dev->power.child_count) > 0)
		dev_warn(dev, "Enabling runtime PM for inactive device with active children\n");

out:
	spin_unlock_irqrestore(&dev->power.lock, flags);
}
EXPORT_SYMBOL_GPL(pm_runtime_enable);


---------------------------------------------------------------
//drivers/base/power/runtime.c - v6.2.2
void pm_runtime_init(struct device *dev)
{
	dev->power.runtime_status = RPM_SUSPENDED;
	dev->power.last_status = RPM_INVALID;
	dev->power.idle_notification = false;

	dev->power.disable_depth = 1;
	atomic_set(&dev->power.usage_count, 0);

	dev->power.runtime_error = 0;

	atomic_set(&dev->power.child_count, 0);
	pm_suspend_ignore_children(dev, false);
	dev->power.runtime_auto = true;

	dev->power.request_pending = false;
	dev->power.request = RPM_REQ_NONE;
	dev->power.deferred_resume = false;
	dev->power.needs_force_resume = 0;
	INIT_WORK(&dev->power.work, pm_runtime_work);

	dev->power.timer_expires = 0;
	hrtimer_init(&dev->power.suspend_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	dev->power.suspend_timer.function = pm_suspend_timer_fn;

	init_waitqueue_head(&dev->power.wait_queue);
}
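
The way disable_depth gates the helper functions can be sketched in userspace. This is only a model under my own assumptions: `struct model_pm` and the `model_*` functions are hypothetical names, and returning -EACCES from a blocked helper is a simplification of what the real helpers do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for dev->power, reduced to disable_depth. */
struct model_pm {
	unsigned int disable_depth;
};

/* Like pm_runtime_init(): runtime PM starts out disabled (depth 1). */
static inline void model_pm_init(struct model_pm *pm)
{
	assert(pm != NULL);
	pm->disable_depth = 1;
}

/* Like pm_runtime_enable(): returns false for an unbalanced enable. */
static inline bool model_pm_enable(struct model_pm *pm)
{
	assert(pm != NULL);
	if (pm->disable_depth == 0)
		return false;	/* the kernel prints an "Unbalanced" warning */
	pm->disable_depth--;
	return true;
}

/* Like pm_runtime_disable(): every disable needs a matching enable. */
static inline void model_pm_disable(struct model_pm *pm)
{
	assert(pm != NULL);
	pm->disable_depth++;
}

/* Any get/put/suspend/resume style helper: refused while disabled. */
static inline int model_helper_call(const struct model_pm *pm)
{
	assert(pm != NULL);
	return pm->disable_depth > 0 ? -EACCES : 0;
}
```

Right after `model_pm_init()` every helper call is refused. It succeeds only after one `model_pm_enable()`, and it is refused again after `model_pm_disable()`.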

atomic_t child_count - the number of `active` children of the device. For example, in the code below, dev->power.child_count being greater than 0 means that at least one of this device's children is currently active.

	if (dev->power.runtime_status == RPM_SUSPENDED &&
	    !dev->power.ignore_children &&
	    atomic_read(&dev->power.child_count) > 0)
		dev_warn(dev, "Enabling runtime PM for inactive device with active children\n");

atomic_t usage_count - the usage counter of the device. In the example below, the resume path increments dev->power.usage_count when it is called with the RPM_GET_PUT flag.

int __pm_runtime_resume(struct device *dev, int rpmflags)
{
	unsigned long flags;
	int retval;

	might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe &&
			dev->power.runtime_status != RPM_ACTIVE);

	if (rpmflags & RPM_GET_PUT)
		atomic_inc(&dev->power.usage_count);

	spin_lock_irqsave(&dev->power.lock, flags);
	retval = rpm_resume(dev, rpmflags);
	spin_unlock_irqrestore(&dev->power.lock, flags);

	return retval;
}
EXPORT_SYMBOL_GPL(__pm_runtime_resume);

unsigned int ignore_children - if set to 1, the parent device and its child devices can run Runtime PM independently of each other. In code terms, dev->power.child_count above is ignored (although child_count itself is still updated). Pay attention to the `!dev->power.ignore_children` check in the code below. Note that any device can be both a child and a parent.

//drivers/base/power/runtime.c - v6.2.2
void pm_runtime_enable(struct device *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (!dev->power.disable_depth) {
		dev_warn(dev, "Unbalanced %s!\n", __func__);
		goto out;
	}

	if (--dev->power.disable_depth > 0)
		goto out;

	dev->power.last_status = RPM_INVALID;
	dev->power.accounting_timestamp = ktime_get_mono_fast_ns();

	if (dev->power.runtime_status == RPM_SUSPENDED &&
	    !dev->power.ignore_children &&
	    atomic_read(&dev->power.child_count) > 0)
		dev_warn(dev, "Enabling runtime PM for inactive device with active children\n");

out:
	spin_unlock_irqrestore(&dev->power.lock, flags);
}
EXPORT_SYMBOL_GPL(pm_runtime_enable);

int runtime_error - if set (non-zero), a fatal error has occurred (one of the RPM callbacks returned an error code as described above), and the RPM helper functions will not work until it is cleared. The value is the error code returned by the failing callback.
enum rpm_status runtime_status - the RPM status of the device, as the name says. The initial value is RPM_SUSPENDED, so at first the PM core sees the device as `suspended`. In pm_runtime_init() below, runtime_status is initialized to RPM_SUSPENDED.

//drivers/base/power/runtime.c - v6.2.2
/**
 * pm_runtime_init - Initialize runtime PM fields in given device object.
 * @dev: Device object to initialize.
 */
void pm_runtime_init(struct device *dev)
{
	dev->power.runtime_status = RPM_SUSPENDED;
	dev->power.last_status = RPM_INVALID;
	dev->power.idle_notification = false;

	dev->power.disable_depth = 1;
	atomic_set(&dev->power.usage_count, 0);

	dev->power.runtime_error = 0;

	atomic_set(&dev->power.child_count, 0);
	pm_suspend_ignore_children(dev, false);
	dev->power.runtime_auto = true;

	dev->power.request_pending = false;
	dev->power.request = RPM_REQ_NONE;
	dev->power.deferred_resume = false;
	dev->power.needs_force_resume = 0;
	INIT_WORK(&dev->power.work, pm_runtime_work);

	dev->power.timer_expires = 0;
	hrtimer_init(&dev->power.suspend_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	dev->power.suspend_timer.function = pm_suspend_timer_fn;

	init_waitqueue_head(&dev->power.wait_queue);
}

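unsigned int irq_safe - if set, the runtime_suspend() and runtime_resume() callbacks are invoked with the spinlock held and interrupts disabled.
unsigned int use_autosuspend - indicates whether delayed autosuspend is in use. Do not modify it directly, as in `dev->power.use_autosuspend = 1`. It must be changed only through the pm_runtime{_dont}_use_autosuspend() helper functions.
unsigned int timer_autosuspends - RPM has two kinds of suspend: normal suspend and autosuspend. If this is set, the PM core should attempt an autosuspend rather than a normal suspend when the timer expires.
int autosuspend_delay - the delay time used for autosuspend, in milliseconds.
u64 last_busy - the time at which pm_runtime_mark_last_busy() was last called for the device, as returned by ktime_get_mono_fast_ns(). It is used to compute the autosuspend expiration.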

6. Runtime PM and System Sleep

Runtime PM and system PM interact with each other. If a device is `active` when the whole system starts to suspend, everything is straightforward. But what if the device is already `suspended`?
A device may have different wake-up settings for runtime PM and for system PM. For example, remote wake-up may be enabled for runtime suspend but disabled for system PM (device_may_wakeup(dev) returns `false`).
On some systems, system sleep is not entered through a global firmware or hardware operation. Instead, all hardware components are put into low-power states directly by the kernel in a coordinated way. Then, the system sleep state effectively follows from the states the hardware components end up in and the system is woken up from that state by a hardware interrupt or a similar mechanism entirely under the kernel's control. As a result, the kernel never gives control away and the states of all devices during resume are precisely known to it. If that is the case and none of the situations listed above takes place (in particular, if the system is not waking up from hibernation), it may be more efficient to leave the devices that had been suspended before the system suspend began in the suspended state.
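To this end, the PM core provides a mechanism that lets different levels of the device hierarchy coordinate. -> I first read `different levels of device hierarchy` as system PM versus runtime PM, but I am not sure. This needs more study.
Concretely, this mechanism works as follows: if the system suspend .prepare() callback returns a positive number for a device, it tells the PM core that the device appears to be runtime-suspended and that its state is fine. The PM core then leaves the device runtime-suspended as long as both of the following hold.
1. The device is runtime-suspended.
2. All of the device's descendants (child devices) are also left runtime-suspended.
When that happens, the PM core does not execute any system suspend / resume callbacks for those devices. Annoyingly, there are two caveats.
1. Even then, the system suspend `.complete()` callback is still executed, and it is then responsible for handling the device.
2. This applies only to system suspend transitions that are not related to `hibernation` (such as suspend-to-RAM). It does not apply to hibernation transitions.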
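The decision can be condensed into a single predicate. This is a sketch of the rule as described here, not the PM core's code: `struct model_sleep_dev` and `model_may_stay_suspended` are hypothetical names, and the three inputs are assumed to have been collected elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device inputs for the decision. */
struct model_sleep_dev {
	int prepare_ret;			/* return value of .prepare() */
	bool runtime_suspended;			/* the device itself */
	bool descendants_left_suspended;	/* all descendants */
};

/*
 * Returns true if the device may be left runtime-suspended across the
 * system sleep transition, so that only .complete() runs for it.
 */
static inline bool model_may_stay_suspended(const struct model_sleep_dev *d,
					    bool hibernation)
{
	assert(d != NULL);

	if (hibernation)
		return false;	/* the shortcut never applies to hibernation */

	return d->prepare_ret > 0 &&
	       d->runtime_suspended &&
	       d->descendants_left_suspended;
}
```

If any one input is missing (`.prepare()` returned 0, a descendant is active, or the transition is a hibernation), the regular system suspend / resume callbacks are executed as usual.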

To reduce the chance of race conditions between RPM callbacks and SPM callbacks, the PM core does the following.
- During system suspend, pm_runtime_get_noresume() is called for every device right before executing the subsystem-level .prepare() callback for it and pm_runtime_barrier() is called for every device right before executing the subsystem-level .suspend() callback for it.
- In addition to that the PM core calls __pm_runtime_disable() with 'false' as the second argument for every device right before executing the subsystem-level .suspend_late() callback for it.

Linux Power Management Explained (11) - Runtime PM (http://www.wowotech.net/pm_subsystem/rpm_overview.html)

3.2 Timing of get and put

The essence of this section is: what is the criterion for deciding that a device is idle?
Recall the discussion of `Opportunistic suspend` in `autosleep`. With opportunistic suspend, the runtime environment of the whole system is relatively complex, so judging the right moment to suspend is quite hard. Runtime PM, on the other hand, judges idleness per device rather than for the whole system, so finding the idle point is easier than with System PM. Back to the question: what does idle mean for a device?
get and put are exactly the points where a device transitions into and out of the idle state, so the timing of get and put is easy to pin down.

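3.4 Synchronization issues in the Runtime PM process

Because the .runtime_xxx callbacks can be invoked asynchronously, and because System PM and Runtime PM coexist, Runtime PM has to handle synchronization carefully. Some examples:
- Synchronization between multiple .runtime_suspend requests -> .runtime_suspend may be requested several times for a single device
- Synchronization between multiple .runtime_resume requests -> .runtime_resume may be requested several times for a single device
- Synchronization between multiple .runtime_idle requests -> .runtime_idle may be requested several times for a single device
- Synchronization between .runtime_suspend requests and .runtime_resume requests
- Synchronization between .runtime_suspend requests and system suspend
- Synchronization between .runtime_resume requests and system resume
There are more cases than these, but if the ones above are handled correctly, the others should not cause trouble.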

3.5 Runtime PM between cascaded devices (here, cascade means `a connection scheme in which the next device operates depending on the previous one`, or `a situation in which one event affects another in a chain`)

struct device also has a pointer to its parent device. The parent is usually a bus or a host controller, and the behavior of a child device depends on the state of its parent. The same applies to Runtime PM.
1. If even one child device under a parent device is active, the parent device must also be active.
2. When a child device under a parent becomes idle, it notifies the parent, and the parent uses this to keep track of the number of active child devices.
3. The parent device can become idle only when all of its child devices are idle.
All of this is handled by the RPM core, so driver developers do not need to worry about it.
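
The three rules can be sketched with a tiny userspace model of a device tree. This is only an illustration under my own assumptions: `struct model_node`, `model_resume`, and `model_suspend` are hypothetical names, and the `ignore_children` flag, usage counters, and asynchronous requests are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device with its runtime PM state. */
struct model_node {
	struct model_node *parent;
	int active_children;	/* plays the role of power.child_count */
	bool active;
};

/* Resume a node: the parent is resumed first and counts the child. */
static inline void model_resume(struct model_node *n)
{
	assert(n != NULL);
	if (n->active)
		return;
	if (n->parent) {
		model_resume(n->parent);
		n->parent->active_children++;
	}
	n->active = true;
}

/*
 * Try to suspend a node. Fails while any child is still active.
 * On success the parent is told that it has one active child fewer.
 */
static inline bool model_suspend(struct model_node *n)
{
	assert(n != NULL);
	if (!n->active || n->active_children > 0)
		return false;
	n->active = false;
	if (n->parent)
		n->parent->active_children--;
	return true;
}
```

With one parent and two children, resuming either child also makes the parent active, and the parent can be suspended only after both children have been suspended.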

License: Attribution, NonCommercial, NoDerivatives
