[Linux Kernel] Timer - timekeeping

Linux/kernel 2023. 10. 3. 15:18

References

- https://docs.kernel.org/core-api/timekeeping.html

- https://www.kernel.org/doc/Documentation/timers/timekeeping.txt

- http://www.wowotech.net/timer_subsystem/timekeeping.html

- https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

- https://blog.csdn.net/orangeboyye/article/details/125744132

- https://blog.csdn.net/u012294613/article/details/128904491

Conventions

- Underlined text indicates emphasis.

- The source of each figure is always given below the figure.

- The analysis is based on kernel v3.14.

Contents

- Overview

" In the timer subsystem, the timekeeping module tracks several timelines in order to provide time services. These timelines include the real-time clock, the monotonic clock and the monotonic raw clock. Each timeline is maintained on top of the clock source module and the tick module. The key point is that the timelines are advanced by the periodic tick events generated by the tick module. However, periodic ticks only arrive at a fixed interval (for example, every 10 ms when HZ=100), so a timeline that is updated only on ticks cannot resolve the moments in between (e.g., 12 ms or 16 ms). To provide finer-grained time services, the timekeeping module therefore also relies on the clock source module, whose counter can be read at any moment to measure how much time has passed since the last tick. This post covers two timekeeping topics:

1. Initialization
2. Maintaining timelines with the tick module and the clock source module
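To see why the clock source is needed between ticks, here is a minimal user-space sketch (not kernel code). It assumes a hypothetical free-running 1 MHz counter and converts cycles to nanoseconds as `(cycles * mult) >> shift`, the same form the timekeeper's `mult` and `shift` fields are used for. `struct toy_tk` and the `toy_*` functions are invented for illustration. On each tick the elapsed cycles are folded into the stored time; a reader can additionally add the cycles elapsed since the last tick.

```c
#include <stdint.h>

/* Hypothetical toy timekeeper: time at the last tick plus the counter
 * value captured at that tick. */
struct toy_tk {
	uint64_t ns_at_last_tick; /* time accumulated up to the last tick */
	uint64_t cycle_last;      /* counter value read at the last tick  */
	uint32_t mult;            /* ns = (cycles * mult) >> shift        */
	uint32_t shift;
};

/* Convert a cycle delta to nanoseconds with the mult/shift scheme. */
static uint64_t toy_cyc2ns(const struct toy_tk *tk, uint64_t cycles)
{
	return (cycles * tk->mult) >> tk->shift;
}

/* Tick handler: fold the cycles elapsed since the last tick into the time. */
static void toy_tick(struct toy_tk *tk, uint64_t counter_now)
{
	tk->ns_at_last_tick += toy_cyc2ns(tk, counter_now - tk->cycle_last);
	tk->cycle_last = counter_now;
}

/* Tick-only view: what a reader sees without consulting the clock source. */
static uint64_t toy_now_ns_tick_only(const struct toy_tk *tk)
{
	return tk->ns_at_last_tick;
}

/* Tick + clock source: also add the time elapsed since the last tick. */
static uint64_t toy_now_ns(const struct toy_tk *tk, uint64_t counter_now)
{
	return tk->ns_at_last_tick + toy_cyc2ns(tk, counter_now - tk->cycle_last);
}
```

With mult = 1000 << 10 and shift = 10 (so one cycle is exactly 1000 ns), a tick at counter value 10000 (10 ms) followed by a read at counter value 12000 gives 10 ms from the tick-only view but 12 ms once the clock source delta is added.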

- Data structure

1. struct timekeeper

" Older kernels declared a separate global variable for each system clock (real time, monotonic time, boot time, etc.). With the introduction of struct timekeeper, all of these system clocks can be managed through a single global variable, timekeeper.

// include/linux/timekeeper_internal.h - v3.14
/* Structure holding internal timekeeping values. */
struct timekeeper {
	/* Current clocksource used for timekeeping. */
	struct clocksource	*clock;
	/* NTP adjusted clock multiplier */
	u32			mult;
	/* The shift value of the current clocksource. */
	u32			shift;
	/* Number of clock cycles in one NTP interval. */
	cycle_t			cycle_interval;
	/* Last cycle value (also stored in clock->cycle_last) */
	cycle_t			cycle_last;
	/* Number of clock shifted nano seconds in one NTP interval. */
	u64			xtime_interval;
	/* shifted nano seconds left over when rounding cycle_interval */
	s64			xtime_remainder;
	/* Raw nano seconds accumulated per NTP interval. */
	u32			raw_interval;

	/* Current CLOCK_REALTIME time in seconds */
	u64			xtime_sec;
	/* Clock shifted nano seconds */
	u64			xtime_nsec;

	/* Difference between accumulated time and NTP time in ntp
	 * shifted nano seconds. */
	s64			ntp_error;
	/* Shift conversion between clock shifted nano seconds and
	 * ntp shifted nano seconds. */
	u32			ntp_error_shift;

	/*
	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
	 * for sub jiffie times) to get to monotonic time.  Monotonic is pegged
	 * at zero at system boot time, so wall_to_monotonic will be negative,
	 * however, we will ALWAYS keep the tv_nsec part positive so we can use
	 * the usual normalization.
	 *
	 * wall_to_monotonic is moved after resume from suspend for the
	 * monotonic time not to jump. We need to add total_sleep_time to
	 * wall_to_monotonic to get the real boot based time offset.
	 *
	 * - wall_to_monotonic is no longer the boot time, getboottime must be
	 * used instead.
	 */
	struct timespec		wall_to_monotonic;
	/* Offset clock monotonic -> clock realtime */
	ktime_t			offs_real;
	/* time spent in suspend */
	struct timespec		total_sleep_time;
	/* Offset clock monotonic -> clock boottime */
	ktime_t			offs_boot;
	/* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
	struct timespec		raw_time;
	/* The current UTC to TAI offset in seconds */
	s32			tai_offset;
	/* Offset clock monotonic -> clock tai */
	ktime_t			offs_tai;

};

- clock: the clock source currently used by the timekeeper. As we will see later, a system can have several clock sources. The timekeeper always tries to use the best one available, so a `clock source switching` can happen at run time.

- xtime_sec & xtime_nsec: these fields hold the real time, also called wall time. It is best thought of as the actual time of day in the real world. The RTC keeps wall time too, but it performs relatively poorly: most RTCs offer only coarse resolution (milliseconds at best, and many count only whole seconds), and an external RTC, as opposed to one integrated in the SoC, is slower to access, which makes it even less accurate. For this reason the kernel maintains its own wall time (xtime) driven by a clock source. Clock sources are far more precise than an RTC, and some resolve down to nanoseconds. xtime_sec stores the number of seconds elapsed since the epoch (1970-01-01 00:00:00 UTC), and xtime_nsec adds nanosecond precision (assuming the hardware supports it). Note that xtime_nsec is stored left-shifted by `shift` ("clock shifted nano seconds"), not as plain nanoseconds.

- wall_to_monotonic: the field behind the `CLOCK_MONOTONIC` system clock. Whereas real time (wall time) counts from the epoch, monotonic time counts from the moment the system booted. Interestingly, unlike real time, the kernel has no variable that stores monotonic time directly: wall_to_monotonic does not hold monotonic time but the offset between wall time and monotonic time. When monotonic time is needed, the kernel adds wall_to_monotonic to xtime, i.e., `monotonic time = xtime + wall_to_monotonic`. Because monotonic time is 0 at startup while xtime is a large positive value, wall_to_monotonic is in practice negative. See timekeeping_init() for the details.

- raw_time: the field behind the `CLOCK_MONOTONIC_RAW` system clock (like `CLOCK_MONOTONIC`, but not affected by NTP adjustments [Ref. 1]).

- total_sleep_time: the time the system has spent in suspend. CLOCK_MONOTONIC does not advance during suspend (wall_to_monotonic is adjusted on resume so that monotonic time does not jump), so the kernel accumulates the suspended time here. Adding it to monotonic time gives the boot-based time (CLOCK_BOOTTIME).
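The offset relationships described for these fields can be checked with a small user-space sketch. It uses `struct timespec` from `<time.h>`; `ts_add()`, `toy_monotonic()` and `toy_boottime()` are hypothetical helpers, and the normalization step mirrors the rule that wall_to_monotonic keeps `tv_nsec` positive while the sign lives in `tv_sec`.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: a + b for normalized inputs (0 <= tv_nsec < 1 s),
 * keeping the result normalized as well. */
struct timespec ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {	/* carry at most one second */
		r.tv_nsec -= NSEC_PER_SEC;
		r.tv_sec++;
	}
	return r;
}

/* CLOCK_MONOTONIC = xtime + wall_to_monotonic */
struct timespec toy_monotonic(struct timespec xtime, struct timespec wtm)
{
	return ts_add(xtime, wtm);
}

/* CLOCK_BOOTTIME = xtime + wall_to_monotonic + total_sleep_time */
struct timespec toy_boottime(struct timespec xtime, struct timespec wtm,
			     struct timespec sleep)
{
	return ts_add(ts_add(xtime, wtm), sleep);
}
```

For example, with xtime = 1000.7 s, wall_to_monotonic = {-900 s, +0.5 s} (that is, -899.5 s) and 5 s of total sleep time, monotonic time is 101.2 s and boot time is 106.2 s.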

2. global variables

" As mentioned, the timekeeping module maintains and manages every clock in the system. Time must be kept continuously, and every module that reads it must see a consistent value. Given these requirements, the first data structure that comes to mind is a global variable. A global variable, however, is a shared resource and must be protected by locks. The timekeeping module protects `static struct timekeeper timekeeper` with timekeeper_lock and timekeeper_seq.

// kernel/time/timekeeping.c - v3.14
static struct timekeeper timekeeper;
static DEFINE_RAW_SPINLOCK(timekeeper_lock);
static seqcount_t timekeeper_seq;
static struct timekeeper shadow_timekeeper;

/* flag for if timekeeping is suspended */
int __read_mostly timekeeping_suspended;

/* Flag for if there is a persistent clock on this platform */
bool __read_mostly persistent_clock_exist = false;

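" `static struct timekeeper shadow_timekeeper` is used to update the system time. In update_wall_time() below, all time-related work is done on shadow_timekeeper (tk), and only at the end is it copied into timekeeper with memcpy(). Why do it this way? Because the heavy computation happens on the shadow copy, the timekeeper_seq write section (during which readers have to retry) only needs to cover the copy itself. The comment in the code notes that the memcpy could be avoided by `switching pointers`, RCU-style, but that would require changing every site that uses the timekeeper so that it fetches the pointer inside the spinlock/seqcount-protected section, and, as with RCU, a pointer switch would also raise the question of when the old copy can be reclaimed. So the kernel accepts a `data copy` that takes a little longer and is not atomic; it is still safe because it happens inside the seqcount write section.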

// kernel/time/timekeeping.c - v3.14
/**
 * update_wall_time - Uses the current clocksource to increment the wall time
 *
 */
void update_wall_time(void)
{
	....
	struct timekeeper *real_tk = &timekeeper;
	struct timekeeper *tk = &shadow_timekeeper;
	
	....
	raw_spin_lock_irqsave(&timekeeper_lock, flags);
	....

	// Most of the time-related work is done on tk (shadow_timekeeper) here.
	....
	write_seqcount_begin(&timekeeper_seq);

	....
	/*
	 * Update the real timekeeper.
	 *
	 * We could avoid this memcpy by switching pointers, but that
	 * requires changes to all other timekeeper usage sites as
	 * well, i.e. move the timekeeper pointer getter into the
	 * spinlocked/seqcount protected sections. And we trade this
	 * memcpy under the timekeeper_seq against one before we start
	 * updating.
	 */
	memcpy(real_tk, tk, sizeof(*tk));
	timekeeping_update(real_tk, clock_set);
	write_seqcount_end(&timekeeper_seq);
out:
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
	....
}

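" Two kinds of locks are used to keep timekeeper (real_tk) and shadow_timekeeper (tk) in sync.

1. timekeeper_lock - update_wall_time() can be called from other CPUs as well, so this lock keeps a second caller out while an update is in progress (strictly speaking it does not prevent the call itself, but because the critical section spans from the beginning to the end of the function, I describe it as protecting "the function itself"). raw_spin_lock_irqsave() serializes the writers: another CPU that wants the lock spins until it is released, and interrupts on the local CPU are disabled so that the tick interrupt cannot re-enter the update on the same CPU.

2. timekeeper_seq - Because timekeeper is a global variable that provides time services, it is very popular with other modules and is accessed very often. Apart from the timekeeping module itself, however, all other modules only read it, so timekeeper is protected with a sequence counter: readers take no lock, they read the sequence number before and after reading the data and simply retry if a writer was active in between.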
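The shadow-copy update and the seqcount read/retry loop can be sketched together in user space with C11 atomics. This is a simplified illustration of the sequence-counter idea, not the kernel's seqcount_t implementation: the writer does its work on a shadow copy, makes the counter odd while copying the shadow into the live copy, then makes it even again, and a reader retries if it saw an odd value or the counter changed during its read. `struct toy_time`, `toy_write()` and `toy_read()` are hypothetical names, and the writer is assumed to be serialized by an outer lock (the role timekeeper_lock plays).

```c
#include <stdatomic.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* Hypothetical stand-in for the live timekeeper. The fields are atomics so
 * that a reader racing with the writer is not a C11 data race. */
struct toy_time {
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;
};

/* Plain copy that only the (already serialized) writer touches. */
struct toy_time_plain {
	uint64_t sec;
	uint64_t nsec;
};

static _Atomic unsigned toy_seq;         /* even: stable, odd: writing    */
static struct toy_time toy_live;         /* plays the role of timekeeper  */
static struct toy_time_plain toy_shadow; /* plays shadow_timekeeper       */

/* Writer. Assumes the caller already holds an outer lock, as
 * update_wall_time() holds timekeeper_lock. */
void toy_write(uint64_t add_ns)
{
	unsigned s;

	/* 1. The "long" work happens on the shadow; readers never wait for it. */
	toy_shadow.nsec += add_ns;
	while (toy_shadow.nsec >= NSEC_PER_SEC) {
		toy_shadow.nsec -= NSEC_PER_SEC;
		toy_shadow.sec++;
	}

	/* 2. Publish: counter odd, copy shadow -> live, counter even again. */
	s = atomic_load_explicit(&toy_seq, memory_order_relaxed);
	atomic_store_explicit(&toy_seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&toy_live.sec, toy_shadow.sec, memory_order_relaxed);
	atomic_store_explicit(&toy_live.nsec, toy_shadow.nsec, memory_order_relaxed);
	atomic_store_explicit(&toy_seq, s + 2, memory_order_release);
}

/* Reader: takes no lock and retries if a write overlapped the read. */
void toy_read(uint64_t *sec, uint64_t *nsec)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load_explicit(&toy_seq, memory_order_acquire);
		*sec = atomic_load_explicit(&toy_live.sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&toy_live.nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&toy_seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

The point of the split is visible in `toy_write()`: readers can only be forced to retry during step 2, which is just a few stores, no matter how long step 1 takes.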

- Timekeeping initialization

" The timekeeper initialization code is in timekeeping_init(). It is called while the boot CPU (BSP) initializes the kernel, i.e., from start_kernel(). The timekeeping module provides several system clocks, but because they live in RAM, all of their state is lost at power-off. So at power-on the kernel reads the current time from a persistent clock such as the RTC and initializes the various system clocks from it (the RTC is powered by a separate coin battery, so it is unaffected when the system is powered off). timekeeping_init() looks like this:

// kernel/time/timekeeping.c - v3.14
/*
 * timekeeping_init - Initializes the clocksource and common timekeeping values
 */
void __init timekeeping_init(void)
{
	struct timekeeper *tk = &timekeeper;
	struct clocksource *clock;
	unsigned long flags;
	struct timespec now, boot, tmp;

	read_persistent_clock(&now); // --- 1

	if (!timespec_valid_strict(&now)) { // --- 2
		pr_warn("WARNING: Persistent clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		now.tv_sec = 0;
		now.tv_nsec = 0;
	} else if (now.tv_sec || now.tv_nsec)
		persistent_clock_exist = true; // --- 3

	read_boot_clock(&boot); // --- 4
	if (!timespec_valid_strict(&boot)) {
		pr_warn("WARNING: Boot clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		boot.tv_sec = 0;
		boot.tv_nsec = 0;
	}

	raw_spin_lock_irqsave(&timekeeper_lock, flags);
	write_seqcount_begin(&timekeeper_seq);
	ntp_init();

	clock = clocksource_default_clock(); // --- 5
	if (clock->enable)
		clock->enable(clock);
	tk_setup_internals(tk, clock); // --- 6

	tk_set_xtime(tk, &now); // --- 7
	tk->raw_time.tv_sec = 0; // --- 7
	tk->raw_time.tv_nsec = 0;
	if (boot.tv_sec == 0 && boot.tv_nsec == 0)
		boot = tk_xtime(tk); // --- 8

	set_normalized_timespec(&tmp, -boot.tv_sec, -boot.tv_nsec); // --- 9
	tk_set_wall_to_mono(tk, tmp); // --- 9

	tmp.tv_sec = 0;
	tmp.tv_nsec = 0;
	tk_set_sleep_time(tk, tmp);

	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));

	write_seqcount_end(&timekeeper_seq);
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
}

(1) read_persistent_clock() is architecture-dependent, meaning it is expected to be implemented and provided by the SoC vendor. The default version provided by the kernel is shown below. If ts->tv_sec and ts->tv_nsec are both 0 after calling read_persistent_clock(), no `battery backed persistent clock (e.g., RTC)` is supported.

// kernel/time/timekeeping.c - v3.14
/**
 * read_persistent_clock -  Return time from the persistent clock.
 *
 * Weak dummy function for arches that do not yet support it.
 * Reads the time from the battery backed persistent clock.
 * Returns a timespec with tv_sec=0 and tv_nsec=0 if unsupported.
 *
 *  XXX - Do be sure to remove it once all arches implement it.
 */
void __attribute__((weak)) read_persistent_clock(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

" The implementation of read_persistent_clock() is expected from SoC vendors rather than CPU vendors. 32-bit ARM (arch/arm, e.g., ARMv7), for instance, only provides the function and leaves the actual implementation to SoC vendors. By default, ARM reads the persistent clock through dummy_clock_access(), which, as the code shows, simply sets both fields to 0, i.e., it reports "no persistent clock". dummy_clock_access() stays in use only when the SoC vendor does not provide a persistent clock. An SoC vendor that wants to provide one on ARM calls register_persistent_clock().

// arch/arm/kernel/time.c - v3.14
static void dummy_clock_access(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

static clock_access_fn __read_persistent_clock = dummy_clock_access;

void read_persistent_clock(struct timespec *ts)
{
	__read_persistent_clock(ts);
}

" register_persistent_clock() replaces the default (dummy_clock_access) only if the SoC vendor has not yet registered either a persistent clock or a boot clock, so registration is possible only once. Which SoC vendors call register_persistent_clock() in v3.17?

// arch/arm/kernel/time.c - v3.17
int __init register_persistent_clock(clock_access_fn read_boot,
				     clock_access_fn read_persistent)
{
	/* Only allow the clockaccess functions to be registered once */
	if (__read_persistent_clock == dummy_clock_access &&
	    __read_boot_clock == dummy_clock_access) {
		if (read_boot)
			__read_boot_clock = read_boot;
		if (read_persistent)
			__read_persistent_clock = read_persistent;

		return 0;
	}

	return -EINVAL;
}
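The "dummy default, register at most once" logic can be reproduced in user space. This sketch keeps the kernel's names `clock_access_fn` and `dummy_clock_access` but otherwise uses a hypothetical `toy_` prefix, C's `struct timespec`, and `-1` in place of the kernel's `-EINVAL`; `fake_rtc_read()` is a hypothetical stand-in for a vendor callback such as `tegra_read_persistent_clock()`.

```c
#include <time.h>

typedef void (*clock_access_fn)(struct timespec *);

/* Default: report "no persistent clock" by returning 0 s, 0 ns. */
static void dummy_clock_access(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

static clock_access_fn toy_read_persistent = dummy_clock_access;
static clock_access_fn toy_read_boot = dummy_clock_access;

/* Same rule as register_persistent_clock(): only while both are defaults. */
int toy_register_persistent_clock(clock_access_fn read_boot,
				  clock_access_fn read_persistent)
{
	if (toy_read_persistent == dummy_clock_access &&
	    toy_read_boot == dummy_clock_access) {
		if (read_boot)
			toy_read_boot = read_boot;
		if (read_persistent)
			toy_read_persistent = read_persistent;
		return 0;
	}
	return -1; /* the kernel returns -EINVAL here */
}

void toy_read_persistent_clock(struct timespec *ts)
{
	toy_read_persistent(ts);
}

/* Hypothetical vendor callback returning a fixed RTC time. */
void fake_rtc_read(struct timespec *ts)
{
	ts->tv_sec = 1696300000; /* arbitrary example value */
	ts->tv_nsec = 0;
}
```

Passing NULL for `read_boot` (as the Tegra code does) still locks out later registrations, because the check requires both pointers to still be the default.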

" One example is NVIDIA Tegra. The comment in tegra20_timer.c states that the persistent clock is read from a source that is not disabled during system suspend (PM), the 32k sync timer.

static void __init tegra20_init_rtc(struct device_node *np)
{
	....
	register_persistent_clock(NULL, tegra_read_persistent_clock);
}

/*
 * tegra_read_persistent_clock -  Return time from a persistent clock.
 *
 * Reads the time from a source which isn't disabled during PM, the
 * 32k sync timer.  Convert the cycles elapsed since last read into
 * nsecs and adds to a monotonically increasing timespec.
 * Care must be taken that this funciton is not called while the
 * tegra_rtc driver could be executing to avoid race conditions
 * on the RTC shadow register
 */
static void tegra_read_persistent_clock(struct timespec *ts)

(2) timespec_valid_strict() decides whether the timespec passed in is valid. How can it tell whether a timespec read from the RTC is valid? It performs three checks. First, in timespec_valid(), struct timespec->tv_sec (seconds) must not be less than 0, and struct timespec->tv_nsec (nanoseconds) must be less than NSEC_PER_SEC (10^9). In other words, once the nanoseconds reach a full second, they must be carried over into seconds (ts->tv_sec).

// include/linux/time.h - v3.17
static inline bool timespec_valid_strict(const struct timespec *ts)
{
	if (!timespec_valid(ts))
		return false;
	/* Disallow values that could overflow ktime_t */
	if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
		return false;
	return true;
}

" timespec_valid_strict() additionally requires struct timespec->tv_sec (seconds) to be less than KTIME_SEC_MAX. A time value that passes all three checks is considered valid.

// include/linux/time.h - v3.17
/*
 * Returns true if the timespec is norm, false if denorm:
 */
static inline bool timespec_valid(const struct timespec *ts)
{
	/* Dates before 1970 are bogus */
	if (ts->tv_sec < 0)
		return false;
	/* Can't have more nanoseconds then a second */
	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
		return false;
	return true;
}
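The three checks can be exercised with a user-space copy of the two helpers. Assumptions: `KTIME_SEC_MAX` uses the 64-bit definition `KTIME_MAX / NSEC_PER_SEC`, and the functions carry a hypothetical `toy_` prefix so they are not mistaken for the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define KTIME_MAX ((int64_t)~((uint64_t)1 << 63))
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) /* 64-bit variant */

/* Checks 1 and 2: no dates before 1970, nanoseconds below one second.
 * A negative tv_nsec becomes huge after the unsigned cast and fails too. */
bool toy_timespec_valid(const struct timespec *ts)
{
	if (ts->tv_sec < 0)
		return false;
	if ((unsigned long)ts->tv_nsec >= (unsigned long)NSEC_PER_SEC)
		return false;
	return true;
}

/* Check 3: the seconds must also fit in a 64-bit nanosecond ktime_t. */
bool toy_timespec_valid_strict(const struct timespec *ts)
{
	if (!toy_timespec_valid(ts))
		return false;
	if ((unsigned long long)ts->tv_sec >= (unsigned long long)KTIME_SEC_MAX)
		return false;
	return true;
}
```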

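" KTIME_SEC_MAX is the maximum number of seconds that ktime_t can represent. As shown below, ktime_t (a typedef of union ktime) is 64 bits wide, but its internal layout differs between 32-bit and 64-bit CPUs because their primitive data types differ. On 64-bit CPUs it is a single signed 64-bit variable (s64 tv64) holding nanoseconds. On 32-bit CPUs, unless CONFIG_KTIME_SCALAR is set, the 64 bits are split into two signed 32-bit integers (sec and nsec). Accordingly, KTIME_SEC_MAX is KTIME_MAX / NSEC_PER_SEC on 64-bit and LONG_MAX on 32-bit.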

// include/linux/time.h - v3.17
#define KTIME_MAX			((s64)~((u64)1 << 63))
#if (BITS_PER_LONG == 64)
# define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
#else
# define KTIME_SEC_MAX			LONG_MAX
#endif

// include/linux/ktime.h - v3.14
/*
 * ktime_t:
 *
 * On 64-bit CPUs a single 64-bit variable is used to store the hrtimers
 * internal representation of time values in scalar nanoseconds. The
 * design plays out best on 64-bit CPUs, where most conversions are
 * NOPs and most arithmetic ktime_t operations are plain arithmetic
 * operations.
 *
 * On 32-bit CPUs an optimized representation of the timespec structure
 * is used to avoid expensive conversions from and to timespecs. The
 * endian-aware order of the tv struct members is chosen to allow
 * mathematical operations on the tv64 member of the union too, which
 * for certain operations produces better code.
 *
 * For architectures with efficient support for 64/32-bit conversions the
 * plain scalar nanosecond based representation can be selected by the
 * config switch CONFIG_KTIME_SCALAR.
 */
union ktime {
	s64	tv64;
#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
	struct {
# ifdef __BIG_ENDIAN
	s32	sec, nsec;
# else
	s32	nsec, sec;
# endif
	} tv;
#endif
};

typedef union ktime ktime_t;		/* Kill this */
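A quick calculation shows what the 64-bit `KTIME_SEC_MAX` means in practice. This user-space sketch re-declares the macros with `<stdint.h>` types; the 365.25-day year used for the conversion is an approximation, and `ktime_sec_max_years()` is a hypothetical helper.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define KTIME_MAX ((int64_t)~((uint64_t)1 << 63)) /* 2^63 - 1 ns        */
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)  /* 64-bit variant     */

/* Approximate span of KTIME_SEC_MAX in years (365.25-day years). */
double ktime_sec_max_years(void)
{
	return (double)KTIME_SEC_MAX / (365.25 * 24 * 3600);
}
```

KTIME_SEC_MAX evaluates to 9,223,372,036 seconds, roughly 292 years, which is the range of seconds that timespec_valid_strict() accepts on 64-bit.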

" If the time obtained from the RTC is invalid, the value is reset to 0. Initializing real time to 0 means that it starts at the epoch (1970-01-01 00:00:00 UTC).

(3) persistent_clock_exist being set to true means that an RTC-like persistent clock exists and the timekeeping module can read time from it (if the SoC vendor has registered a persistent clock with the kernel by the time read_persistent_clock() is called, now.tv_sec and now.tv_nsec cannot both be 0). For example, across system suspend and resume, the timekeeping code reads the persistent clock to work out how long the system was asleep; the RTC class driver checks this flag (through has_persistent_clock()) and skips its own suspend-time accounting, so the sleep time is not added twice.

(4) read_boot_clock() has the same structure as read_persistent_clock() but a different purpose: it returns the time at which the system was started. In v3.14 most SoC vendors do not implement it, so it is probably not worth worrying about much.

/**
 * read_boot_clock -  Return time of the system start.
 *
 * Weak dummy function for arches that do not yet support it.
 * Function to read the exact time the system has been started.
 * Returns a timespec with tv_sec=0 and tv_nsec=0 if unsupported.
 *
 *  XXX - Do be sure to remove it once all arches implement it.
 */
void __attribute__((weak)) read_boot_clock(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

(5) A system can have several clock sources (clock providers). However, there is no guarantee that the best clock source is usable when the timekeeping module is initialized, because it may not have been initialized yet at that point. So at this stage the kernel uses its default clock source, which, as shown below, is based on jiffies: its `->read()` simply returns jiffies. Because the function is declared `__weak`, architecture or SoC code can override clocksource_default_clock() by providing its own (non-weak) definition, which the linker then uses instead.

// kernel/time/jiffies.c - v3.17
static cycle_t jiffies_read(struct clocksource *cs)
{
	return (cycle_t) jiffies;
}

static struct clocksource clocksource_jiffies = {
	.name		= "jiffies",
	.rating		= 1, /* lowest valid rating*/
	.read		= jiffies_read,
	.mask		= 0xffffffff, /*32bits*/
	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
	.shift		= JIFFIES_SHIFT,
};

struct clocksource * __init __weak clocksource_default_clock(void)
{
	return &clocksource_jiffies;
}
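Because `mult = NSEC_PER_JIFFY << JIFFIES_SHIFT` and `shift = JIFFIES_SHIFT`, the generic `(cycles * mult) >> shift` conversion collapses to `cycles * NSEC_PER_JIFFY`, so this clock source cannot resolve anything finer than one jiffy. The user-space sketch below assumes HZ = 100, takes NSEC_PER_JIFFY as NSEC_PER_SEC / HZ and JIFFIES_SHIFT as 8 (simplifications of the kernel's definitions), and also shows why the timekeeping code computes deltas as `(now - last) & mask`: with the 32-bit mask, a counter that wrapped between two reads still yields the right delta. The `toy_*` helpers are hypothetical.

```c
#include <stdint.h>

#define HZ 100                               /* assumption */
#define NSEC_PER_SEC 1000000000ULL
#define NSEC_PER_JIFFY (NSEC_PER_SEC / HZ)   /* simplified */
#define JIFFIES_SHIFT 8                      /* assumption */

static const uint64_t jiffies_mask = 0xffffffffULL; /* 32-bit counter */
static const uint32_t jiffies_mult = NSEC_PER_JIFFY << JIFFIES_SHIFT;
static const uint32_t jiffies_shift = JIFFIES_SHIFT;

/* Cycles elapsed between two reads, tolerant of one counter wrap. */
uint64_t toy_cycle_delta(uint64_t now, uint64_t last)
{
	return (now - last) & jiffies_mask;
}

/* Generic clocksource conversion: ns = (cycles * mult) >> shift. */
uint64_t toy_jiffies_cyc2ns(uint64_t cycles)
{
	return (cycles * jiffies_mult) >> jiffies_shift;
}
```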

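(6) The global variable `static struct timekeeper timekeeper` has not been initialized yet. In fact, the main purpose of timekeeping_init() is to initialize timekeeper. To do so, the clock source the kernel will use must be initialized first, because most fields of struct timekeeper depend on the clock source. clocksource_default_clock() has just returned an initialized clock source, so struct timekeeper can now be set up. The function that initializes timekeeper from a clock source is tk_setup_internals(): it installs the new clock source passed as an argument into the timekeeper and, if another clock source was in use before, rescales xtime_nsec to the new clock source's shift. (Disabling the old clock source is done by the caller when switching clock sources, e.g. change_clocksource(), not by tk_setup_internals() itself.)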

// kernel/time/timekeeping.c - v3.14
/**
 * tk_setup_internals - Set up internals to use clocksource clock.
 *
 * @tk:		The target timekeeper to setup.
 * @clock:		Pointer to clocksource.
 *
 * Calculates a fixed cycle/nsec interval for a given clocksource/adjustment
 * pair and interval request.
 *
 * Unless you're the timekeeping code, you should not be using this!
 */
static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
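The body of tk_setup_internals() is omitted above. Roughly, it converts one NTP interval (NTP_INTERVAL_LENGTH ns) into clock cycles and back, filling `cycle_interval`, `xtime_interval`, `xtime_remainder` and `raw_interval`. The user-space sketch below redoes only that arithmetic and is a simplified reconstruction, so check the rounding details against kernel/time/timekeeping.c. It assumes NTP_INTERVAL_LENGTH = NSEC_PER_SEC / HZ with HZ = 100; `struct toy_intervals` and `toy_setup_intervals()` are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define HZ 100                                  /* assumption */
#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC / HZ) /* assumption */

struct toy_intervals {
	uint64_t cycle_interval;  /* clock cycles per NTP interval */
	uint64_t xtime_interval;  /* shifted ns for those cycles   */
	int64_t  xtime_remainder; /* shifted ns lost to rounding   */
	uint32_t raw_interval;    /* plain ns for those cycles     */
};

struct toy_intervals toy_setup_intervals(uint32_t mult, uint32_t shift)
{
	struct toy_intervals iv;
	uint64_t ntpinterval = NTP_INTERVAL_LENGTH << shift; /* shifted ns */
	/* shifted ns -> cycles, rounded to nearest, at least one cycle */
	uint64_t cycles = (ntpinterval + mult / 2) / mult;

	if (cycles == 0)
		cycles = 1;
	iv.cycle_interval = cycles;
	/* cycles -> shifted ns again; the difference is the rounding loss */
	iv.xtime_interval = cycles * mult;
	iv.xtime_remainder = (int64_t)(ntpinterval - iv.xtime_interval);
	iv.raw_interval = (uint32_t)((cycles * mult) >> shift);
	return iv;
}
```

With a hypothetical 1 MHz clock (mult = 1000 << 10, shift = 10) everything divides evenly: 10,000 cycles per 10 ms interval and no remainder. With an awkward mult such as 3 (shift 0), 10,000,000 ns rounds to 3,333,333 cycles, and the 1 ns lost to rounding shows up in xtime_remainder.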

(7) The timekeeper's real-time clock (tk->xtime_*) is initialized from the persistent clock value read from the RTC (now). The monotonic raw clock is set to 0 by default.

// kernel/time/timekeeping.c - v3.14
static void tk_set_xtime(struct timekeeper *tk, const struct timespec *ts)
{
	tk->xtime_sec = ts->tv_sec;
	tk->xtime_nsec = (u64)ts->tv_nsec << tk->shift;
}

(8) If the SoC vendor does not override read_boot_clock(), boot.tv_sec and boot.tv_nsec are 0. In that case the boot time is set equal to the current wall time, i.e., the system is treated as having booted right now.

// include/linux/timekeeper_internal.h - v3.14
static inline struct timespec tk_xtime(struct timekeeper *tk)
{
	struct timespec ts;

	ts.tv_sec = tk->xtime_sec;
	ts.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
	return ts;
}
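tk_set_xtime() stores nanoseconds pre-shifted by `tk->shift`, and tk_xtime() shifts them back. Here is a user-space sketch of that round trip, using a hypothetical `struct toy_xtime_tk` that holds only the three fields involved and an assumed shift of 10.

```c
#include <stdint.h>
#include <time.h>

struct toy_xtime_tk {
	uint64_t xtime_sec;
	uint64_t xtime_nsec; /* nanoseconds << shift */
	uint32_t shift;
};

/* Like tk_set_xtime(): store seconds as-is, nanoseconds pre-shifted. */
void toy_set_xtime(struct toy_xtime_tk *tk, const struct timespec *ts)
{
	tk->xtime_sec = ts->tv_sec;
	tk->xtime_nsec = (uint64_t)ts->tv_nsec << tk->shift;
}

/* Like tk_xtime(): shift back down, dropping the sub-ns fraction. */
struct timespec toy_xtime(const struct toy_xtime_tk *tk)
{
	struct timespec ts;

	ts.tv_sec = tk->xtime_sec;
	ts.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
	return ts;
}
```

Keeping xtime_nsec in shifted units lets the sub-nanosecond fraction produced by `cycles * mult` accumulate across updates instead of being truncated each time; only readers shift it down.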

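(9) set_normalized_timespec() converts the given sec and nsec into a normalized ts (struct timespec): whole seconds contained in nsec are carried into sec so that 0 <= tv_nsec < NSEC_PER_SEC, and a negative nsec borrows from sec. For example, 3,120,117,035 nanoseconds are stored as `3 seconds + 120,117,035 nanoseconds`. This step is needed because of tk_set_wall_to_mono(), which is called next: wall_to_monotonic must keep its tv_nsec part positive, with any negative sign carried in tv_sec.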

// kernel/time.c - v3.14
/**
 * set_normalized_timespec - set timespec sec and nsec parts and normalize
 *
 * @ts:		pointer to timespec variable to be set
 * @sec:	seconds to set
 * @nsec:	nanoseconds to set
 *
 * Set seconds and nanoseconds field of a timespec variable and
 * normalize to the timespec storage format
 *
 * Note: The tv_nsec part is always in the range of
 *	0 <= tv_nsec < NSEC_PER_SEC
 * For negative values only the tv_sec field is negative !
 */
void set_normalized_timespec(struct timespec *ts, time_t sec, s64 nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		/*
		 * The following asm() prevents the compiler from
		 * optimising this loop into a modulo operation. See
		 * also __iter_div_u64_rem() in include/linux/time.h
		 */
		asm("" : "+rm"(nsec));
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		asm("" : "+rm"(nsec));
		nsec += NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = nsec;
}
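The negative case is the one timekeeping_init() relies on, since it passes `-boot.tv_sec` and `-boot.tv_nsec`. This user-space copy of the loops (renamed with a hypothetical `toy_` prefix and without the `asm("")` statements, which only stop the compiler from turning the loops into a modulo operation) shows that the result keeps `tv_nsec` positive and puts the sign in `tv_sec`.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

void toy_set_normalized_timespec(struct timespec *ts, time_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) { /* carry whole seconds into sec */
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {             /* borrow a second for negative nsec */
		nsec += NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = (long)nsec;
}
```

With a made-up boot time of 1000.25 s, `toy_set_normalized_timespec(&wtm, -1000, -250000000)` yields {-1001 s, 750,000,000 ns}, and adding that to xtime = 1000.25 s gives exactly 0, the initial monotonic time.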

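" Monotonic time counts from boot. Since wall_to_monotonic is set to the negated boot time, monotonic time = xtime + wall_to_monotonic = xtime - boot; when boot was taken from xtime (no boot clock, see (8)), this is exactly 0 at initialization. That is why wall_to_monotonic must be negative (the wtm passed to the function holds negative values so that the initial monotonic time is 0). In short, while the system is running, `real time clock + wall_to_monotonic` is CLOCK_MONOTONIC, and `real time clock + wall_to_monotonic + sleep time` is CLOCK_BOOTTIME.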

static void tk_set_wall_to_mono(struct timekeeper *tk, struct timespec wtm)
{
	....
	tk->wall_to_monotonic = wtm;
	....
}

- Three components of timer subsystem in linux kernel [Ref. 1]

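" A jiffy is the unit of time used in Linux. To understand jiffies, you first need the definition of `HZ`: HZ is `the number of times jiffies is incremented per second`. For example, with HZ = 1000, jiffies is incremented 1000 times per second, so after 2 seconds it is 2000. Each increment is called a `tick`, so a jiffies value of 3 means three ticks have occurred. The tick is the `timer interrupt`: every time the timer device raises a timer interrupt (tick) on the CPU, Linux increments jiffies. How can we tell how many seconds have passed since boot? Since HZ is the number of increments per second, `jiffies / HZ` gives the number of seconds elapsed since boot. How should HZ be chosen, then? It depends on the use case. With a small HZ there are fewer timer interrupts per second, so each process gets a longer time slice, which suits processes doing long-running work. However, responsiveness suffers, and users may feel that the computer has become slower. With a large HZ there are many timer interrupts per second, so each process gets a short time slice, which gives the impression that many processes are running simultaneously, and users feel that the computer has become faster. In return, long-running work such as graphics or scientific computation is processed more slowly, because more CPU time goes to handling timer interrupts and switching between processes.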
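The arithmetic in this paragraph fits in a few lines. In this user-space sketch HZ is passed as a parameter instead of being the kernel's compile-time constant, only so that several values can be compared, and `tick_period_ns()` / `uptime_seconds()` are hypothetical helpers. `uptime_seconds()` assumes the counter started at 0 at boot; the kernel actually starts jiffies at INITIAL_JIFFIES so that wraparound bugs show up early, so a real uptime calculation has to subtract that offset first.

```c
#include <stdint.h>

/* Length of one tick in nanoseconds for a given HZ. */
uint64_t tick_period_ns(unsigned hz)
{
	return 1000000000ULL / hz;
}

/* Whole seconds since boot as jiffies / HZ (counter assumed to start at 0). */
uint64_t uptime_seconds(uint64_t jiffies, unsigned hz)
{
	return jiffies / hz;
}
```

With HZ = 100 one tick lasts 10 ms and with HZ = 1000 it lasts 1 ms; with HZ = 1000, a jiffies value of 2500 corresponds to 2 whole seconds.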

- Jiffies

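" When studying Linux kernel timers, the first thing to look at is, of course, `jiffies`. jiffies has been around since the earliest versions of Linux and was designed around 32 bits, which is why it is declared as unsigned long: 32 bits wide on 32-bit systems and 64 bits wide on 64-bit systems. The problem is the 32-bit case. With HZ between 100 and 1000, jiffies is incremented once every 10 ms to 1 ms, so a 32-bit counter overflows after about 497 days (HZ = 100) or as early as about 49.7 days (HZ = 1000), and that overflow causes problems for the system. The kernel developers came up with a clever solution.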
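The 49.7-day and 497-day figures follow from 2^32 ticks divided by HZ. The sketch below computes them and shows the wrap itself with `uint32_t`, which stands in for a 32-bit `unsigned long`; `wrap_days_32()` and `tick32()` are hypothetical helpers.

```c
#include <stdint.h>

/* Days until a 32-bit jiffies counter starting at 0 wraps back to 0. */
double wrap_days_32(unsigned hz)
{
	return 4294967296.0 / hz / 86400.0; /* 2^32 ticks / HZ / seconds per day */
}

/* One tick on a 32-bit counter; unsigned overflow wraps modulo 2^32. */
uint32_t tick32(uint32_t jiffies)
{
	return jiffies + 1;
}
```

`wrap_days_32(1000)` is about 49.71 and `wrap_days_32(100)` about 497.10, and one tick past `UINT32_MAX` the counter reads 0 again.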

// include/linux/jiffies.h - v6.6-rc5
/*
 * The 64-bit value is not atomic on 32-bit systems - you MUST NOT read it
 * without sampling the sequence number in jiffies_lock.
 * get_jiffies_64() will do this for you as appropriate.
 *
 * jiffies and jiffies_64 are at the same address for little-endian systems
 * and for 64-bit big-endian systems.
 * On 32-bit big-endian systems, jiffies is the lower 32 bits of jiffies_64
 * (i.e., at address @jiffies_64 + 4).
 * See arch/ARCH/kernel/vmlinux.lds.S
 */
extern u64 __cacheline_aligned_in_smp jiffies_64;
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;

// arch/arm64/kernel/vmlinux.lds.S - v6.6-rc5
....
OUTPUT_ARCH(aarch64)
ENTRY(_text)

jiffies = jiffies_64;
....

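" The solution is to place jiffies and jiffies_64 at the same address. But how can two separately declared variables share an address? Not from a C file. Should we ask the compiler? Modifying the compiler just for jiffies makes no sense, and neither does adding a compiler extension for it. The simple answer is the linker script. On the C side, jiffies is only declared with extern (as in the header above); the symbol itself is defined in the linker script. Most importantly, a linker script does not deal with the values of variables, only with their addresses: a symbol assignment sets the address of the symbol on the left-hand side. So `jiffies = jiffies_64` makes jiffies start at the same address as jiffies_64, as in the arm64 linker script above (vmlinux.lds.S is a linker script). As a result, when the kernel increments jiffies_64, jiffies sees the new value without a separate update (on 32-bit systems it sees the lower 32 bits; on 32-bit big-endian systems jiffies is placed at jiffies_64 + 4 to achieve this, as the header comment says), and 32-bit systems still get a 64-bit counter.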

http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch10lev1sec3.html

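" Reading the value needs care, though. On 32-bit systems jiffies is only 32 bits wide, so reading it yields just the lower 32 bits of jiffies_64. If the counter is at 30 billion, for example, only the lower 32 bits are read. So whenever you need the full value, read the 64-bit counter, and on 32-bit systems do it through get_jiffies_64(), because a plain 64-bit read is not atomic there (see the header comment above).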
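What "only the lower 32 bits" means can be shown in user space with the 30-billion example. `low32()` models the value a 32-bit `jiffies` reader gets; `read_alias_le()` mimics the linker alias by reading 4 bytes at the address of the 64-bit variable, which matches the low half only on a little-endian machine (an assumption of this hypothetical helper).

```c
#include <stdint.h>
#include <string.h>

/* Value a 32-bit reader gets: the low 32 bits of the 64-bit counter. */
uint32_t low32(uint64_t jiffies_64)
{
	return (uint32_t)jiffies_64;
}

/* Mimic the linker alias on a little-endian machine: read the first
 * 4 bytes at the address of the 64-bit variable. */
uint32_t read_alias_le(const uint64_t *jiffies_64)
{
	uint32_t v;

	memcpy(&v, jiffies_64, sizeof(v));
	return v;
}
```

30,000,000,000 is about 6.98 × 2^32, so a 32-bit reader sees 30,000,000,000 - 6 × 4,294,967,296 = 4,230,196,224. In kernel code on 32-bit systems, get_jiffies_64() is the safe way to obtain the full value.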

Attribution-NonCommercial-NoDerivatives
