Learning Rate Scheduling


Ggoosae 2025. 3. 16. 23:46

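1.1 Overview of Learning Rate Scheduling

Learning rate scheduling is a technique that adjusts the learning rate dynamically during training to improve optimization.

A learning rate that is too high risks divergence, while one that is too low slows convergence. A suitable learning rate decay strategy can therefore improve optimization considerably. The goals of learning rate scheduling are:

Fast learning early on (large learning rate) -> fine adjustment later (small learning rate)
Preventing divergence and fine-tuning near the optimum
Reducing overfitting and improving generalization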
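The trade-off between a learning rate that is too high and one that is too low can be seen on a toy problem. The sketch below is only an illustration, not part of the original post: it minimizes the assumed function f(x) = x**2 with a fixed learning rate, and the function name `gradient_descent` and the three rates are hypothetical choices.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with a fixed learning rate; return the final |x|."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x   # derivative of x**2
        x -= lr * grad   # plain gradient descent update
    return abs(x)

# 1.1 overshoots and grows, 0.001 barely moves, 0.4 converges quickly
for lr in (1.1, 0.001, 0.4):
    print(f"lr={lr}: |x| after 20 steps = {gradient_descent(lr):.6f}")
```

Each update multiplies x by (1 - 2 * lr), so any rate above 1.0 makes |x| grow on this function, while a very small rate leaves x almost unchanged. A schedule aims to get the fast progress of a large rate early and the stability of a small rate later.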

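1.2 Step Decay

Step Decay lowers the learning rate at fixed epoch intervals by multiplying it by a decay factor.

$\alpha_{t} = \alpha_{0} \cdot \gamma^{\lfloor t / k \rfloor}$

$\alpha_{t}$: current learning rate
$\alpha_{0}$: initial learning rate
$t$: current epoch
$k$: decay interval (e.g., 10 epochs)
$\gamma$: decay factor (e.g., 0.1)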

# Step decay
class StepDecay:
    def __init__(self, initial_lr: float, step_size: int, gamma: float = 0.1):
        self.initial_lr = initial_lr  # starting learning rate
        self.step_size = step_size    # number of epochs between decays
        self.gamma = gamma            # multiplicative decay factor

    def get_lr(self, epoch: int) -> float:
        # epoch is 0-based; the rate is multiplied by gamma every step_size epochs
        return self.initial_lr * (self.gamma ** (epoch // self.step_size))

# Usage example
scheduler = StepDecay(initial_lr=0.1, step_size=10, gamma=0.1)
for epoch in range(30):
    lr = scheduler.get_lr(epoch)
    print(f"Epoch {epoch+1}, Learning Rate: {lr:.6f}")

Epoch 1, Learning Rate: 0.100000
Epoch 2, Learning Rate: 0.100000
Epoch 3, Learning Rate: 0.100000
Epoch 4, Learning Rate: 0.100000
Epoch 5, Learning Rate: 0.100000
Epoch 6, Learning Rate: 0.100000
Epoch 7, Learning Rate: 0.100000
Epoch 8, Learning Rate: 0.100000
Epoch 9, Learning Rate: 0.100000
Epoch 10, Learning Rate: 0.100000
Epoch 11, Learning Rate: 0.010000
Epoch 12, Learning Rate: 0.010000
Epoch 13, Learning Rate: 0.010000
Epoch 14, Learning Rate: 0.010000
Epoch 15, Learning Rate: 0.010000
Epoch 16, Learning Rate: 0.010000
Epoch 17, Learning Rate: 0.010000
Epoch 18, Learning Rate: 0.010000
Epoch 19, Learning Rate: 0.010000
Epoch 20, Learning Rate: 0.010000
Epoch 21, Learning Rate: 0.001000
Epoch 22, Learning Rate: 0.001000
Epoch 23, Learning Rate: 0.001000
Epoch 24, Learning Rate: 0.001000
Epoch 25, Learning Rate: 0.001000
Epoch 26, Learning Rate: 0.001000
Epoch 27, Learning Rate: 0.001000
Epoch 28, Learning Rate: 0.001000
Epoch 29, Learning Rate: 0.001000
Epoch 30, Learning Rate: 0.001000

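Step Decay holds the learning rate constant and then drops it abruptly at each interval boundary.
Drawback: the abrupt change in learning rate can make optimization unstable.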
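Step decay can also be written in a stateful form that keeps a running learning rate and multiplies it by gamma at each boundary. The sketch below is an illustration under assumed names (`step_decay_closed_form` and `step_decay_iterative` are hypothetical helpers, not from the original post); it checks that both formulations give the same schedule.

```python
import math

def step_decay_closed_form(epoch, initial_lr=0.1, step_size=10, gamma=0.1):
    """Closed form: initial_lr * gamma ** floor(epoch / step_size)."""
    return initial_lr * gamma ** (epoch // step_size)

def step_decay_iterative(num_epochs, initial_lr=0.1, step_size=10, gamma=0.1):
    """Stateful form: multiply the running rate by gamma at every boundary."""
    lrs = []
    lr = initial_lr
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % step_size == 0:
            lr *= gamma  # the abrupt drop happens exactly here
        lrs.append(lr)
    return lrs

closed = [step_decay_closed_form(e) for e in range(30)]
iterative = step_decay_iterative(30)
# Compare with a tolerance, since the two forms round differently in floating point
print(all(math.isclose(a, b) for a, b in zip(closed, iterative)))
```

The stateful form makes the source of the instability visible: between two consecutive epochs at a boundary the rate changes by a full factor of gamma, with no transition in between.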

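1.3 Exponential Decay

Exponential Decay lowers the learning rate gradually along an exponential curve. The decrease is smoother than with Step Decay, and the schedule is used together with optimizers such as SGD and Adam.

$\alpha_{t} = \alpha_{0} \cdot e^{-\lambda t}$

$\lambda$: decay rate, typically a small value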

import math

class ExponentialDecay:
    def __init__(self, initial_lr, decay_rate):
        self.initial_lr = initial_lr  # starting learning rate
        self.decay_rate = decay_rate  # decay rate lambda (e.g., 0.01 to 0.1)

    def get_lr(self, epoch):
        # epoch is 0-based, so the first epoch uses initial_lr unchanged
        return self.initial_lr * math.exp(-self.decay_rate * epoch)

# Usage example
scheduler = ExponentialDecay(initial_lr=0.1, decay_rate=0.05)
for epoch in range(30):
    lr = scheduler.get_lr(epoch)
    print(f"Epoch {epoch+1}, Learning Rate: {lr:.6f}")

Epoch 1, Learning Rate: 0.100000
Epoch 2, Learning Rate: 0.095123
Epoch 3, Learning Rate: 0.090484
Epoch 4, Learning Rate: 0.086071
Epoch 5, Learning Rate: 0.081873
Epoch 6, Learning Rate: 0.077880
Epoch 7, Learning Rate: 0.074082
Epoch 8, Learning Rate: 0.070469
Epoch 9, Learning Rate: 0.067032
Epoch 10, Learning Rate: 0.063763
Epoch 11, Learning Rate: 0.060653
Epoch 12, Learning Rate: 0.057695
Epoch 13, Learning Rate: 0.054881
Epoch 14, Learning Rate: 0.052205
Epoch 15, Learning Rate: 0.049659
Epoch 16, Learning Rate: 0.047237
Epoch 17, Learning Rate: 0.044933
Epoch 18, Learning Rate: 0.042741
Epoch 19, Learning Rate: 0.040657
Epoch 20, Learning Rate: 0.038674
Epoch 21, Learning Rate: 0.036788
Epoch 22, Learning Rate: 0.034994
Epoch 23, Learning Rate: 0.033287
Epoch 24, Learning Rate: 0.031664
Epoch 25, Learning Rate: 0.030119
Epoch 26, Learning Rate: 0.028650
Epoch 27, Learning Rate: 0.027253
Epoch 28, Learning Rate: 0.025924
Epoch 29, Learning Rate: 0.024660
Epoch 30, Learning Rate: 0.023457

The learning rate decreases gradually, with no abrupt jumps.
Drawback: the learning rate can become too small in the later stages of training.
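One common way to keep the rate from shrinking indefinitely is to clip it at a lower bound. The sketch below is an assumption added for illustration, not something the original post implements: `min_lr` and both function names are hypothetical. It also computes the half-life ln(2) / lambda, which gives a feel for how quickly a given decay rate shrinks the learning rate.

```python
import math

def exponential_decay_with_floor(epoch, initial_lr=0.1, decay_rate=0.05, min_lr=0.01):
    """Exponential decay clipped from below at min_lr (the floor is an added assumption)."""
    return max(min_lr, initial_lr * math.exp(-decay_rate * epoch))

def half_life(decay_rate):
    """Epochs needed for the unclipped schedule to halve: ln(2) / decay_rate."""
    return math.log(2) / decay_rate

for epoch in (0, 20, 40, 60, 80):
    lr = exponential_decay_with_floor(epoch)
    print(f"Epoch {epoch + 1}, Learning Rate: {lr:.6f}")
print(f"Half-life: {half_life(0.05):.2f} epochs")
```

With decay_rate=0.05 the rate halves roughly every 14 epochs, so over a long run the unclipped schedule falls by orders of magnitude. The floor trades that for a constant rate once min_lr is reached.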

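1.4 Cosine Annealing

Cosine Annealing lowers the learning rate gradually along a cosine curve. The decrease slows toward the end of training, which allows fine adjustment. It can be combined with Warm Restart.

$\alpha_{t} = \alpha_{min} + \frac{1}{2}(\alpha_{max} - \alpha_{min})\left(1 + \cos\left(\frac{\pi t}{T}\right)\right)$

$\alpha_{max}$: initial (maximum) learning rate
$\alpha_{min}$: minimum learning rate
$T$: total schedule period (e.g., 50 epochs)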

import math

class CosineAnnealing:
    def __init__(self, initial_lr, min_lr, total_epochs):
        self.initial_lr = initial_lr      # maximum learning rate, used at epoch 0
        self.min_lr = min_lr              # lower bound, reached at epoch == total_epochs
        self.total_epochs = total_epochs  # schedule period T

    def get_lr(self, epoch):
        # Half a cosine wave scaled to run from initial_lr down to min_lr
        return self.min_lr + 0.5 * (self.initial_lr - self.min_lr) * (1 + math.cos(math.pi * epoch / self.total_epochs))

# Usage example
scheduler = CosineAnnealing(initial_lr=0.1, min_lr=0.001, total_epochs=30)
for epoch in range(30):
    lr = scheduler.get_lr(epoch)
    print(f"Epoch {epoch+1}, Learning Rate: {lr:.6f}")

Epoch 1, Learning Rate: 0.100000
Epoch 2, Learning Rate: 0.099729
Epoch 3, Learning Rate: 0.098918
Epoch 4, Learning Rate: 0.097577
Epoch 5, Learning Rate: 0.095721
Epoch 6, Learning Rate: 0.093368
Epoch 7, Learning Rate: 0.090546
Epoch 8, Learning Rate: 0.087286
Epoch 9, Learning Rate: 0.083622
Epoch 10, Learning Rate: 0.079595
Epoch 11, Learning Rate: 0.075250
Epoch 12, Learning Rate: 0.070633
Epoch 13, Learning Rate: 0.065796
Epoch 14, Learning Rate: 0.060792
Epoch 15, Learning Rate: 0.055674
Epoch 16, Learning Rate: 0.050500
Epoch 17, Learning Rate: 0.045326
Epoch 18, Learning Rate: 0.040208
Epoch 19, Learning Rate: 0.035204
Epoch 20, Learning Rate: 0.030367
Epoch 21, Learning Rate: 0.025750
Epoch 22, Learning Rate: 0.021405
Epoch 23, Learning Rate: 0.017378
Epoch 24, Learning Rate: 0.013714
Epoch 25, Learning Rate: 0.010454
Epoch 26, Learning Rate: 0.007632
Epoch 27, Learning Rate: 0.005279
Epoch 28, Learning Rate: 0.003423
Epoch 29, Learning Rate: 0.002082
Epoch 30, Learning Rate: 0.001271

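Cosine Annealing decreases the learning rate slowly at first, fastest around the midpoint, and slowly again toward the end.
In deep learning it is effective when combined with Warm Restart.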
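A warm restart resets the learning rate to its maximum at the start of each cycle and then anneals it again. The sketch below is a simplified illustration with a fixed cycle length. The function name `cosine_warm_restarts` and the parameter `cycle_len` are hypothetical, and published variants may also lengthen the cycle after each restart, which is not shown here.

```python
import math

def cosine_warm_restarts(epoch, max_lr=0.1, min_lr=0.001, cycle_len=10):
    """Cosine annealing that restarts from max_lr every cycle_len epochs."""
    t_cur = epoch % cycle_len  # position inside the current cycle
    cosine = 0.5 * (1 + math.cos(math.pi * t_cur / cycle_len))
    return min_lr + (max_lr - min_lr) * cosine

for epoch in range(30):
    lr = cosine_warm_restarts(epoch)
    marker = "  <- restart" if epoch > 0 and epoch % 10 == 0 else ""
    print(f"Epoch {epoch + 1}, Learning Rate: {lr:.6f}{marker}")
```

Within each cycle the rate follows the same cosine curve as above. At epochs 10 and 20 (0-based) it jumps back to max_lr, which gives the optimizer a chance to move away from the region it had settled into.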
