Large Language Models

Posted on December 8, 2023 · 1 minute read

LLM

A Transformer-based neural network used as a language model.

Fine-tuning is not additive; it can overwrite (catastrophically forget) knowledge the model has already learned.

Prompt engineering: few-shot prompting.
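A minimal sketch of few-shot prompting: the prompt packs a few solved examples before the new input so the model infers the task in-context. The sentiment-labeling task and the `complete` function are hypothetical placeholders, not a specific API.

```python
# Few-shot prompt: show the model a handful of solved examples,
# then append the new input and let it continue the pattern.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# `complete` stands in for whatever LLM completion call you use;
# it is a hypothetical placeholder here.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

# print(complete(few_shot_prompt))  # expected continuation: " Positive"
```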

RAG pipeline

(Figure: RAG pipeline.)
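A minimal sketch of a RAG pipeline, assuming hypothetical `embed` and `generate` placeholders and a tiny in-memory vector store: embed the query, retrieve the most similar documents, and prepend them to the prompt before generation.

```python
import numpy as np

# --- Hypothetical components; swap in a real embedder, vector DB, and LLM ---
def embed(text: str) -> np.ndarray:
    """Placeholder embedding model (a sentence encoder in practice)."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder LLM call."""
    return "<model answer conditioned on the retrieved context>"

# A tiny in-memory "vector store": documents and their embeddings.
documents = [
    "LLaMA replaces LayerNorm with RMSNorm.",
    "RAG retrieves documents and adds them to the prompt.",
    "Few-shot prompting puts worked examples in the prompt.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def rag_answer(question: str, k: int = 2) -> str:
    # 1. Embed the question.
    q = embed(question)
    # 2. Retrieve the k most similar documents by cosine similarity.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(-sims)[:k])
    # 3. Augment the prompt with the retrieved context, then generate.
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(rag_answer("How does RAG work?"))
```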

LLaMA

(Figure: LLaMA architecture.)

Differences between LLaMA and the original Transformer:

  • Internal covariate shift slows training, so layer normalization is used to counteract it.

    • Layer norm works by centering each activation vector at its mean and dividing by its standard deviation.

    • Computing the mean is costly, so Root Mean Square Layer Norm (RMSNorm) drops the mean subtraction and rescales only by the root mean square (see the sketch after this list).

  • Uses relative position representations (rotary position embeddings, RoPE, in LLaMA).

    • The attention score between two tokens then depends on their relative distance (see the RoPE sketch below).

(Figure: relative position representation.)
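A minimal NumPy sketch contrasting the two normalizations: LayerNorm subtracts the mean and divides by the standard deviation, while RMSNorm skips the mean entirely and rescales by the root mean square. The gain `g` and bias `b` stand in for the learned parameters of a real model.

```python
import numpy as np

def layer_norm(x: np.ndarray, g: np.ndarray, b: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """LayerNorm: subtract the mean, divide by the standard deviation."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def rms_norm(x: np.ndarray, g: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """RMSNorm: no mean subtraction; rescale by the root mean square only."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return g * x / rms

d = 8
x = np.random.randn(2, d)                # a batch of 2 token vectors
g, b = np.ones(d), np.zeros(d)           # learned gain/bias; identity here
print(layer_norm(x, g, b).std(axis=-1))  # ~1 after normalization
print(rms_norm(x, g).shape)
```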
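And a small sketch of rotary position embeddings (RoPE), the relative position scheme LLaMA uses: each pair of query/key dimensions is rotated by an angle proportional to the token's position, so the query–key dot product depends only on the relative distance between positions. The base of 10000 follows the common convention; the rest is illustrative.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, d), d even."""
    seq_len, d = x.shape
    freqs = base ** (-np.arange(0, d, 2) / d)      # per-pair rotation frequency
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, d/2) angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                # split dimensions into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin             # rotate each pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Same query/key content placed at every position.
qv, kv = np.random.randn(8), np.random.randn(8)
Q, K = np.tile(qv, (6, 1)), np.tile(kv, (6, 1))
Qr, Kr = rope(Q), rope(K)

# The score depends only on the relative offset: (3,1) and (4,2) both differ by 2.
print(np.isclose(Qr[3] @ Kr[1], Qr[4] @ Kr[2]))  # True
```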




