This repo releases the code and benchmark datasets for the paper "Token Alignment via Character Matching for Subword Completion" (ACL Findings 2024). In our paper, we observe that LLMs usually generate ...
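The core idea named in the title is to constrain generation so that emitted tokens agree, character by character, with prompt text that was cut off mid-token. The sketch below is only an illustration of that idea. It assumes a toy string vocabulary, and the helpers `backtrack`, `char_match_candidates`, and `advance` are hypothetical names, not this repo's implementation.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating character-matching constraints;
# they are NOT taken from this repo.


def backtrack(prompt_tokens: List[str], k: int) -> Tuple[List[str], str]:
    """Drop the last k prompt tokens and return their text as pending characters."""
    assert 0 <= k <= len(prompt_tokens)
    cut = len(prompt_tokens) - k
    return prompt_tokens[:cut], "".join(prompt_tokens[cut:])


def char_match_candidates(vocab: List[str], pending: str) -> List[int]:
    """Ids of non-empty tokens whose text agrees with the pending characters.

    A token agrees if it covers the pending text (token starts with `pending`)
    or is itself a prefix of `pending` (later tokens finish the match).
    With an empty `pending`, every non-empty token is allowed.
    """
    return [
        i for i, tok in enumerate(vocab)
        if tok and (tok.startswith(pending) or pending.startswith(tok))
    ]


def advance(pending: str, token: str) -> str:
    """Pending characters left after emitting `token`.

    Assumes `token` was picked from char_match_candidates, so the result is
    either the unmatched remainder or "" once the pending text is covered.
    """
    return pending[len(token):] if pending.startswith(token) else ""
```

For example, with the vocabulary `["pri", "print", "printf", "p", "return", "pr"]` and pending text `"prin"`, every token except `"return"` stays allowed. In a real decoder, the logits of non-candidate tokens would be masked at each step until `advance` returns an empty string, after which decoding proceeds unconstrained.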
Thanks to AWQ, TinyChat delivers more efficient responses from LLM/VLM chatbots through 4-bit inference: TinyChat on RTX 4090 runs 3.4x faster than FP16, and TinyChat on Jetson Orin runs 3.2x faster than FP16 ...
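To illustrate what storing weights in 4 bits means, here is a minimal sketch of plain group-wise, round-to-nearest INT4 quantization with a per-group min-max scale. This is a simplification under stated assumptions: AWQ additionally searches per-channel scales from activation statistics before quantizing, which this sketch omits, and the function names and the toy `group_size=4` are hypothetical (typical configurations use larger groups, e.g. 128).

```python
from typing import List, Tuple

# Hypothetical helpers: plain min-max INT4 quantization per group.
# AWQ's activation-aware scale search is NOT implemented here.

Group = Tuple[List[int], float, float]  # (4-bit codes, scale, offset)


def quantize_group_int4(w: List[float]) -> Group:
    """Map a group of floats to integer codes in [0, 15]."""
    lo, hi = min(w), max(w)
    scale = (hi - lo) / 15 if hi > lo else 1.0  # 15 = 2**4 - 1 steps
    q = [min(15, max(0, round((x - lo) / scale))) for x in w]
    return q, scale, lo


def dequantize_group(q: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats; error per weight is about scale / 2 at most."""
    return [lo + v * scale for v in q]


def quantize_int4(weights: List[float], group_size: int = 4) -> List[Group]:
    """Split a flat weight vector into groups and quantize each one."""
    return [
        quantize_group_int4(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]
```

Each weight then occupies 4 bits instead of 16, plus one scale and one offset per group. Because LLM token generation is largely memory-bound, moving fewer bytes per weight is what allows 4-bit inference to run faster than FP16.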