Submissions from github.com/fminference

		FlexGen: Running large language models on a single GPU (github.com/fminference)
		192 points by behnamoh on March 26, 2023 | 43 comments