Optionalfrequency_OptionalloraName of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
Optionalmax_The maximum number of tokens to generate in the response.
Optionalpresence_Increases the likelihood of the model introducing new topics.
The input text prompt for the model to generate a response.
OptionalrawIf true, a chat template is not applied and you must adhere to the specific model's expected formatting.
Optionalrepetition_Penalty for repeated tokens; higher values discourage repetition.
Optionalresponse_OptionalseedRandom seed for reproducibility of the generation.
OptionalstreamIf true, the response will be streamed back incrementally using SSE, Server Sent Events.
OptionaltemperatureControls the randomness of the output; higher values produce more random results.
Optionaltop_Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
Optionaltop_Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
Decreases the likelihood of the model repeating the same lines verbatim.