Can BERT do the next-word-predict task?
BERT is bidirectional, so it is not obvious how it could predict the next word. How can we adapt BERT to do the next-word-prediction task?

neural-network deep-learning attention-mechanism transformer bert
asked 20 hours ago by jet
Have you seen the original publication? It seems to be addressing prediction at the sentence level, as explained in its section 3.3.2. – mapto, 20 hours ago
1 Answer
BERT can't be used for next-word prediction, at least not with the current state of research on masked language modeling.

BERT is trained on a masked language modeling task, so you cannot "predict the next word" with it. You can only mask a word and ask BERT to predict it given the rest of the sentence (context to both the left and the right of the masked word).

Because of this, you can't sample text from BERT as if it were a normal autoregressive language model. However, BERT can be viewed as a Markov random field language model and used for text generation in that way. See the article "BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model" for details; the authors released source code and a Google Colab notebook.
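As an illustration of the masked-word prediction described above, here is a minimal sketch using the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is mentioned in the answer; they are just one common way to try this):

# Minimal sketch of BERT's masked-word prediction (illustrative, not from the answer).
# Assumes the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in [MASK] using context on BOTH sides of the gap.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], prediction["score"])

# Putting [MASK] at the end of the text only approximates next-word prediction:
# the model still conditions on the closing period and was never trained to
# generate text left to right, which is the limitation the answer describes.
for prediction in fill_mask("I went to the store and bought some [MASK]."):
    print(prediction["token_str"], prediction["score"])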
answered 19 hours ago, edited 8 hours ago by ncasas
Thank you very much. – jet, 3 hours ago