
Can BERT do the next-word-predict task?


Since BERT is bidirectional, how can we modify it to perform next-word prediction?

Tags: neural-network, deep-learning, attention-mechanism, transformer, bert






asked 20 hours ago by jet

  • Have you seen the original publication? It seems to be addressing prediction at the sentence level, as explained in its section 3.3.2. – mapto, 20 hours ago

1 Answer

BERT can't be used for next-word prediction, at least not with the current state of research on masked language modeling.



BERT is trained on a masked language modeling task and therefore you cannot "predict the next word". You can only mask a word and ask BERT to predict it given the rest of the sentence (both to the left and to the right of the masked word).
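For illustration, here is a minimal sketch of that masked-word prediction using the Hugging Face transformers fill-mask pipeline; the library choice and the bert-base-uncased checkpoint are assumptions for this sketch, not something prescribed above:

    # Minimal sketch: masked-word prediction with BERT via the Hugging Face
    # `transformers` fill-mask pipeline (assumed setup, transformers v4+).
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # BERT fills the [MASK] token using context from BOTH sides of the gap.
    for candidate in fill_mask("The capital of France is [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 3))

Placing the mask at the end of the context, as above, is the closest you get to "next-word" prediction, but the model was still trained to use context on both sides rather than a strict left-to-right factorization.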



This means you can't sample text from BERT as if it were a normal autoregressive language model. However, BERT can be viewed as a Markov random field language model and used for text generation in that way. See the article "BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model" for details. The authors released source code and a Google Colab notebook.
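As a rough sketch of that Gibbs-style idea (this is not the authors' released implementation; the checkpoint, sequence length, and number of sampling steps below are arbitrary assumptions):

    # Rough sketch: Gibbs-style generation with BERT as a masked LM.
    # Start from an all-[MASK] sequence, then repeatedly pick a position,
    # re-mask it, and resample it from BERT's predicted distribution.
    # Assumes the Hugging Face `transformers` library (v4+) and PyTorch.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    seq_len = 10  # arbitrary target length (without [CLS]/[SEP])
    ids = [tokenizer.cls_token_id] + [tokenizer.mask_token_id] * seq_len + [tokenizer.sep_token_id]

    with torch.no_grad():
        for _ in range(200):  # arbitrary number of resampling steps
            pos = torch.randint(1, seq_len + 1, (1,)).item()  # skip [CLS]/[SEP]
            ids[pos] = tokenizer.mask_token_id
            logits = model(torch.tensor([ids])).logits[0, pos]
            probs = torch.softmax(logits, dim=-1)
            ids[pos] = torch.multinomial(probs, 1).item()

    print(tokenizer.decode(ids[1:-1]))

The paper describes more careful masking and sampling schedules, so treat this only as an illustration of the mechanism.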






answered 19 hours ago by ncasas, edited 8 hours ago

  • Thank you very much. – jet, 3 hours ago