Web scraping the titles and descriptions of trending YouTube videos


This scrapes the titles and descriptions of trending YouTube videos and writes them to a CSV file. What improvements can I make?



from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://www.youtube.com/feed/trending").text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('YouTube Trending Titles on 12-30-18.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Title', 'Description'])

for content in soup.find_all('div', class_="yt-lockup-content"):
    try:
        title = content.h3.a.text
        print(title)

        description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        print(description)

    except Exception as e:
        description = None

    print('\n')
    csv_writer.writerow([title, description])

csv_file.close()









      python csv web-scraping beautifulsoup youtube






asked Dec 30 '18 at 20:57 by austingae
edited Dec 30 '18 at 21:41 by Jamal






















5 Answers

          Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



          Using the Python client, the code looks like:



import csv
import googleapiclient.discovery

def most_popular(yt, **kwargs):
    popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
    for video in popular['items']:
        yield video['snippet']

yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Description'])
    csv_writer.writerows(
        [snip['title'], snip['description']]
        for snip in most_popular(yt, maxResults=20, regionCode=…)
    )


I've also restructured the code so that all of the CSV-writing code appears together, and inside a with open(…) as f: … block.
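For reference, the same endpoint can be hit without the client library by building the request URL directly. A minimal sketch ('YOUR_API_KEY' is a placeholder for a real key from the Google API console, and the 'US' region code is illustrative):

```python
import urllib.parse

# Build the videos?chart=mostPopular URL by hand; the query parameters
# mirror the ones passed to the client library above.
params = urllib.parse.urlencode({
    'key': 'YOUR_API_KEY',       # placeholder, not a real key
    'part': 'snippet',
    'chart': 'mostPopular',
    'maxResults': 20,
    'regionCode': 'US',          # illustrative
})
url = 'https://www.googleapis.com/youtube/v3/videos?' + params
print(url)
# The JSON response could then be fetched with e.g.
# json.load(urllib.request.urlopen(url)) and each item's
# snippet['title'] / snippet['description'] written to the CSV.
```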






answered Dec 30 '18 at 21:51 by 200_success; edited Dec 30 '18 at 22:05

• Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms? – Josay, Dec 30 '18 at 21:56

• @Josay When I include the regionCode in the API call ('CA' for me), the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing. – 200_success, Dec 30 '18 at 22:06

• Oh, I see; I didn't know the purpose of the API, but I will look more into it. Thanks for the info. – austingae, Dec 30 '18 at 23:55

• Is the developer key something that has to be obtained from Google? If so, scraping doesn't require it, which may be a point in its favour: why sign up for something you don't need? – Whirl Mind, Dec 31 '18 at 15:32

• @WhirlMind Because the HTML structure of the Trending page could change at any time and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process if you already have a Google account. – 200_success, Dec 31 '18 at 16:09
































          Context manager



          You open a file at the beginning of the program and close it explicitly at the end.



          Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.



          In your case, you could write:



          with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
          ....


          Exception



          All exceptions are caught by except Exception as e. It may look like a good idea at first but this can lead to various issues:




          • it's hard to know what types of error are actually expected here

          • most errors are better not caught (except for special situations). For instance, if you write a typo, you'll end up with an ignored NameError or AttributeError and debugging will be more painful than it should be.


Also, from the content of the except block, it looks like you only expect the logic for description to fail. If so, it would be clearer to put the smallest amount of code inside the try (...) except block.



          For instance:



title = content.h3.a.text
try:
    description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
    description = None
print(title)
print(description)
print('\n')
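To make the "catch less" advice concrete: find() returns None when nothing matches, so the only failure expected here is an AttributeError raised by the .text access. A sketch using a hypothetical stand-in tag (so it runs without a live page):

```python
class FakeMatch:
    """Stand-in for a matched element exposing .text."""
    text = "some description"

class FakeTag:
    """Stand-in mimicking a bs4 tag: find() returns a match or None."""
    def __init__(self, match=None):
        self._match = match

    def find(self, name, class_=None):
        return self._match

def extract_description(content):
    try:
        # .text on None raises AttributeError; any other bug still surfaces
        return content.find('div', class_="yt-lockup-description").text
    except AttributeError:
        return None

print(extract_description(FakeTag(FakeMatch())))  # some description
print(extract_description(FakeTag()))             # None
```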


          Proper solution



Google usually offers an API to retrieve things like trending videos. I haven't found it, but I'll let you try to find something that works properly. Google is your friend...








































            I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code focused mostly around HTML parsing:





            • you could get some speed and memory improvements if you would use a SoupStrainer to allow BeautifulSoup parse out only the desired elements from the HTML:




              The SoupStrainer class allows you to choose which parts of an incoming document are parsed.




              from bs4 import BeautifulSoup, SoupStrainer

              trending_containers = SoupStrainer(class_="yt-lockup-content")
              soup = BeautifulSoup(source, 'lxml', parse_only=trending_containers)



            • instead of .find_all() and .find() you could have used more concise CSS selectors. You would have:



              soup.select('.yt-lockup-content')


              instead of:



              soup.find_all('div', class_= "yt-lockup-content")


              and:



              content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')


              instead of:



              content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")


            • note how I've omitted div tag names above - I think they are irrelevant as the class values actually define the type of an element in this case


            • organize imports as per PEP8








































              I would separate the "input" (finding titles and descriptions) from the output (writing to screen or file). One good way to do that is to use a generator:



from bs4 import BeautifulSoup
import requests
import csv

def soup():
    source = requests.get("https://www.youtube.com/feed/trending").text
    return BeautifulSoup(source, 'lxml')

def find_videos(soup):
    for content in soup.find_all('div', class_="yt-lockup-content"):
        try:
            title = content.h3.a.text
            description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        except Exception as e:
            description = None
        yield (title, description)

with open('YouTube Trending Titles on 12-30-18.csv', 'w') as csv_file:

    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Title', 'Description'])

    for (title, description) in find_videos(soup()):
        csv_writer.writerow([title, description])


Disclaimer: I haven't tested this code.
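One payoff of this separation: find_videos() only needs an object exposing find_all(), so the parsing logic can be exercised without any network call. A hypothetical stand-in (all Fake* names are illustrative, and the generator is repeated so the sketch is self-contained):

```python
class FakeLink:
    text = "a title"

class FakeHeading:
    a = FakeLink()

class FakeSnippet:
    text = "a description"

class FakeContent:
    """Mimics one yt-lockup-content element from the answer above."""
    h3 = FakeHeading()

    def find(self, name, class_=None):
        return FakeSnippet()

class FakeSoup:
    def find_all(self, name, class_=None):
        return [FakeContent()]

def find_videos(soup):
    # Same shape as the answer's generator.
    for content in soup.find_all('div', class_="yt-lockup-content"):
        try:
            title = content.h3.a.text
            description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        except Exception:
            description = None
        yield (title, description)

print(list(find_videos(FakeSoup())))  # [('a title', 'a description')]
```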








































It shows a Unicode error due to the use of writerow.
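That error is plausibly a UnicodeEncodeError: on Windows, open() defaults to a legacy codec such as cp1252, which cannot encode many characters found in video titles. Passing encoding='utf-8' (plus newline='', as the csv module documentation recommends for csv.writer) avoids it; a sketch with made-up sample rows:

```python
import csv
import os
import tempfile

# Open the CSV with an explicit UTF-8 encoding so writerow() can handle
# non-Latin titles; newline='' prevents blank lines on Windows.
# The file path and sample rows are illustrative.
path = os.path.join(tempfile.gettempdir(), 'titles-demo.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Description'])
    writer.writerow(['Vidéo ★ Tïtle', 'описание'])

# Read it back with the same encoding to confirm a clean round trip.
with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))
print(rows[1])  # ['Vidéo ★ Tïtle', 'описание']
```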






– ashish saha (new contributor)













                  Your Answer





                  StackExchange.ifUsing("editor", function () {
                  return StackExchange.using("mathjaxEditing", function () {
                  StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
                  StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
                  });
                  });
                  }, "mathjax-editing");

                  StackExchange.ifUsing("editor", function () {
                  StackExchange.using("externalEditor", function () {
                  StackExchange.using("snippets", function () {
                  StackExchange.snippets.init();
                  });
                  });
                  }, "code-snippets");

                  StackExchange.ready(function() {
                  var channelOptions = {
                  tags: "".split(" "),
                  id: "196"
                  };
                  initTagRenderer("".split(" "), "".split(" "), channelOptions);

                  StackExchange.using("externalEditor", function() {
                  // Have to fire editor after snippets, if snippets enabled
                  if (StackExchange.settings.snippets.snippetsEnabled) {
                  StackExchange.using("snippets", function() {
                  createEditor();
                  });
                  }
                  else {
                  createEditor();
                  }
                  });

                  function createEditor() {
                  StackExchange.prepareEditor({
                  heartbeatType: 'answer',
                  autoActivateHeartbeat: false,
                  convertImagesToLinks: false,
                  noModals: true,
                  showLowRepImageUploadWarning: true,
                  reputationToPostImages: null,
                  bindNavPrevention: true,
                  postfix: "",
                  imageUploader: {
                  brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                  contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                  allowUrls: true
                  },
                  onDemand: true,
                  discardSelector: ".discard-answer"
                  ,immediatelyShowMarkdownHelp:true
                  });


                  }
                  });














                  draft saved

                  draft discarded


















                  StackExchange.ready(
                  function () {
                  StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210613%2fweb-scraping-the-titles-and-descriptions-of-trending-youtube-videos%23new-answer', 'question_page');
                  }
                  );

                  Post as a guest















                  Required, but never shown

























                  5 Answers
                  5






                  active

                  oldest

                  votes








                  5 Answers
                  5






                  active

                  oldest

                  votes









                  active

                  oldest

                  votes






                  active

                  oldest

                  votes









                  39












                  $begingroup$

                  Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



                  Using the Python client, the code looks like:



                  import csv
                  import googleapiclient.discovery

                  def most_popular(yt, **kwargs):
                  popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
                  for video in popular['items']:
                  yield video['snippet']

                  yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
                  csv_writer = csv.writer(f)
                  csv_writer.writerow(['Title', 'Description'])
                  csv_writer.writerows(
                  [snip['title'], snip['description']]
                  for snip in most_popular(yt, maxResults=20, regionCode=…)
                  )


                  I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: … block.






                  share|improve this answer











                  $endgroup$













                  • $begingroup$
                    Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
                    $endgroup$
                    – Josay
                    Dec 30 '18 at 21:56






                  • 5




                    $begingroup$
                    @Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
                    $endgroup$
                    – 200_success
                    Dec 30 '18 at 22:06












                  • $begingroup$
                    Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
                    $endgroup$
                    – austingae
                    Dec 30 '18 at 23:55






                  • 3




                    $begingroup$
                    Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
                    $endgroup$
                    – Whirl Mind
                    Dec 31 '18 at 15:32






                  • 3




                    $begingroup$
                    @WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
                    $endgroup$
                    – 200_success
                    Dec 31 '18 at 16:09
















                  39












                  $begingroup$

                  Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



                  Using the Python client, the code looks like:



                  import csv
                  import googleapiclient.discovery

                  def most_popular(yt, **kwargs):
                  popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
                  for video in popular['items']:
                  yield video['snippet']

                  yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
                  csv_writer = csv.writer(f)
                  csv_writer.writerow(['Title', 'Description'])
                  csv_writer.writerows(
                  [snip['title'], snip['description']]
                  for snip in most_popular(yt, maxResults=20, regionCode=…)
                  )


                  I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: … block.






                  share|improve this answer











                  $endgroup$













                  • $begingroup$
                    Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
                    $endgroup$
                    – Josay
                    Dec 30 '18 at 21:56






                  • 5




                    $begingroup$
                    @Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
                    $endgroup$
                    – 200_success
                    Dec 30 '18 at 22:06












                  • $begingroup$
                    Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
                    $endgroup$
                    – austingae
                    Dec 30 '18 at 23:55






                  • 3




                    $begingroup$
                    Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
                    $endgroup$
                    – Whirl Mind
                    Dec 31 '18 at 15:32






                  • 3




                    $begingroup$
                    @WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
                    $endgroup$
                    – 200_success
                    Dec 31 '18 at 16:09














                  39












                  39








                  39





                  $begingroup$

                  Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



                  Using the Python client, the code looks like:



                  import csv
                  import googleapiclient.discovery

                  def most_popular(yt, **kwargs):
                  popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
                  for video in popular['items']:
                  yield video['snippet']

                  yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
                  csv_writer = csv.writer(f)
                  csv_writer.writerow(['Title', 'Description'])
                  csv_writer.writerows(
                  [snip['title'], snip['description']]
                  for snip in most_popular(yt, maxResults=20, regionCode=…)
                  )


                  I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: … block.






                  share|improve this answer











                  $endgroup$



                  Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular, you will get the same information in a documented JSON format.



                  Using the Python client, the code looks like:



                  import csv
                  import googleapiclient.discovery

                  def most_popular(yt, **kwargs):
                  popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
                  for video in popular['items']:
                  yield video['snippet']

                  yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
                  csv_writer = csv.writer(f)
                  csv_writer.writerow(['Title', 'Description'])
                  csv_writer.writerows(
                  [snip['title'], snip['description']]
                  for snip in most_popular(yt, maxResults=20, regionCode=…)
                  )


                  I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: … block.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Dec 30 '18 at 22:05

























                  answered Dec 30 '18 at 21:51









                  200_success200_success

                  130k16153419




                  130k16153419












                  • $begingroup$
                    Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
                    $endgroup$
                    – Josay
                    Dec 30 '18 at 21:56






                  • 5




                    $begingroup$
                    @Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
                    $endgroup$
                    – 200_success
                    Dec 30 '18 at 22:06












                  • $begingroup$
                    Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
                    $endgroup$
                    – austingae
                    Dec 30 '18 at 23:55






                  • 3




                    $begingroup$
                    Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
                    $endgroup$
                    – Whirl Mind
                    Dec 31 '18 at 15:32






                  • 3




                    $begingroup$
                    @WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
                    $endgroup$
                    – 200_success
                    Dec 31 '18 at 16:09


















                  • $begingroup$
                    Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
                    $endgroup$
                    – Josay
                    Dec 30 '18 at 21:56






                  • 5




                    $begingroup$
                    @Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
                    $endgroup$
                    – 200_success
                    Dec 30 '18 at 22:06












                  • $begingroup$
                    Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
                    $endgroup$
                    – austingae
                    Dec 30 '18 at 23:55






                  • 3




                    $begingroup$
                    Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
                    $endgroup$
                    – Whirl Mind
                    Dec 31 '18 at 15:32






                  • 3




                    $begingroup$
                    @WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
                    $endgroup$
                    – 200_success
                    Dec 31 '18 at 16:09

















                  14

                  Context manager

                  You open a file at the beginning of the program and close it explicitly at the end.

                  Python provides a nice way to allocate and release resources (such as files) easily: context managers. They guarantee that the cleanup is performed at the end, even in case of an exception.

                  In your case, you could write:

                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as file:
                      ...

                  Exception

                  All exceptions are caught by except Exception as e. It may look like a good idea at first, but this can lead to various issues:

                  • it's hard to know what types of error are actually expected here

                  • most errors are better not caught (except in special situations). For instance, if you make a typo, you'll end up with an ignored NameError or AttributeError, and debugging will be more painful than it should be.

                  Also, from the content of the except block, it looks like you only expect the logic about the description to fail. If so, it would be clearer to put the smallest amount of code inside the try (...) except.

                  For instance:

                  title = content.h3.a.text
                  try:
                      description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
                  except Exception:
                      description = None
                  print(title)
                  print(description)
                  print('\n')

                  Proper solution

                  Google usually offers an API to retrieve things such as trending videos. I haven't found it, but I'll let you try to find something that works properly. Google is your friend...
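                  (As the comment thread later establishes, this is the YouTube Data API v3 videos.list endpoint with chart=mostPopular. A rough sketch of calling it with plain requests; the API key is a placeholder you would obtain from the Google developer console, and fetch_trending needs network access, so treat it as untested:)

```python
API_URL = "https://www.googleapis.com/youtube/v3/videos"

def trending_params(api_key, region="US", max_results=20):
    # chart=mostPopular plus a regionCode matches the Trending page,
    # per the comment thread above
    return {
        "part": "snippet",
        "chart": "mostPopular",
        "regionCode": region,
        "maxResults": max_results,
        "key": api_key,
    }

def fetch_trending(api_key, region="US"):
    import requests  # third-party: pip install requests
    response = requests.get(API_URL, params=trending_params(api_key, region))
    response.raise_for_status()
    return [(item["snippet"]["title"], item["snippet"]["description"])
            for item in response.json()["items"]]
```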






                  – Josay, answered Dec 30 '18 at 21:55
                          9

                          I'd definitely look into using the API directly, as @200_success suggested, to avoid any web scraping or HTML parsing, but here are some additional suggestions to improve your current code, focused mostly on the HTML parsing:

                          • you could get some speed and memory improvements if you used a SoupStrainer to let BeautifulSoup parse out only the desired elements from the HTML:

                            The SoupStrainer class allows you to choose which parts of an incoming document are parsed.

                            from bs4 import BeautifulSoup, SoupStrainer

                            trending_containers = SoupStrainer(class_="yt-lockup-content")
                            soup = BeautifulSoup(source, 'lxml', parse_only=trending_containers)

                          • instead of .find_all() and .find() you could use more concise CSS selectors. You would have:

                            soup.select('.yt-lockup-content')

                            instead of:

                            soup.find_all('div', class_="yt-lockup-content")

                            and:

                            content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')

                            instead of:

                            content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")

                          • note how I've omitted the div tag names above; I think they are irrelevant, as the class values actually define the type of element in this case

                          • organize imports as per PEP 8
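                          Putting the two suggestions together, a minimal sketch (the HTML sample below is made up to mirror the page's markup, and html.parser stands in for lxml so it runs without extra parsers installed):

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical snippet mirroring the Trending page's markup
HTML = """
<div class="yt-lockup-content">
  <h3><a href="#">Some video</a></h3>
  <div class="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2">A description</div>
</div>
<div class="yt-lockup-content">
  <h3><a href="#">Another video</a></h3>
</div>
"""

def titles_and_descriptions(html):
    # Parse only the lockup containers, then use CSS selectors inside them
    strainer = SoupStrainer(class_="yt-lockup-content")
    soup = BeautifulSoup(html, "html.parser", parse_only=strainer)
    results = []
    for content in soup.select(".yt-lockup-content"):
        title = content.h3.a.text
        description_tag = content.select_one(
            ".yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2")
        results.append(
            (title, description_tag.text if description_tag else None))
    return results
```

                          A video without a description yields None instead of raising, which removes the need for the broad try/except discussed in the other answer.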






                          – alecxe, answered Dec 31 '18 at 0:45, edited Dec 31 '18 at 0:51

                                  3

                                  I would separate the "input" (finding titles and descriptions) from the output (writing to the screen or a file). One good way to do that is to use a generator:

                                  from bs4 import BeautifulSoup
                                  import requests
                                  import csv

                                  def soup():
                                      source = requests.get("https://www.youtube.com/feed/trending").text
                                      return BeautifulSoup(source, 'lxml')

                                  def find_videos(soup):
                                      for content in soup.find_all('div', class_="yt-lockup-content"):
                                          title = content.h3.a.text
                                          try:
                                              description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
                                          except AttributeError:
                                              description = None
                                          yield (title, description)

                                  with open('YouTube Trending Titles on 12-30-18.csv', 'w') as csv_file:
                                      csv_writer = csv.writer(csv_file)
                                      csv_writer.writerow(['Title', 'Description'])
                                      for (title, description) in find_videos(soup()):
                                          csv_writer.writerow([title, description])

                                  Disclaimer: I haven't tested this code.






                                  – Anders, answered Jan 1 at 4:49

                                          0

                                          It shows a Unicode error due to the use of writerow.






                                          – ashish saha (new contributor), answered 10 mins ago

                                                  draft saved

                                                  draft discarded




















































                                                  Thanks for contributing an answer to Code Review Stack Exchange!


                                                  • Please be sure to answer the question. Provide details and share your research!

                                                  But avoid



                                                  • Asking for help, clarification, or responding to other answers.

                                                  • Making statements based on opinion; back them up with references or personal experience.


                                                  Use MathJax to format equations. MathJax reference.


                                                  To learn more, see our tips on writing great answers.




                                                  draft saved


                                                  draft discarded














                                                  StackExchange.ready(
                                                  function () {
                                                  StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210613%2fweb-scraping-the-titles-and-descriptions-of-trending-youtube-videos%23new-answer', 'question_page');
                                                  }
                                                  );

                                                  Post as a guest















                                                  Required, but never shown





















































                                                  Required, but never shown














                                                  Required, but never shown












                                                  Required, but never shown







                                                  Required, but never shown

































                                                  Required, but never shown














                                                  Required, but never shown












                                                  Required, but never shown







                                                  Required, but never shown






