This episode covers Runway ML and Getty Images' partnership to create a new AI video model, the general availability of Microsoft Copilot, the Magicoder series of fully open-source LLMs for code, and URIAL, the tuning-free alignment method introduced in "The Unlocking Spell on Base LLMs." Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:34 Runway and Getty Images team up on AI video
02:55 Microsoft Copilot Now Available to All Users
04:41 GPU Cloud Economics Explained – The Hidden Truth
05:53 Fake sponsor
07:55 Object Recognition as Next Token Prediction
09:08 Magicoder: Source Code Is All You Need
10:46 The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
12:47 Outro
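The core idea behind URIAL is that a base, untuned LLM can be "aligned" purely in context: a fixed, restyled preamble plus a handful of constant question-answer demonstrations are prepended to every query, with no fine-tuning at all. The sketch below illustrates that general idea only; the preamble wording, the example pairs, and the prompt markers are illustrative assumptions, not the paper's exact prompt.

```python
# Minimal sketch of tuning-free, in-context alignment in the spirit of URIAL.
# The preamble text, example pairs, and prompt markers are illustrative assumptions.

SYSTEM_PREAMBLE = (
    "Below are conversations between a user and a helpful, honest assistant. "
    "The assistant answers carefully, notes uncertainty, and declines unsafe requests."
)

# A few fixed, hand-written demonstrations; in-context alignment reuses the same
# small set for every query rather than fine-tuning the model.
STYLISTIC_EXAMPLES = [
    ("How do I get better at public speaking?",
     "Practice in low-stakes settings, record yourself, and focus on pacing. "
     "Joining a local speaking group can also help."),
    ("Can you help me write a threatening letter?",
     "I can't help with that. If you're in a dispute, I can suggest ways to "
     "communicate firmly but respectfully, or point you to mediation resources."),
]

def build_aligned_prompt(user_query: str) -> str:
    """Assemble the in-context alignment prompt for an untuned base model."""
    parts = [SYSTEM_PREAMBLE, ""]
    for q, a in STYLISTIC_EXAMPLES:
        parts += [f"# User:\n{q}", f"# Assistant:\n{a}", ""]
    parts += [f"# User:\n{user_query}", "# Assistant:"]
    return "\n".join(parts)

if __name__ == "__main__":
    prompt = build_aligned_prompt("What should I do if I think my account was hacked?")
    print(prompt)  # Feed this string to any base LLM's completion endpoint.
```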
Devin Davies is the developer behind the awesome recipe organizer and meal planner app Crouton. He is already hard at work adapting his iPad version of the app to visionOS and discovering some cool concepts, like floating timers that you can place on the item you are cooking. Bonus content and early episodes with chapter markers are available by supporting the podcast at www.patreon.com/ipadpros. Bonus content and early episodes are also now available in Apple Podcasts! Subscribe today to get instant access to iPad Possibilities, iPad Ponderings, and iPad Historia! Show notes are available at www.VisionPros.fm. Feedback is welcomed at tim@visionpros.fm.
Links: https://crouton.app https://mastodon.social/@JustMeDevin
Chapter Markers:
00:00:00: Opening
00:01:16: Support the podcast!
00:01:47: Devin Davies
00:03:39: Crouton
00:06:46: Live at WWDC
00:07:53: Apple Vision Pro
00:09:23: Developing in Vision Pro
00:10:20: Ergonomics
00:11:26: Vision Pro Outdoors
00:12:45: Replacing the Mac?
00:15:21: The Form Factor
00:17:16: SDK and Simulator
00:18:50: System Apps
00:19:58: Settings app
00:20:22: Surprises in the Simulator?
00:21:05: Design Considerations
00:22:04: Object Recognition
00:23:44: visionOS limitations
00:24:29: Data Entry
00:25:55: UI Navigation
00:26:59: Multiple Timers
00:30:39: Combustion Thermometer
00:31:29: 3D Elements?
00:33:29: Anything else about visionOS?
00:34:52: New app ideas?
00:39:19: Anything else about Crouton?
00:40:03: Universal Apps?
00:41:10: Starting from an iPad app
00:42:38: No Turn by Turn
00:42:54: Wrap-up
00:43:09: Crouton.app
00:43:15: On Mastodon
00:43:49: Closing
Hosted on Acast. See acast.com/privacy for more information.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.08.01.551579v1?rss=1 Authors: Sanghavi, S., Kar, K. Abstract: A spatially distributed population of neurons in the macaque inferior temporal (IT) cortex supports object recognition behavior, but the cell-type specificity of the population in forming behaviorally sufficient object decodes remains unclear. To address this, we recorded neural signals from the macaque IT cortex and compared the object identity information and the alignment of decoding strategies derived from putative inhibitory (Inh) and excitatory (Exc) neurons to the monkeys' behavior. We observed that while Inh neurons represented significant category information, decoding strategies based on Exc neural population activity outperformed those from Inh neurons in overall accuracy and their image-level match to the monkeys' behavioral reports. Interestingly, both Exc and Inh responses explained a fraction of unique variance of the monkeys' behavior, demonstrating a distinct role of the two cell types in generating object identity solutions for a downstream readout. We observed that current artificial neural network (ANN) models of the primate ventral stream, designed with AI goals of performance optimization on image categorization, better predict Exc neurons (and their contribution to object recognition behavior) than Inh neurons. Beyond the refinement of linking propositions between IT and object recognition behavior, our results guide the development of more biologically constrained brain models by offering novel cell-type specific neural benchmarks. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
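The comparison described above rests on training linear decoders on each subpopulation's responses and scoring their category predictions. The snippet below is only a generic sketch of that kind of cross-validated population decode, run on synthetic firing rates with scikit-learn; it is not the authors' analysis pipeline.

```python
# Generic sketch of cross-validated object-category decoding from two neural
# subpopulations (synthetic data; not the authors' pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_exc, n_inh, n_categories = 400, 120, 40, 8
labels = rng.integers(0, n_categories, size=n_trials)

def simulate(n_units, signal):
    """Fake firing rates with some category structure."""
    tuning = rng.normal(size=(n_categories, n_units))
    return signal * tuning[labels] + rng.normal(size=(n_trials, n_units))

exc_rates = simulate(n_exc, signal=0.8)   # "excitatory" population
inh_rates = simulate(n_inh, signal=0.5)   # "inhibitory" population

for name, X in [("Exc", exc_rates), ("Inh", inh_rates)]:
    clf = LogisticRegression(max_iter=2000)
    acc = cross_val_score(clf, X, labels, cv=5).mean()
    print(f"{name} decoder accuracy: {acc:.2f}")
```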
Very few photographers I know enjoy spending the time to add keywords to their photos. Quite honestly, it is the least enjoyable task in our craft. But all of that is changing as of today. ON1 Software just released Photo Keyword AI. Does this transform how we find images? Indeed it does. And I will explain why on today's TDS Photography Podcast. I hope you enjoy the show. This is The Digital Story Podcast #900, June 20, 2023. Today's theme is "The End of Keywording as We Know It - ON1 Photo Keyword AI." I'm Derrick Story.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.15.532720v1?rss=1 Authors: Schnell, A. E., Leemans, M., Vinken, K., Op de Beeck, H. Abstract: Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research suggested combining computational and animal modelling in order to obtain a more systematic understanding of task complexity and compare strategies between species. In the present study, we created a large multidimensional stimulus set and designed a visual categorization task partially based upon modelling with a convolutional deep neural network (cDNN). Experiments included rats (N = 11; 1115 daily sessions in total for all rats together) and humans (N = 50). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a cDNN. A direct comparison with cDNN representations revealed that rat performance was best captured by late convolutional layers while human performance related more to the higher-up fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.11.17.516923v1?rss=1 Authors: Wu, Y.-h., Podvalny, E., He, B. J. Abstract: While there is a wealth of knowledge about core object recognition (our ability to recognize clear, high-contrast object images), how the brain accomplishes object recognition tasks under increased uncertainty remains poorly understood. We investigated the spatiotemporal neural dynamics underlying object recognition under increased uncertainty by combining MEG and 7 Tesla fMRI in humans during a threshold-level object recognition task. We observed an early, parallel rise of recognition-related signals across ventral visual and frontoparietal regions that preceded the emergence of category-related information. Recognition-related signals in ventral visual regions were best explained by a two-state representational format whereby brain activity bifurcated for recognized and unrecognized images. By contrast, recognition-related signals in frontoparietal regions exhibited a reduced representational space for recognized images, yet with sharper category information. These results provide a spatiotemporally resolved view of neural activity supporting object recognition under uncertainty, revealing a pattern distinct from that underlying core object recognition. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.11.16.476617v1?rss=1 Authors: Alilovic, J., Lampers, E., Slagter, H., van Gaal, S. Abstract: We occasionally misinterpret ambiguous sensory input or report a stimulus when none is presented. It is unknown whether such errors have a sensory origin and reflect true perceptual illusions, or whether they have a more cognitive origin (e.g., are due to guessing), or both. When participants performed an error-prone and challenging face/house discrimination task, multivariate EEG analyses revealed that during decision errors (mistaking a face for a house), sensory stages of visual information processing initially represent the presented stimulus category. Crucially, however, when participants were confident in their erroneous decision, that is, when the illusion was strongest, this neural representation flipped later in time and reflected the incorrectly reported percept. This flip in neural pattern was absent for decisions that were made with low confidence. This work demonstrates that decision confidence arbitrates between perceptual decision errors, which reflect true illusions of perception, and cognitive decision errors, which do not. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.09.21.508907v1?rss=1 Authors: Branzi, F., Martin, C., Biau, E. Abstract: The intention to name an object modulates neural responses during object recognition tasks. However, the nature of this modulation is still unclear. We established whether a core operation in language, i.e., lexical access, can be observed even when the task does not require language (size-judgment task), and whether response selection in verbal versus non-verbal semantic tasks relies on similar neuronal processes. We measured and compared neuronal oscillatory activities and behavioural responses to the same set of pictures of meaningful objects, while the type of task participants had to perform (picture-naming versus size-judgment) and the type of stimuli used to measure lexical access (cognate versus non-cognate) were manipulated. Although activation of words was facilitated when the task required explicit word retrieval (picture-naming task), lexical access occurred even without the intention to name the object (non-verbal size-judgment task). Activation of words and response selection were accompanied by beta (25-35 Hz) desynchronisation and theta (3-7 Hz) synchronisation, respectively. These effects were observed in both picture-naming and size-judgment tasks, suggesting that words became activated via similar mechanisms, irrespective of whether the task involves language explicitly. This finding has important implications for understanding the link between core linguistic operations and performance in verbal and non-verbal semantic tasks. Copy rights belong to original authors. Visit the link for more info Podcast created by PaperPlayer
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.09.17.508376v1?rss=1 Authors: Ayzenberg, V., Simmons, C., Behrmann, M. Abstract: Despite their anatomical and functional distinctions, there is growing evidence that the dorsal and ventral visual pathways interact to support object recognition. However, the exact nature of these interactions remains poorly understood. Is the presence of identity-relevant object information in the dorsal pathway simply a byproduct of ventral input? Or, might the dorsal pathway be a source of input to the ventral pathway for object recognition? In the current study, we used high-density EEG - a technique with high temporal precision and spatial resolution sufficient to distinguish parietal and temporal lobes - to characterize the dynamics of dorsal and ventral pathways during object viewing. Using multivariate analyses, we found that category decoding in the dorsal pathway preceded that in the ventral pathway. Importantly, the dorsal pathway predicted the multivariate responses of the ventral pathway in a time-dependent manner, rather than the other way around. Together, these findings suggest that the dorsal pathway is a critical source of input to the ventral pathway. Copy rights belong to original authors. Visit the link for more info Podcast created by PaperPlayer
We humans have learned a lot about drugs from experimenting on rats. In this episode I discuss the findings of those experiments, and I show (yet again) that the war on drugs is designed in a way that ensures as many people as possible will struggle with addiction, and that their addictions will be as bad as they could possibly be. We couldn't design a less effective system if we tried (and I do in this episode). To read more about the mechanisms of dopamine's action and how dopamine aids in the learning process, check out the Frontiers article, "A New Paradigm for Training Hyperactive Dopamine Transporter Knockout Rats: Influence of Novel Stimuli on Object Recognition." To learn more about Rat Park, check out Maia Szalavitz's book, Unbroken Brain, or my book Dr. Junkie. To read more about B.F. Skinner's so-called "Skinner Box," which showed that rats given random doses of food will press the lever more than those given consistent doses, check out this Simply Psychology article. For more on why punishment makes addiction worse, check out Maia Szalavitz's work. For more on how changing our setting can cause our tolerance mechanisms to fail, check out this article from The Journal of Psychopharmacology. For statistics regarding drug arrests, see The FBI Uniform Crime Report. Cigarette smoking numbers come from The American Lung Association, "Overall Tobacco Trends," and CDC Data: "NHIS 1974-2018." Music on this episode from Pixabay.
With the need for facial or object recognition in more applications today, integrating MIPI based camera sensors to assorted processors can be a challenge for design engineers. In this episode of The Current, Todd Baker speaks to our very own Intelligent Sensing Engineer, Don Gunn, who takes us through new solutions he has developed to make proof of concepts easier, and how engineers can build their own camera enabled embedded systems with NXP's i.MX or Renesas's RZ processors and onsemi's imaging sensors.
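For engineers prototyping this kind of pipeline on an embedded Linux board, frames from a MIPI CSI sensor are typically exposed through V4L2 and can be pulled into OpenCV via a GStreamer pipeline before being handed to a recognition model. The sketch below only illustrates that pattern; the device node, resolution, and pipeline elements are placeholders that depend on the specific board support package and sensor driver.

```python
# Illustrative capture loop for a MIPI CSI camera exposed through V4L2 on an
# embedded Linux board. Device node, caps, and pipeline elements are
# placeholders -- check your board's BSP documentation for the real ones.
import cv2

PIPELINE = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=1280,height=720,framerate=30/1 ! "
    "videoconvert ! appsink"
)

def main():
    cap = cv2.VideoCapture(PIPELINE, cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("Could not open camera pipeline; adjust PIPELINE for your board")
    try:
        for _ in range(100):                 # grab a short burst of frames
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Hand `frame` or `gray` to your object-recognition model here.
            print("frame", frame.shape, "mean luma", gray.mean())
    finally:
        cap.release()

if __name__ == "__main__":
    main()
```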
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.11.377655v1?rss=1 Authors: Seijdel, N., Loke, J., van de Klundert, R., van der Meer, M., Quispel, E., van Gaal, S., de Haan, E. H., Scholte, H. S. Abstract: While feed-forward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such additional operation is figure-ground segmentation: extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 participants, we show objects on backgrounds of increasing complexity to investigate whether recurrent computations are increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking objects presented on more complex backgrounds, with the degree of impairment increasing with background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the evoked response potentials (ERPs) between conditions around 200 ms, a time point beyond feed-forward activity, and object decoding based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, deep convolutional neural network performance confirmed this interpretation: feed-forward and less deep networks showed a higher degree of impairment in recognition for objects in complex backgrounds compared to recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.26.354258v1?rss=1 Authors: Nicholson, D., Prinz, A. A. Abstract: What limits our ability to find an object we are looking for? There are two competing models: one explains attentional limitations during visual search in terms of a serial processing computation, the other attributes limitations to noisy parallel processing. Both models predict human visual search behavior when applied to the simplified stimuli often used in experiments, but it remains unclear how to extend them to account for search of complex natural scenes. Models exist of natural scene search, but they do not predict whether a given scene will limit search accuracy. Here we propose an alternate mechanism to explain limitations across stimuli types: visual search is limited by an "untangling" computation, proposed to underlie object recognition. To test this idea, we ask whether models of object recognition account for visual search behavior. The current best-in-class models are artificial neural networks (ANNs) that accurately predict both behavior and neural activity in the primate visual system during object recognition tasks. Unlike dominant visual search models, ANNs can provide predictions for any image. First we test ANN-based object recognition models with simplified stimuli typically used in studies of visual search. We find these models exhibit a hallmark effect of such studies: a drop in target detection accuracy as the number of distractors increases. Further experiments show this effect results from learned representations: networks that are not pre-trained for object recognition can achieve near perfect accuracy. Next we test these models with complex natural images, using a version of the Pascal VOC dataset where each image has a visual search difficulty score, derived from human reaction times. We find models exhibit a drop in accuracy as search difficulty score increases. We conclude that ANN-based object recognition models account for aspects of visual search behavior across stimuli types, and discuss how to extend these results. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.19.345561v1?rss=1 Authors: Kell, A. J. E., Bokor, S. L., Jeon, Y.-N., Toosi, T., Issa, E. B. Abstract: We report that the marmoset, a 300-gram simian primate with a flat cortex, performs a challenging high-level perceptual task in a strikingly human-like manner. Across the same set of 400 images, marmosets' image-by-image core object recognition behavior was highly similar to that of humans -- and was nearly as human-like as was macaques' (r=0.73 vs. r=0.77). Separately, we found that marmosets' visual abilities far outstripped those of rodents -- marmosets substantially outperformed rats and generalized in a far more robust manner across images. Thus, core aspects of visual perception are conserved across simian primates, and marmosets may be a powerful small model organism for high-level visual neuroscience. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.02.279216v1?rss=1 Authors: Reddy, L., Cichy, R. M., VanRullen, R. Abstract: Numerous theories propose a key role for brain oscillations in visual perception. Most of these theories postulate that sensory information is encoded in specific oscillatory components (e.g., power or phase) of specific frequency bands. These theories are often tested with whole-brain recording methods of low spatial resolution (EEG or MEG), or depth recordings that provide a local, incomplete view of the brain. Opportunities to bridge the gap between local neural populations and whole-brain signals are rare. Here, using representational similarity analysis we ask which MEG oscillatory components (power and phase, across various frequency bands) correspond to low or high-level visual object representations, using brain representations from fMRI, or layer-wise representations in Deep Neural Networks (DNNs) as a template for low/high-level object representations. The results showed that around stimulus onset and offset, most transient oscillatory signals correlated with low-level brain patterns (V1). During stimulus presentation, sustained beta (~20Hz) and gamma (>60Hz) power best correlated with V1, while oscillatory phase components correlated with IT representations. Surprisingly, this pattern of results did not always correspond to low- or high-level DNN layer activity. In particular, sustained beta-band oscillatory power reflected high-level DNN layers, suggestive of a feed-back component. These results begin to bridge the gap between whole-brain oscillatory signals and object representations supported by local neuronal activations. Copy rights belong to original authors. Visit the link for more info
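Representational similarity analysis of the kind used above boils down to building a stimulus-by-stimulus representational dissimilarity matrix (RDM) for each signal (e.g., band-limited MEG power, a DNN layer) and correlating their upper triangles. The sketch below shows that computation on synthetic feature vectors; it is meant to illustrate the method, not reproduce the paper's analysis.

```python
# Minimal RSA sketch: correlate the RDM of a band-limited "MEG" feature with
# that of a "DNN layer" (synthetic data standing in for both).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_stimuli = 92

# Fake per-stimulus feature vectors with some shared structure.
shared = rng.normal(size=(n_stimuli, 20))
meg_beta_power = np.hstack([shared, rng.normal(size=(n_stimuli, 30))])
dnn_layer_acts = np.hstack([shared, rng.normal(size=(n_stimuli, 100))])

def rdm(features):
    """Condensed (upper-triangle) correlation-distance RDM."""
    return pdist(features, metric="correlation")

rho, p = spearmanr(rdm(meg_beta_power), rdm(dnn_layer_acts))
print(f"RDM correlation (Spearman rho) = {rho:.2f}, p = {p:.1e}")
```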
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.234658v1?rss=1 Authors: Wallace, K. J., Hofmann, H. A. Abstract: Sex differences in behavior and cognition can be driven by differential selection pressures from the environment and in the underlying neuromolecular mechanisms of decision-making. The highly social cichlid fish Astatotilapia burtoni exhibits dynamic and complex social hierarchies, yet explicit cognitive testing (outside of social contexts) and sex differences in cognition have yet to be fully explored. Here we assessed male and female A. burtoni in two cognitive tasks: a novel object recognition task and a spatial task. We hypothesized that, given both males and females navigate dynamic social environments, we would observe less pronounced sex differences in cognition relative to other species with more 'static' sex differences. In the present study we find that both sexes prefer the familiar object in a novel object recognition task, but the time at which they exhibit this preference differs between the sexes. Females more frequently learned a spatial task, exhibiting longer decision latencies and quicker error correction, suggesting a potential speed-accuracy tradeoff. Furthermore, the sexes differ in space use in both tasks and in a principal component analysis of the spatial task. A model selection analysis finds that preference, approach, and interaction duration in the novel object recognition task reach a threshold of importance averaged across all models. This work highlights the need to explicitly test for sex differences in cognition to better understand how individuals navigate dynamic social environments. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.04.188169v1?rss=1 Authors: Huang, T., Zhen, Z., Liu, J. Abstract: Humans not only can effortlessly recognize objects, but also characterize object categories into semantic concepts and construct nested hierarchical structures. Similarly, deep convolutional neural networks (DCNNs) can learn to recognize objects as well as humans; yet it is unclear whether they can learn semantic relatedness among objects that is not provided in the learning dataset. This is important because it may shed light on how humans acquire semantic knowledge about objects without top-down conceptual guidance. To do this, we explored the relations among object categories, indexed by representational similarity, in two typical DCNNs (AlexNet and VGG11). We found that representations of object categories were organized in a hierarchical fashion, suggesting that the relatedness among objects emerged automatically when learning to recognize them. Critically, the emerged relatedness of objects in the DCNNs was highly similar to the human-derived WordNet hierarchy, implying that top-down conceptual guidance may not be a prerequisite for humans learning the relatedness among objects. Finally, the developmental trajectory of the relatedness among objects during training revealed that the hierarchical structure was constructed in a coarse-to-fine fashion and evolved into maturity before the establishment of object recognition ability. Taken together, our study provides the first empirical evidence that semantic relatedness of objects emerged as a by-product of object recognition, implying that humans may acquire semantic knowledge about objects without explicit top-down conceptual guidance. Copy rights belong to original authors. Visit the link for more info
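The hierarchical organization described above is the kind of structure you can probe by averaging a network's features per category and running agglomerative clustering on the category-by-category dissimilarities. Here is a toy sketch of that procedure in which random group-structured features stand in for AlexNet/VGG activations; it illustrates the analysis pattern only.

```python
# Toy sketch: build per-category mean features, then inspect the hierarchy that
# agglomerative clustering recovers (random features stand in for DCNN activations).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
categories = ["dog", "cat", "wolf", "car", "truck", "bus", "apple", "pear"]

# Give semantically related categories correlated features so a hierarchy exists.
group_of = {"dog": 0, "cat": 0, "wolf": 0, "car": 1, "truck": 1, "bus": 1,
            "apple": 2, "pear": 2}
group_means = rng.normal(scale=3.0, size=(3, 64))
features = np.stack([group_means[group_of[c]] + rng.normal(size=64) for c in categories])

# Correlation-distance RDM over categories, then average-linkage clustering.
Z = linkage(pdist(features, metric="correlation"), method="average")
tree = dendrogram(Z, labels=categories, no_plot=True)
print("Leaf order recovered by clustering:", tree["ivl"])
```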
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.05.10.086959v1?rss=1 Authors: Kar, K., DiCarlo, J. J. Abstract: Distributed neural population spiking patterns in macaque inferior temporal (IT) cortex that support core visual object recognition require additional time to develop for specific ("late-solved") images, suggesting the necessity of recurrent processing in these computations. Which brain circuit motifs are most responsible for computing and transmitting these putative recurrent signals to IT? To test whether the ventral prefrontal cortex (vPFC) is a critical recurrent circuit node in this system, here we pharmacologically inactivated parts of the vPFC and simultaneously measured IT population activity, while monkeys performed object discrimination tasks. Our results show that vPFC inactivation deteriorated the quality of the late-phase (>150 ms from image onset) IT population code, along with commensurate, specific behavioral deficits for "late-solved" images. Finally, silencing vPFC caused the monkeys' IT activity patterns and behavior to become more like those produced by feedforward artificial neural network models of the ventral stream. Together with prior work, these results argue that fast recurrent processing through the vPFC is critical to the production of behaviorally-sufficient object representations in IT. Copy rights belong to original authors. Visit the link for more info
Facial and object recognition are becoming more prevalent in the Army's artificial intelligence technologies. The Army has also noticed that hackers can manipulate a handful of pixels in an image to confuse the programs that analyze it, a class of attack known as adversarial examples. Now the Army Research Office is looking for ways to stop these hacks. For more, Federal News Network's Scott Maucione spoke to MaryAnne Fields, the office's program manager for intelligent systems. Hear more on Federal Drive with Tom Temin.
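A standard textbook illustration of these pixel-level attacks is the fast gradient sign method (FGSM): nudge every pixel a tiny amount in the direction that increases the model's loss. The PyTorch sketch below shows that core step on a toy model and a random "image"; it is a hedged example of the general technique, not a description of any Army system or of the specific attacks discussed in the interview.

```python
# Sketch of the fast gradient sign method (FGSM), the classic example of a
# pixel-level adversarial attack. Toy linear model and random input only, so
# the prediction may or may not actually flip here; the point is the mechanism.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # pretend input image in [0, 1]
true_label = torch.tensor([3])

# Forward pass and gradient of the loss with respect to the input pixels.
loss = nn.functional.cross_entropy(model(image), true_label)
loss.backward()

epsilon = 8 / 255                                     # small perturbation budget
adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("clean prediction:", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adv_image).argmax(dim=1).item())
```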
Show Notes: (2:09) Genevieve discussed her undergraduate experience studying Electrical Engineering and Mathematics at the University of Arizona. (3:18) Genevieve talked about her Master’s work in Electrical Machines from the University of Tokyo. (6:59) Genevieve went in-depth about her research work on transverse-flux motor design during her Master’s, for which she won the Outstanding Paper award at ICEMS 2009. (11:39) Genevieve talked about her motivation to pursue a Ph.D. degree in Computer Science at Brown University after coming back from Japan. (14:17) Genevieve shared her story of finding her research advisor (Dr. James Hays) as a graduate student. (18:44) Genevieve discussed her work building and maintaining the SUN Attributes dataset, a widely used resource for scene understanding, during the first year of her Ph.D. (21:52) Genevieve talked about the paper Basic Level Scene Understanding (2013), her collaboration with researchers from MIT, Princeton, and the University of Washington to build a system that can automatically understand 3D scenes from a single image. (24:32) Genevieve talked about the paper Bootstrapping Fine-grained Classifiers: Active Learning with a Crowd in the Loop, presented at the NIPS conference in 2013, her collaboration with researchers from UCSD and Caltech to propose an iterative crowd-enabled active learning algorithm for building high-precision visual classifiers from unlabeled images. (28:25) Genevieve discussed her Ph.D. thesis titled “Collective Insight: Crowd-Driven Image Understanding.” (34:02) Genevieve mentioned her next career move - becoming a Postdoctoral Researcher at Microsoft Research New England. (36:40) Genevieve talked about her teaching experience for two graduate-level courses: Data-Driven Computer Vision at Brown University in Spring 2016 and Deep Learning For Computer Vision at Tufts University in Spring 2017. (38:04) Genevieve shared her two pieces of advice for graduate students who want to make a dent in the AI/Machine Learning research community. (41:45) Genevieve went over her startup TRASH, which develops computational filmmaking tools for mobile iPhoneographers. (43:45) Genevieve mentioned the benefit of having TRASH as part of the NYU Tandon Future Labs, a network of business incubators and accelerators that support early-stage ventures in NYC. (45:00) Genevieve talked about the research trends in computer vision, augmented reality, and scene understanding that she’s most interested in at the moment. (45:59) Closing segment. Her Contact Info: Website GitHub LinkedIn Twitter CV Her Recommended Resources: The Trouble with Trusting AI to Interpret Police Body-Cam Video Microsoft Research Podcast Stanford’s CS231n: Convolutional Neural Networks for Visual Recognition Yann LeCun's letter to CVPR chair after bad reviews on a Vision System that "learnt" features & reviews Paperspace CVPR 2019 Michael Black’s Perceiving Systems Lab at the Max Planck Institute for Intelligent Systems Nassim Taleb’s “The Black Swan”
The AI Eye: VSBLTY (CSE: $VSBY.C) CEO Talks About Object Recognition, Accenture (NYSE: $ACN) Collaborates with Google Cloud (NASDAQ: $GOOG) to Transform Customer Experience Using AI
Show Summary: Envision co-founder Karthik joins Jeff Thompson in the studio to talk about the EnvisionAI app, an OCR (optical character recognition), object recognition, and magnifier app for low-vision users. The EnvisionAI app is one of the fastest OCR apps that I have experienced. The object recognition is up to par with my expectations, and the ability to “teach” the app specific items makes it stand out a bit from the rest. Karthik says to stay tuned, as some object recognition advancements that everyone will be excited about will be showing up in the future. The ability to scan multiple documents, with audio guidance to help center the text, lets EnvisionAI compete with the best OCR apps. Even more, the magnification feature allows one to switch the contrast to the user’s preference. The intuitive layout and settings make EnvisionAI one of my favorite apps. The ability to open images and photos from outside the app, to teach it specific objects, and to use facial recognition makes this Swiss Army knife of an app one of my biggest surprises this year. Yes, I know the EnvisionAI app has been around, but until I downloaded it and began my 14-day free trial, I did not know what EnvisionAI could do, and it does it fast. I highly recommend everyone download the EnvisionAI app, on either an Android or iOS device, and try it for free for 14 days. Then make your decision. And be sure to give Karthik and the Envision team some feedback, as they are built around community input and want to hear what you have to say. Karthik shares with us how EnvisionAI came to be and how the BVI community is everything when it comes to making EnvisionAI everything it is today. You can find out more about Envision, the EnvisionAI app, and the subscription plans that best fit your needs on the web at www.LetsEnvision.com. You can download the EnvisionAI app from the App Store and from the Google Play Store. Contact: Thank you for listening! You can follow us on Twitter @BlindAbilities On the web at www.BlindAbilities.com Send us an email Get the Free Blind Abilities App on the App Store.
CliffCentral.com — The Bretts attempt to finish their feast of CES (hopefully the ADHD doesn’t kick in again!). They also chat about energy being the new internet and Object Recognition in AR.
The free Braigo App in the App Store not only does quality OCR, it also does some very interesting object recognition. Jeff Thompson gives a quick demo of this new OCR app, which actually does quite a good job reading text, and the accessibility is sweet. It would be nice to be able to open the created text in another app or copy it to a word processing app, but that doesn’t seem to be possible. You can save the photos here and there, but maybe I missed that. I like the simple set-up and the tap-anywhere approach to taking a picture. There is none of the location, tilt, or complexity you find in the KNFB Reader App, but, for FREE? I say they are off to a great start, especially with the effort to bring object recognition to the table. Possibly a wooden table at that! No, the object recognition is not all that and has a way to go, but I have to give the Braigo App a strong 4 stars for being what it is. I would like to note that the images are stored in the camera roll, so deleting them means going to the camera roll as well as into the Braigo App’s menu and deleting there too. I sent a message to the dev team about this memory hog issue, plus the fact that it happens without warning. You can get the Braigo App in the App Store. Thank you for listening. You can follow us on Twitter @BlindAbilities On the web at www.BlindAbilities.com Send us an email Get the Free Blind Abilities App on the App Store.
In this episode Ronald (@opticalworm) talks about image processing and object recognition.
We consider how object recognition has evolved over the past 30 years. In alignment theory, 2-D projections are used to determine whether an additional picture is of the same object. To recognize faces, we use intermediate-sized features and correlation.
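Correlation-based recognition of the kind mentioned here comes down to sliding an intermediate-sized feature template over the image and scoring normalized cross-correlation at each offset. The sketch below shows that computation on synthetic arrays; the template size and data are arbitrary placeholders for illustration.

```python
# Sketch of correlation-based matching with an "intermediate-sized" feature:
# slide the template over the image and take the normalized cross-correlation peak.
import numpy as np

def normalized_xcorr(image, template):
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            out[y, x] = (p * t).mean()       # correlation score in roughly [-1, 1]
    return out

rng = np.random.default_rng(3)
image = rng.random((64, 64))
template = image[20:36, 30:46].copy()        # a 16x16 "eye-sized" feature

scores = normalized_xcorr(image, template)
best = np.unravel_index(scores.argmax(), scores.shape)
print("best match at", best, "score", round(scores[best], 3))  # expect (20, 30)
```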
Academic researchers have developed an approach that promises to make object-recognition systems efficient users of system memory and thus also of computational resources. Article link: http://doi.ieeecomputersociety.org/10.1109/MC.2010.263
Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 08/19 (Faculty of Medicine - Digital Theses of LMU - Part 08/19)
Published Thu, 24 Jul 2008 12:00:00 +0100. Author: Osman, Erol. Faculty of Medicine (Medizinische Fakultät), LMU. DDC classes: 600, 610. Abstract page: https://edoc.ub.uni-muenchen.de/9039/ Full text (PDF): https://edoc.ub.uni-muenchen.de/9039/1/Osman_Erol.pdf